Skip to content

fix(hll): preserve set coupon order when serializing#128

Draft
notfilippo wants to merge 2 commits into
mainfrom
filippo.rossi/push-yoruxxvzvuqm
Draft

fix(hll): preserve set coupon order when serializing#128
notfilippo wants to merge 2 commits into
mainfrom
filippo.rossi/push-yoruxxvzvuqm

Conversation

@notfilippo
Copy link
Copy Markdown
Member

Compact HLL Set serialization used to sort retained coupons before writing bytes. Java and C++ write compact Set coupons in hash table iteration order.

Write the existing iterator output directly. This keeps the compact format and removes the extra allocation.

@notfilippo notfilippo added the bug Something isn't working label May 13, 2026
@tisonkun
Copy link
Copy Markdown
Member

Do we have a test to capture this issue? I can't simply understand the situation by barely reading this patch.

@notfilippo
Copy link
Copy Markdown
Member Author

Do we have a test to capture this issue? I can't simply understand the situation by barely reading this patch.

I am working on a test but you were too fast 🏎️ ! Will mark as draft and work on it :)

@notfilippo notfilippo marked this pull request as draft May 13, 2026 15:36
@tisonkun
Copy link
Copy Markdown
Member

Do we have a test to capture this issue? I can't simply understand the situation by barely reading this patch.

I am working on a test but you were too fast 🏎️ ! Will mark as draft and work on it :)

Great!

I wonder if you'd suggest to cancel 0.3.0-rc.2 for this fix, or we may later release a (quick) 0.3.1 with this fix.

@notfilippo
Copy link
Copy Markdown
Member Author

notfilippo commented May 13, 2026

It's very low priority fix which does not break compatibility. IMO we can release it in > 0.3.0.

Merging this patch would just save an allocation and make it bit-perfect with other libraries when the insertion order is identical.

Compact HLL Set serialization used to sort retained coupons before writing
bytes. Java and C++ write compact Set coupons in hash table iteration order.

Write the existing iterator output directly. This keeps the compact format
and removes the extra allocation.
@notfilippo notfilippo force-pushed the filippo.rossi/push-yoruxxvzvuqm branch from 0810c99 to 93e4152 Compare May 13, 2026 16:01
@notfilippo
Copy link
Copy Markdown
Member Author

Added the test. Not amazingly happy with it but I think it's ok. I might add some comments later. Will mark ready for review after I'm done thinking about it :)

Copy link
Copy Markdown
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified locally that the test case failed before the impl change.

Generally LGTM.

Comment thread datasketches/tests/hll_serialization_test.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants