Skip to content

fix: respect negative values on deserialization in REQ sketch#734

Open
proost wants to merge 6 commits into
apache:mainfrom
proost:fix-respect-negative-values
Open

fix: respect negative values on deserialization in REQ sketch#734
proost wants to merge 6 commits into
apache:mainfrom
proost:fix-respect-negative-values

Conversation

@proost
Copy link
Copy Markdown
Member

@proost proost commented Apr 25, 2026

Float.MIN_VALUE is smallest positive value. So if REQ sketch include negative values, min value is not respected after deserialization.

C++ is ok, because it delegates to comparator not like java.

After this PR merged, I will send PR for compat in C++.

@proost proost requested a review from leerho April 25, 2026 08:20
@proost proost self-assigned this Apr 25, 2026
Copy link
Copy Markdown
Member

@leerho leerho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in the code.

posSeg.getFloatArray(arr, 0, count);
float minItem = Float.MAX_VALUE;
float maxItem = Float.MIN_VALUE;
float maxItem = -Float.MAX_VALUE;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you found here is obviously incorrect, but neither is your solution. Internally, for the Classic Quantiles, KLL and REQ, the min and max values are initialized to NaN. and the first incoming value that is not NaN, sets both the min and max values. This allows input values of +/- infinity, which your solution would not recognize. I would argue that you should use a similar solution here, for consistency if nothing else.

I would also suggest you create a test that uses +/- Infinities, just to confirm that it works.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d67731a

Oh, I was wrong. I changed like other sketches or Update method.

And also add Infinity cases

@proost proost requested a review from leerho May 13, 2026 15:32
Copy link
Copy Markdown
Member

@leerho leerho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See new comment inline.

} else {
if (item < minItem) { minItem = item; }
if (item > maxItem) { maxItem = item; }
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have taken a closer look at this, and what you have here is not correct either. If a NaN should happen to appear, it would reset both the minItem and maxItem values to NaN, erasing the tracked minItem and maxItem values.

This extractCompactor is part of the deserialization process, and since NaNs are rejected during sketch construction, they should not appear in the serialization data. Nonetheless, this section of code is just trying to reconstruct the min and max values of this specific compactor.

These specific min and max values here are only used in a sketch that is in EXACT format, which has a single compactor. The min and max values are not stored in the serialization structure itself and must be derived dynamically from the compactor. This keeps the overall storage small for small data streams.

Note that in the normal, ESTIMATION format the min and max values are stored in the serialization structure itself, so the min and max values being computed here are not used. (This means the error you observed where it looses negative values probably occurred with a very small data set.)

I think we can trust that the values being examined from the serialized data structure do not contain NaNs. This means that to correct the original error, all we need to do is correct the initialization of the min and max values to +/- infinity. So the corrected section should look like the following, which will preserve any infinities if they exist:

    posSeg.getFloatArray(arr, 0, count);
    float minItem = Float.POSITIVE_INFINITY;   // changed line
    float maxItem = Float.NEGATIVE_INFINITY; // changed line 
    for (int i = 0; i < count; i++) {
      minItem = min(minItem, arr[i]);
      maxItem = max(maxItem, arr[i]);
    }

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it to follow the code.

@proost proost requested a review from leerho May 15, 2026 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants