Skip to content

NaN Handling#52

Open
jharwood0 wants to merge 11 commits into
developfrom
hotfix/ODB-569
Open

NaN Handling#52
jharwood0 wants to merge 11 commits into
developfrom
hotfix/ODB-569

Conversation

@jharwood0
Copy link
Copy Markdown

@jharwood0 jharwood0 commented Apr 22, 2026

Description

This pr identifies use cases where NaNs result in unexpected behavior:

  • Comparator treats NaN != NaN (IEEE) so two files with identical bits compare as different
  • NaNs round-trip fine through Double/Real columns, but Int/Bit cols reinterpret the NaN's 8 bytes through the integer codec (with narrowing or offset arithmetic depending on which one), and decode back as a plausible-looking integer — ultimately casting {1.0, NaN, 3.0} to an INT column reads back {1, 1, 3}
  • min/max column metadata also gets corrupted:
    • For Double/Real types — when a NaN arrives before any real values: NaN poisons the min/max slots, subsequent real values can't update them (NaN comparisons are always false)
    • For Integer/Bitfield — NaN bits get reinterpreted as a very large int, always overwriting max regardless of position; knock-on effect for the optimiser which sees a range of ~4.6e18

to fix the above:

  • Comparator::same() now checks for NaN first — two NaNs are treated equal,
    one NaN still compares false. dropped the $odc_NAN_IS_OK env var that was
    guarding the flag-controlled version

  • gatherStats early-returns on NaN, so min/max only track real values —
    fixes the Double/Real poisoning and stops Int/Bit from reinterpret-casting
    the NaN bits as a huge int

  • NaN writes to Int/Bit columns are coerced to the column's existing
    missing-value sentinel with a log warning

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

📖 Documentation 📖
https://sites.ecmwf.int/docs/dev-section/odc/pull-requests/PR-52

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.00%. Comparing base (424ce3d) to head (7fbc235).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #52      +/-   ##
===========================================
+ Coverage    86.91%   87.00%   +0.09%     
===========================================
  Files          188      189       +1     
  Lines        15022    15125     +103     
  Branches      1363     1370       +7     
===========================================
+ Hits         13056    13160     +104     
+ Misses        1966     1965       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jharwood0 jharwood0 marked this pull request as ready for review April 29, 2026 14:28
Comment thread src/odc/Comparator.h Outdated
Comment on lines 80 to 85
inline static int same(double A, double B) {
if (std::isnan(A) || std::isnan(B))
return std::isnan(A) && std::isnan(B);
return err(A, B) < maxRelativeError;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline static int same(double A, double B) {
if (std::isnan(A) || std::isnan(B))
return std::isnan(A) && std::isnan(B);
return err(A, B) < maxRelativeError;
}
inline static bool same(double A, double B) {
if (std::isnan(A) && std::isnan(B)) {
return true;
}
return err(A, B) < maxRelativeError;
}

Do:

  • Return bool instead offing that gets converted to bool
  • One isnan() check
  • Use {} around conditional blocks

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I one-up this suggestion. The NaN case is rare. but nan-checking here will be on the performance hotpath. We can reduce the number of checks if we do the < test first:

inline static bool same(double A, double B) {
    if (err(A, B) < maxRelativeError) || (std::isnan(A) && std::isnan(B))) {
        return true;
    }
    return false;
}

Comment thread src/odc/core/Codec.h Outdated
Comment on lines +73 to +74
void hasNaN(bool s) { hasNaN_ = s; }
bool hasNaN() const { return hasNaN_; }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the API correctly the intent is to mark the occurrence of NaNs once until resetStats is called. The API as it is implemented now allows callers to "unset" the NaN status.

Consider replacing hasNaN(bool) with
void foundNaN() { hasNaN_ = true; }

Comment thread src/odc/Comparator.h Outdated
Comment on lines 80 to 85
inline static int same(double A, double B) {
if (std::isnan(A) || std::isnan(B))
return std::isnan(A) && std::isnan(B);
return err(A, B) < maxRelativeError;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I one-up this suggestion. The NaN case is rare. but nan-checking here will be on the performance hotpath. We can reduce the number of checks if we do the < test first:

inline static bool same(double A, double B) {
    if (err(A, B) < maxRelativeError) || (std::isnan(A) && std::isnan(B))) {
        return true;
    }
    return false;
}

Comment thread src/odc/core/Codec.cc Outdated
}

void Codec::gatherStats(const double& v) {
if (std::isnan(v)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure that this doesn't interfere with the gatherStats overrides in Constant/Real/Integer/String.h. Should be OK, but just check.

Comment thread src/odc/codec/IntegerMissing.h Outdated
const ValueType& val(reinterpret_cast<const ValueType&>(d));
InternalValueType s;
if (val == this->missingValue_) {
if (ODBAPISettings::instance().integersAsDoubles() && std::isnan(d)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is correct. If you try and use a NaN to represent an integer that is just an invalid/out of range value. And should trigger an error/throw an exception rather than marking it as a missing value.

Comment thread src/odc/codec/Integer.h Outdated

void gatherStats(const double& v) override {
static_assert(sizeof(ValueType) == sizeof(v), "unsafe casting check");
// n.b. NaN has no meaningful integer bit pattern. Coerce to missing and flag
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is an invalid value which should error.

Comment thread tests/core/test_codecs_end_to_end.cc Outdated
}
}

CASE("NaN is coerced to missing in INTEGER/BITFIELD columns") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, I think this is incorrect. The question is only where in the encoding chain do we throw an exception. This should be done in such a way to have a minimal overhead on the normal encoding case.

Comment thread tests/core/test_comparator.cc
Comment thread tests/core/test_codecs_end_to_end.cc Outdated
odc::Reader oda(dh);

odc::Reader::iterator it = oda.begin();
EXPECT((*it)[0] == 1.0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW we have EXPECT_EQUAL(A,B) now, which prints a nicer errors than EXPECT(A==B)

Copy link
Copy Markdown
Contributor

@simondsmart simondsmart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now looks very clean. One variable-naming comment. Otherwise merge when ready (or ping me to merge).

Comment thread src/odc/codec/Integer.h Outdated
core::DataStreamCodec<ByteOrder>(name, type), castedMissingValue_(static_cast<ValueType>(minmaxmissing)) {
core::DataStreamCodec<ByteOrder>(name, type),
castedMissingValue_(static_cast<ValueType>(minmaxmissing)),
rejectNaN_(ODBAPISettings::instance().integersAsDoubles()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realisticall this is just caching integersAsDoubles, rather than rejectNaN. Please call it that (it is possible that it could be used for something else as well...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants