NaN Handling#52
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #52 +/- ##
===========================================
+ Coverage 86.91% 87.00% +0.09%
===========================================
Files 188 189 +1
Lines 15022 15125 +103
Branches 1363 1370 +7
===========================================
+ Hits 13056 13160 +104
+ Misses 1966 1965 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
12c4378 to
104bace
Compare
| inline static int same(double A, double B) { | ||
| if (std::isnan(A) || std::isnan(B)) | ||
| return std::isnan(A) && std::isnan(B); | ||
| return err(A, B) < maxRelativeError; | ||
| } | ||
|
|
There was a problem hiding this comment.
| inline static int same(double A, double B) { | |
| if (std::isnan(A) || std::isnan(B)) | |
| return std::isnan(A) && std::isnan(B); | |
| return err(A, B) < maxRelativeError; | |
| } | |
| inline static bool same(double A, double B) { | |
| if (std::isnan(A) && std::isnan(B)) { | |
| return true; | |
| } | |
| return err(A, B) < maxRelativeError; | |
| } |
Do:
- Return bool instead offing that gets converted to bool
- One isnan() check
- Use {} around conditional blocks
There was a problem hiding this comment.
Can I one-up this suggestion. The NaN case is rare. but nan-checking here will be on the performance hotpath. We can reduce the number of checks if we do the < test first:
inline static bool same(double A, double B) {
if (err(A, B) < maxRelativeError) || (std::isnan(A) && std::isnan(B))) {
return true;
}
return false;
}
| void hasNaN(bool s) { hasNaN_ = s; } | ||
| bool hasNaN() const { return hasNaN_; } |
There was a problem hiding this comment.
If I understand the API correctly the intent is to mark the occurrence of NaNs once until resetStats is called. The API as it is implemented now allows callers to "unset" the NaN status.
Consider replacing hasNaN(bool) with
void foundNaN() { hasNaN_ = true; }
| inline static int same(double A, double B) { | ||
| if (std::isnan(A) || std::isnan(B)) | ||
| return std::isnan(A) && std::isnan(B); | ||
| return err(A, B) < maxRelativeError; | ||
| } | ||
|
|
There was a problem hiding this comment.
Can I one-up this suggestion. The NaN case is rare. but nan-checking here will be on the performance hotpath. We can reduce the number of checks if we do the < test first:
inline static bool same(double A, double B) {
if (err(A, B) < maxRelativeError) || (std::isnan(A) && std::isnan(B))) {
return true;
}
return false;
}
| } | ||
|
|
||
| void Codec::gatherStats(const double& v) { | ||
| if (std::isnan(v)) { |
There was a problem hiding this comment.
Please make sure that this doesn't interfere with the gatherStats overrides in Constant/Real/Integer/String.h. Should be OK, but just check.
| const ValueType& val(reinterpret_cast<const ValueType&>(d)); | ||
| InternalValueType s; | ||
| if (val == this->missingValue_) { | ||
| if (ODBAPISettings::instance().integersAsDoubles() && std::isnan(d)) { |
There was a problem hiding this comment.
I don't think that this is correct. If you try and use a NaN to represent an integer that is just an invalid/out of range value. And should trigger an error/throw an exception rather than marking it as a missing value.
|
|
||
| void gatherStats(const double& v) override { | ||
| static_assert(sizeof(ValueType) == sizeof(v), "unsafe casting check"); | ||
| // n.b. NaN has no meaningful integer bit pattern. Coerce to missing and flag |
There was a problem hiding this comment.
No, this is an invalid value which should error.
| } | ||
| } | ||
|
|
||
| CASE("NaN is coerced to missing in INTEGER/BITFIELD columns") { |
There was a problem hiding this comment.
As mentioned, I think this is incorrect. The question is only where in the encoding chain do we throw an exception. This should be done in such a way to have a minimal overhead on the normal encoding case.
| odc::Reader oda(dh); | ||
|
|
||
| odc::Reader::iterator it = oda.begin(); | ||
| EXPECT((*it)[0] == 1.0); |
There was a problem hiding this comment.
BTW we have EXPECT_EQUAL(A,B) now, which prints a nicer errors than EXPECT(A==B)
simondsmart
left a comment
There was a problem hiding this comment.
This now looks very clean. One variable-naming comment. Otherwise merge when ready (or ping me to merge).
| core::DataStreamCodec<ByteOrder>(name, type), castedMissingValue_(static_cast<ValueType>(minmaxmissing)) { | ||
| core::DataStreamCodec<ByteOrder>(name, type), | ||
| castedMissingValue_(static_cast<ValueType>(minmaxmissing)), | ||
| rejectNaN_(ODBAPISettings::instance().integersAsDoubles()) { |
There was a problem hiding this comment.
Realisticall this is just caching integersAsDoubles, rather than rejectNaN. Please call it that (it is possible that it could be used for something else as well...)
Description
This pr identifies use cases where NaNs result in unexpected behavior:
to fix the above:
Comparator::same()now checks for NaN first — two NaNs are treated equal,one NaN still compares false. dropped the
$odc_NAN_IS_OKenv var that wasguarding the flag-controlled version
gatherStatsearly-returns on NaN, so min/max only track real values —fixes the Double/Real poisoning and stops Int/Bit from reinterpret-casting
the NaN bits as a huge int
NaN writes to Int/Bit columns are coerced to the column's existing
missing-value sentinel with a log warning
Contributor Declaration
By opening this pull request, I affirm the following:
📖 Documentation 📖
https://sites.ecmwf.int/docs/dev-section/odc/pull-requests/PR-52