DX-105463: [C++][Gandiva] Add TimestampIR wrapper for next_day#135
Merged
akravchukdremio merged 1 commit intodremio_27.0_20from Apr 20, 2026
Merged
Conversation
next_day(timestamp, utf8) was registered in the C++ function registry for timestamp inputs but was not included in any TimestampIR table, so no _us/_ns IR wrappers were generated. With the relaxed DataTypeEquals (ignoring TimeUnit), calls like next_day(timestamp[us], 'MO') pass validation and are routed to Gandiva, but BuildFunctionCall silently falls through to the precompiled millis function, which interprets microseconds as milliseconds — producing dates ~51,000 years in the future (e.g. +53425-02-28 for a 2021 input). This adds a BuildNextDayWrapper that scales the timestamp input to millis via FloorDiv and calls the precompiled next_day_from_timestamp(context, millis, day_str, day_len) function. Pattern follows the cast-from-timestamp wrapper shape with the additional (context, string, length) args. No remainder recombination is needed: next_day returns date64 (midnight of the next weekday), so sub-millisecond input precision is not meaningful in the result. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
d256409 to
73313a7
Compare
lriggs
approved these changes
Apr 18, 2026
lriggs
pushed a commit
to lriggs/arrow
that referenced
this pull request
Apr 23, 2026
…o#135) next_day(timestamp, utf8) was registered in the C++ function registry for timestamp inputs but was not included in any TimestampIR table, so no _us/_ns IR wrappers were generated. With the relaxed DataTypeEquals (ignoring TimeUnit), calls like next_day(timestamp[us], 'MO') pass validation and are routed to Gandiva, but BuildFunctionCall silently falls through to the precompiled millis function, which interprets microseconds as milliseconds — producing dates ~51,000 years in the future (e.g. +53425-02-28 for a 2021 input). This adds a BuildNextDayWrapper that scales the timestamp input to millis via FloorDiv and calls the precompiled next_day_from_timestamp(context, millis, day_str, day_len) function. Pattern follows the cast-from-timestamp wrapper shape with the additional (context, string, length) args. No remainder recombination is needed: next_day returns date64 (midnight of the next weekday), so sub-millisecond input precision is not meaningful in the result. Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
lriggs
added a commit
that referenced
this pull request
Apr 24, 2026
…/ns] support (#137) * DX-105463: [C++][Gandiva] Add TimestampIR for unit-aware timestamp[us/ns] support Co-Authored-By: Claude Sonnet 4.6 <[email protected]> * DX-105463: Fix clang-format violations in TimestampIR, engine, generator, and test files * Fix compilation issues * Updated tests and a few tweaks for passing dremio tests. * add error message for missing timestamp function * DX-105463: [C++][Gandiva] Add TimestampIR wrapper for next_day (#135) next_day(timestamp, utf8) was registered in the C++ function registry for timestamp inputs but was not included in any TimestampIR table, so no _us/_ns IR wrappers were generated. With the relaxed DataTypeEquals (ignoring TimeUnit), calls like next_day(timestamp[us], 'MO') pass validation and are routed to Gandiva, but BuildFunctionCall silently falls through to the precompiled millis function, which interprets microseconds as milliseconds — producing dates ~51,000 years in the future (e.g. +53425-02-28 for a 2021 input). This adds a BuildNextDayWrapper that scales the timestamp input to millis via FloorDiv and calls the precompiled next_day_from_timestamp(context, millis, day_str, day_len) function. Pattern follows the cast-from-timestamp wrapper shape with the additional (context, string, length) args. No remainder recombination is needed: next_day returns date64 (midnight of the next weekday), so sub-millisecond input precision is not meaningful in the result. Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]> --------- Co-authored-by: Claude Sonnet 4.6 <[email protected]> Co-authored-by: Dmitry Chirkov <[email protected]> Co-authored-by: Arkadii Kravchuk <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a silent-wrong-results correctness bug in
next_day(timestamp[us/ns], utf8)when the relaxedDataTypeEqualsroutes non-ms timestamps through Gandiva but no IR wrapper exists.Follow-up to #134 (TimestampIR for unit-aware timestamp support, now merged).
Background
During end-to-end testing of #134 against real Parquet/Iceberg data (which stores timestamps as
timestamp[us]regardless of the original SQL precision),NEXT_DAY(ts_us, 'MO')returned+53425-02-28— a date ~51,000 years in the future — instead of the correct Monday after the input date.Root cause
next_dayis registered infunction_registry_datetime.ccfor bothdate64andtimestampvia theDATE_TYPESmacro:DATE_TYPES(NEXT_DAY_SAFE_NULL_IF_NULL, next_day, {})But
next_daywas not included in any of the static tables intimestamp_ir.cc(kExtracts,kTruncs,kDiffs,kCastsFromTs,kDateArithEntries,kFixedAdds,kCalendarAdds). As a result, nonext_day_from_timestamp_us/_nsIR wrappers were generated.With the
DataTypeEqualsrelaxation from #134, the validation step matchestimestamp[us]against the registeredtimestamp[ms]signature.BuildFunctionCallthen tries to remapnext_day_from_timestamp→next_day_from_timestamp_us, fails to find it inIsTimestampIRFunction, and silently falls through to calling the precompiled millis function with raw microsecond values — yielding dates far in the future.Fix
Adds a
BuildNextDayWrapperthat follows the cast-from-timestamp pattern:No remainder recombination is needed because
next_dayreturnsdate64(midnight of the next weekday), so sub-millisecond input precision is not meaningful in the result.Files changed
cpp/src/gandiva/timestamp_ir.h— declareBuildNextDayWrappercpp/src/gandiva/timestamp_ir.cc—next_day_from_timestamp{_us,_ns}inBuildAllFunctionNamesBuildNextDayWrapper(modeled afterBuildCastFromTimestampWrapperwith extra context/string/length args)AddFunctionsfor each non-ms unitBuild artifacts
Native library built and published with this fix:
18.3.0-20260418002220-5d2c1e7-73313a7-dremioTesting
Verified end-to-end on a local Dremio instance reading Parquet timestamp data.
Test query:
where
ts_usis atimestamp[us]column containing2021-06-15 14:30:45.123456(a Tuesday).Before fix (arrow-gandiva 18.3.0-...-28932a0-dremio):
(microseconds interpreted as milliseconds → date ~51,000 years in the future)
After fix (arrow-gandiva 18.3.0-...-73313a7-dremio):
(correct Monday after 2021-06-15)
Are there user-facing changes?
No API changes.
NEXT_DAYonTIMESTAMP(6)/TIMESTAMP(9)columns now returns correct results instead of dates ~51,000 years in the future.🤖 Generated with Claude Code