[Fix][API] Correct numeric epoch timestamp conversion#11141
[Fix][API] Correct numeric epoch timestamp conversion#11141fallintoplace wants to merge 2 commits into
Conversation
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for working on this. I reviewed the full current diff from the real BasicDataConverter number-to-time conversion path, not just the new unit tests. The direction makes sense, but I do see one blocker before merge.
What this PR is trying to solve
- Pain point: the current
BasicDataConverterheuristics can treat modern epoch-second values like1700000000as milliseconds, which produces the wrongLocalDateTime/LocalDate. - Approach: move the logic into
EpochTimeUtils.convertToInstant(...)and distinguish seconds / millis / micros / nanos by magnitude before the sharedconvertLocalDateTime(Number)andconvertLocalDate(Number)paths use it. - Simple example: with this patch,
1700000000no longer lands in 1970 just because the converter assumed it was milliseconds.
Full runtime path I reviewed
numeric field declared as DATE / TIMESTAMP / LOCAL_DATE_TIME
-> BasicDataConverter.convert(...)
-> convertLocalDateTime(typeDefine, Number)
or convertLocalDate(typeDefine, Number)
-> BasicDataConverter.convertLocalDateTime(Number) [BasicDataConverter.java:606-609]
or convertLocalDate(Number) [BasicDataConverter.java:793-795]
-> EpochTimeUtils.convertToInstant(Number) [EpochTimeUtils.java:33-51]
-> infer seconds / millis / micros / nanos by magnitude
-> atZone(systemDefault)
Findings
1. The new magnitude heuristic fixes modern epoch-seconds, but it also regresses a valid epoch-millis range on the same shared path
- Location:
seatunnel-api/src/main/java/org/apache/seatunnel/api/table/converter/EpochTimeUtils.java:37-43 - Why this is a problem:
- The new rule treats every absolute value below
10_000_000_000as epoch seconds. - That does fix modern second-based values such as
1700000000. - But it also misclassifies legitimate millisecond values near the Unix epoch. For example,
1_000_000_000as milliseconds should be1970-01-12T13:46:40Z, while the new code will interpret it as seconds and turn it into2001-09-09T01:46:40Z. - Because
BasicDataConverter.convertLocalDateTime(Number)andconvertLocalDate(Number)now both route through this helper, this is not a side path; it is the shared default conversion contract.
- The new rule treats every absolute value below
- Risk:
- This is a correctness regression on a public API-level conversion path. Users can get the wrong date/time back from valid numeric input.
- Suggested fix:
- Option A: avoid widening the shared heuristic in
BasicDataConverterunless the caller can provide explicit unit metadata. - Option B: if you keep heuristic inference, tighten it so you do not regress valid epoch-millis values in the early-epoch window.
- Option A: avoid widening the shared heuristic in
- Severity: High
- Raised by others already: No
2. Non-blocking: please document the new shared inference contract explicitly
- Location:
seatunnel-api/src/main/java/org/apache/seatunnel/api/table/converter/EpochTimeUtils.java:23-51 - Why this matters:
EpochTimeUtilsnow defines the default unit-inference contract for the shared number-to-date/time conversion path.- Right now the behavior is only encoded in thresholds and helper math, which makes future maintenance and compatibility review harder.
- Suggested fix:
- Add a short class/method comment documenting the inference rules, the intended unit order, and the known ambiguity limits.
- Severity: Medium
- Raised by others already: No
Compatibility / side effects
- This is not an API signature break, but it is a behavior change on the default conversion path.
- The main issue is not CPU or memory cost; it is that the new heuristic expands the millis-to-seconds misclassification window for older timestamps.
Test coverage / stability
- The new
BasicDataConverterTestcovers modern seconds / millis / micros / nanos cases. - The missing regression case is an early-epoch millisecond value such as
1_000_000_000L, which is exactly where the current blocker still lives. - The added tests themselves look structurally stable to me.
Merge conclusion
Conclusion: can merge after fixes
- Blockers
- Finding 1: the new shared unit heuristic still misparses a valid epoch-millis range on the main conversion path.
- Non-blocking suggestions
- Finding 2: please document the new inference contract explicitly.
Overall, I agree with the goal of fixing modern epoch-second handling, but the current implementation trades one shared conversion bug for another. Happy to re-review once the heuristic no longer regresses valid epoch-millis inputs.
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the update. I re-reviewed the latest head from the actual numeric-timestamp conversion path in BasicDataConverter, including the new shared helper and the regression tests.
What this PR fixes
- Pain point: numeric epoch values can be interpreted with the wrong precision, and the old no-metadata path also used the current offset instead of the offset at the target instant.
- Approach: centralize epoch conversion in
EpochTimeUtils, use scale-aware conversion whenBasicTypeDefinecarries timestamp precision, and keep the legacy seconds-vs-millis heuristic for the no-metadata path. - One-line summary: this makes the timestamp conversion path more consistent without changing the fallback contract that existing callers rely on.
Runtime path I checked
typed conversion
-> BasicDataConverter.convertByType(...)
-> convertLocalDateTime(typeDefine, Number) / convertLocalDate(typeDefine, Number)
-> EpochTimeUtils.convertToInstant(typeDefine, value)
-> scale 0/3/6/9 => second / millisecond / microsecond / nanosecond
legacy no-metadata conversion
-> convertLocalDateTime(Number) / convertLocalDate(Number)
-> EpochTimeUtils.convertToInstant(value)
-> keep the old shared heuristic: small values as epoch seconds, larger values as epoch millis
Findings
- I do not see a new correctness blocker on the current head.
- The helper keeps the legacy no-metadata contract intact while fixing the typed precision path.
- The added unit tests cover the main typed second/millis/micros/nanos cases plus the early-epoch millisecond fallback.
Test note
- The new UTs are deterministic: no sleeps, no external state, no port usage, and no order-sensitive assertions.
- If you want one small follow-up later, adding a micro/nano test with a non-zero fractional remainder would make the
floorDiv/floorModpath even more explicit, but I do not treat that as a blocker.
Merge conclusion
Conclusion: can merge
- Blocking items
- None from this code rereview.
- Suggested but non-blocking follow-up
- Optional: add one fractional micro/nano regression case.
Overall, the latest head looks solid to me. The conversion logic is cleaner, and I do not see a compatibility regression in the path I traced.
Purpose of this pull request
Fix
BasicDataConverternumeric DATE/TIMESTAMP conversion for modern epoch-second values when timestamp precision metadata is available.The old shared fallback treated values like
1700000000as epoch milliseconds, which converted a normal 2023 epoch-second timestamp into a date in 1970. This patch routesBasicTypeDefinenumeric timestamp conversion through timestamp scale metadata when present:0-> epoch seconds;3-> epoch milliseconds;6-> epoch microseconds;9-> epoch nanoseconds.For no-metadata conversion, the legacy seconds/milliseconds cutoff is preserved so early-epoch millisecond values such as
1000000000are not reinterpreted as seconds.Does this PR introduce any user-facing change?
Yes. Numeric DATE/TIMESTAMP values with timestamp scale metadata now use that scale to choose epoch seconds, milliseconds, microseconds, or nanoseconds. No-metadata numeric conversion keeps the previous fallback behavior.
How was this patch tested?
Added unit coverage for scale-aware numeric epoch seconds, milliseconds, microseconds, and nanoseconds in
BasicDataConverter, plus regression coverage for an early-epoch millisecond value without scale metadata../mvnw -pl seatunnel-api -Dtest=org.apache.seatunnel.api.table.converter.BasicDataConverterTest -DskipITs -Dcheckstyle.skip -Dspotless.check.skip test./mvnw -pl seatunnel-api -DskipITs -Dcheckstyle.skip -Dspotless.check.skip test./mvnw -pl seatunnel-api spotless:checkCheck list