Skip to content

[Fix][API] Correct numeric epoch timestamp conversion#11141

Open
fallintoplace wants to merge 2 commits into
apache:devfrom
fallintoplace:fix/basic-data-converter-epoch-units
Open

[Fix][API] Correct numeric epoch timestamp conversion#11141
fallintoplace wants to merge 2 commits into
apache:devfrom
fallintoplace:fix/basic-data-converter-epoch-units

Conversation

@fallintoplace

@fallintoplace fallintoplace commented Jun 20, 2026

Copy link
Copy Markdown

Purpose of this pull request

Fix BasicDataConverter numeric DATE/TIMESTAMP conversion for modern epoch-second values when timestamp precision metadata is available.

The old shared fallback treated values like 1700000000 as epoch milliseconds, which converted a normal 2023 epoch-second timestamp into a date in 1970. This patch routes BasicTypeDefine numeric timestamp conversion through timestamp scale metadata when present:

  • scale 0 -> epoch seconds;
  • scale 3 -> epoch milliseconds;
  • scale 6 -> epoch microseconds;
  • scale 9 -> epoch nanoseconds.

For no-metadata conversion, the legacy seconds/milliseconds cutoff is preserved so early-epoch millisecond values such as 1000000000 are not reinterpreted as seconds.

Does this PR introduce any user-facing change?

Yes. Numeric DATE/TIMESTAMP values with timestamp scale metadata now use that scale to choose epoch seconds, milliseconds, microseconds, or nanoseconds. No-metadata numeric conversion keeps the previous fallback behavior.

How was this patch tested?

Added unit coverage for scale-aware numeric epoch seconds, milliseconds, microseconds, and nanoseconds in BasicDataConverter, plus regression coverage for an early-epoch millisecond value without scale metadata.

  • ./mvnw -pl seatunnel-api -Dtest=org.apache.seatunnel.api.table.converter.BasicDataConverterTest -DskipITs -Dcheckstyle.skip -Dspotless.check.skip test
  • ./mvnw -pl seatunnel-api -DskipITs -Dcheckstyle.skip -Dspotless.check.skip test
  • ./mvnw -pl seatunnel-api spotless:check

Check list

  • No new Jar binary package is added.
  • Documentation is not needed for this bug fix.
  • No incompatible change file update is needed.
  • No connector packaging changes are needed.

@github-actions github-actions Bot added the api label Jun 20, 2026

@DanielLeens DanielLeens left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I reviewed the full current diff from the real BasicDataConverter number-to-time conversion path, not just the new unit tests. The direction makes sense, but I do see one blocker before merge.

What this PR is trying to solve

  • Pain point: the current BasicDataConverter heuristics can treat modern epoch-second values like 1700000000 as milliseconds, which produces the wrong LocalDateTime / LocalDate.
  • Approach: move the logic into EpochTimeUtils.convertToInstant(...) and distinguish seconds / millis / micros / nanos by magnitude before the shared convertLocalDateTime(Number) and convertLocalDate(Number) paths use it.
  • Simple example: with this patch, 1700000000 no longer lands in 1970 just because the converter assumed it was milliseconds.

Full runtime path I reviewed

numeric field declared as DATE / TIMESTAMP / LOCAL_DATE_TIME
  -> BasicDataConverter.convert(...)
      -> convertLocalDateTime(typeDefine, Number)
         or convertLocalDate(typeDefine, Number)
          -> BasicDataConverter.convertLocalDateTime(Number) [BasicDataConverter.java:606-609]
             or convertLocalDate(Number) [BasicDataConverter.java:793-795]
              -> EpochTimeUtils.convertToInstant(Number) [EpochTimeUtils.java:33-51]
                  -> infer seconds / millis / micros / nanos by magnitude
              -> atZone(systemDefault)

Findings

1. The new magnitude heuristic fixes modern epoch-seconds, but it also regresses a valid epoch-millis range on the same shared path

  • Location: seatunnel-api/src/main/java/org/apache/seatunnel/api/table/converter/EpochTimeUtils.java:37-43
  • Why this is a problem:
    • The new rule treats every absolute value below 10_000_000_000 as epoch seconds.
    • That does fix modern second-based values such as 1700000000.
    • But it also misclassifies legitimate millisecond values near the Unix epoch. For example, 1_000_000_000 as milliseconds should be 1970-01-12T13:46:40Z, while the new code will interpret it as seconds and turn it into 2001-09-09T01:46:40Z.
    • Because BasicDataConverter.convertLocalDateTime(Number) and convertLocalDate(Number) now both route through this helper, this is not a side path; it is the shared default conversion contract.
  • Risk:
    • This is a correctness regression on a public API-level conversion path. Users can get the wrong date/time back from valid numeric input.
  • Suggested fix:
    • Option A: avoid widening the shared heuristic in BasicDataConverter unless the caller can provide explicit unit metadata.
    • Option B: if you keep heuristic inference, tighten it so you do not regress valid epoch-millis values in the early-epoch window.
  • Severity: High
  • Raised by others already: No

2. Non-blocking: please document the new shared inference contract explicitly

  • Location: seatunnel-api/src/main/java/org/apache/seatunnel/api/table/converter/EpochTimeUtils.java:23-51
  • Why this matters:
    • EpochTimeUtils now defines the default unit-inference contract for the shared number-to-date/time conversion path.
    • Right now the behavior is only encoded in thresholds and helper math, which makes future maintenance and compatibility review harder.
  • Suggested fix:
    • Add a short class/method comment documenting the inference rules, the intended unit order, and the known ambiguity limits.
  • Severity: Medium
  • Raised by others already: No

Compatibility / side effects

  • This is not an API signature break, but it is a behavior change on the default conversion path.
  • The main issue is not CPU or memory cost; it is that the new heuristic expands the millis-to-seconds misclassification window for older timestamps.

Test coverage / stability

  • The new BasicDataConverterTest covers modern seconds / millis / micros / nanos cases.
  • The missing regression case is an early-epoch millisecond value such as 1_000_000_000L, which is exactly where the current blocker still lives.
  • The added tests themselves look structurally stable to me.

Merge conclusion

Conclusion: can merge after fixes

  1. Blockers
  • Finding 1: the new shared unit heuristic still misparses a valid epoch-millis range on the main conversion path.
  1. Non-blocking suggestions
  • Finding 2: please document the new inference contract explicitly.

Overall, I agree with the goal of fixing modern epoch-second handling, but the current implementation trades one shared conversion bug for another. Happy to re-review once the heuristic no longer regresses valid epoch-millis inputs.

@DanielLeens DanielLeens left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I re-reviewed the latest head from the actual numeric-timestamp conversion path in BasicDataConverter, including the new shared helper and the regression tests.

What this PR fixes

  • Pain point: numeric epoch values can be interpreted with the wrong precision, and the old no-metadata path also used the current offset instead of the offset at the target instant.
  • Approach: centralize epoch conversion in EpochTimeUtils, use scale-aware conversion when BasicTypeDefine carries timestamp precision, and keep the legacy seconds-vs-millis heuristic for the no-metadata path.
  • One-line summary: this makes the timestamp conversion path more consistent without changing the fallback contract that existing callers rely on.

Runtime path I checked

typed conversion
  -> BasicDataConverter.convertByType(...)
      -> convertLocalDateTime(typeDefine, Number) / convertLocalDate(typeDefine, Number)
      -> EpochTimeUtils.convertToInstant(typeDefine, value)
      -> scale 0/3/6/9 => second / millisecond / microsecond / nanosecond

legacy no-metadata conversion
  -> convertLocalDateTime(Number) / convertLocalDate(Number)
      -> EpochTimeUtils.convertToInstant(value)
      -> keep the old shared heuristic: small values as epoch seconds, larger values as epoch millis

Findings

  • I do not see a new correctness blocker on the current head.
  • The helper keeps the legacy no-metadata contract intact while fixing the typed precision path.
  • The added unit tests cover the main typed second/millis/micros/nanos cases plus the early-epoch millisecond fallback.

Test note

  • The new UTs are deterministic: no sleeps, no external state, no port usage, and no order-sensitive assertions.
  • If you want one small follow-up later, adding a micro/nano test with a non-zero fractional remainder would make the floorDiv/floorMod path even more explicit, but I do not treat that as a blocker.

Merge conclusion

Conclusion: can merge

  1. Blocking items
  • None from this code rereview.
  1. Suggested but non-blocking follow-up
  • Optional: add one fractional micro/nano regression case.

Overall, the latest head looks solid to me. The conversion logic is cleaner, and I do not see a compatibility regression in the path I traced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants