Skip to content

[Fix][Transform-V2] Correct EXTRACT epoch and subsecond units#11142

Open
fallintoplace wants to merge 2 commits into
apache:devfrom
fallintoplace:fix/extract-epoch-units
Open

[Fix][Transform-V2] Correct EXTRACT epoch and subsecond units#11142
fallintoplace wants to merge 2 commits into
apache:devfrom
fallintoplace:fix/extract-epoch-units

Conversation

@fallintoplace

Copy link
Copy Markdown

Purpose of this pull request

Fix Zeta SQL EXTRACT behavior to match the documented datetime fields.

This patch:

  • returns BIGINT for EXTRACT(EPOCH ...) so timestamps after 2038 do not overflow;
  • supports both singular and plural MILLISECOND(S) and MICROSECOND(S);
  • returns the seconds field including fractional parts for millisecond and microsecond extraction.

Does this PR introduce any user-facing change?

Yes. EXTRACT(EPOCH ...) now returns BIGINT instead of INT, and EXTRACT(MILLISECOND(S)|MICROSECOND(S) ...) now includes the whole seconds field as documented.

How was this patch tested?

Added regression coverage for post-2038 EPOCH, plural/singular millisecond and microsecond fields, and output type inference for EPOCH.

  • ./mvnw -pl seatunnel-shade/seatunnel-calcite -DskipTests -DskipITs -Dcheckstyle.skip -Dspotless.check.skip install
  • ./mvnw -pl seatunnel-shade/seatunnel-janino -DskipTests -DskipITs -Dcheckstyle.skip -Dspotless.check.skip install
  • ./mvnw -pl seatunnel-transforms-v2 -Dtest=org.apache.seatunnel.transform.sql.zeta.ExtractFunctionTest -DskipITs -Dcheckstyle.skip -Dspotless.check.skip test
  • ./mvnw -pl seatunnel-transforms-v2 -DskipITs -Dcheckstyle.skip -Dspotless.check.skip test
  • ./mvnw -pl seatunnel-transforms-v2 spotless:check

Check list

  • No new Jar binary package is added.
  • Documentation already matches the corrected behavior.
  • No incompatible change file update is needed.
  • No connector packaging changes are needed.

@fallintoplace fallintoplace force-pushed the fix/extract-epoch-units branch from 6974978 to e553300 Compare June 20, 2026 12:58

@DanielLeens DanielLeens left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I reviewed the full current diff from the real Zeta SQL EXTRACT(...) path, not just the added assertions.

What this PR fixes

  • Pain point: EXTRACT(EPOCH ...) can overflow after 2038 because the result is forced through int, and EXTRACT(MILLISECOND / MICROSECOND ...) only returns the fractional part instead of the total milliseconds / microseconds within the second.
  • Approach: change the runtime extract(...) implementation to return Number, keep EPOCH as long, and compute millisecond / microsecond values as second * unit + fractional-part. At the same time, update ZetaSQLType.getExtractType(...) so EPOCH is typed as BIGINT.
  • Simple example: EXTRACT(EPOCH FROM TIMESTAMP '2040-01-01 00:00:00') now stays a correct 64-bit value instead of overflowing through int.

Full runtime path I reviewed

SQL EXTRACT(...)
  -> parser builds ExtractExpression
      -> ZetaSQLType.getExpressionType(...) [ZetaSQLType.java:166-168]
          -> getExtractType(...) [ZetaSQLType.java:322-326]
              -> EPOCH => BIGINT
              -> others => INT

execution
  -> ZetaSQLFunction.EXTRACT
      -> DateTimeFunction.extract(...) [DateTimeFunction.java:346-572]
          -> EPOCH uses toEpochSecond(...)
          -> MILLISECOND / MICROSECOND combine second + fractional nanos

Findings

  • The normal path definitely hits this change; this is not a dormant helper.
  • The type system and runtime are now aligned for EPOCH, which closes the 2038 overflow path cleanly.
  • The new MILLISECOND / MICROSECOND behavior matches the expected SQL semantics of total milliseconds / microseconds within the second.
  • I did not find a new correctness, compatibility, or test-stability blocker on the current head.

Test coverage / stability

  • ExtractFunctionTest now covers the fractional-unit semantics and an EPOCH value after 2038.
  • These tests are fully local and deterministic; I would rate them as stable.

Merge conclusion

Conclusion: can merge

  1. Blockers
  • None on the current head.
  1. Non-blocking suggestions
  • None.

Overall, this is a focused fix on a real main-path SQL behavior mismatch, and the current version looks good to merge from my side.

@davidzollo davidzollo added the First-time contributor First-time contributor label Jun 20, 2026
@davidzollo

Copy link
Copy Markdown
Contributor

Good job.

Thanks a lot for your valuable contribution to Apache SeaTunnel!

To pass the automated CI checks for this PR, could you please enable CI workflows in your forked repository first? You can refer to the check run link below as a reference for the required CI configuration:https://github.com/apache/seatunnel/pull/11142/checks?check_run_id=82484983181

If you run into any trouble enabling CI or encounter related errors during the process, feel free to leave a comment here. Contributors are always ready to offer help~

image

@davidzollo davidzollo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 if CI passes.
LGTM

@DanielLeens

DanielLeens commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Thanks. I rechecked the current state here and the code head has not changed since my approval. From my side there is nothing new to re-review on the source path itself, so this stays in the approved / waiting-for-CI state. Once the CI workflow is enabled and the checks rerun, happy to take another look if a new code diff appears.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants