Skip to content

[Fix][Format] Correct Debezium numeric TIME parsing#11140

Open
fallintoplace wants to merge 2 commits into
apache:devfrom
fallintoplace:fix/debezium-time-units
Open

[Fix][Format] Correct Debezium numeric TIME parsing#11140
fallintoplace wants to merge 2 commits into
apache:devfrom
fallintoplace:fix/debezium-time-units

Conversation

@fallintoplace

Copy link
Copy Markdown

Purpose of this pull request

Fix Debezium JSON numeric TIME conversion so millisecond, microsecond, and nanosecond payloads are converted to nanoseconds before calling LocalTime.ofNanoOfDay.

When Debezium schema metadata is present, the converter now uses the logical schema names (io.debezium.time.Time, io.debezium.time.MicroTime, and io.debezium.time.NanoTime) to choose the unit. Schema-less payloads keep a compatibility fallback, including numeric millisecond values such as 60000 converting to 00:01:00.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit coverage for schema-less millisecond TIME values and schema-included Debezium Time/MicroTime/NanoTime payloads.

  • ./mvnw -pl seatunnel-formats/seatunnel-format-json -am -DskipTests -DskipITs -Dcheckstyle.skip -Dspotless.check.skip install
  • ./mvnw -pl seatunnel-formats/seatunnel-format-json -DskipITs -Dcheckstyle.skip -Dspotless.check.skip test
  • ./mvnw -pl seatunnel-formats/seatunnel-format-json spotless:check

Check list

  • No new Jar binary package is added.
  • Documentation is not needed for this bug fix.
  • No incompatible change is introduced.
  • No connector packaging changes are needed.

@davidzollo davidzollo added the First-time contributor First-time contributor label Jun 20, 2026

@DanielLeens DanielLeens left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I reviewed the full current diff from the real Debezium JSON deserialization path, including the dispatcher path for schema-enabled payloads.

What this PR fixes

  • Pain point: Debezium TIME values do not all use the same unit. io.debezium.time.Time is milliseconds, MicroTime is microseconds, and NanoTime is nanoseconds. The old code only inferred the unit from the numeric shape, so schema-enabled micro/nano values could be parsed into the wrong LocalTime.
  • Approach: pass the schema root down into DebeziumJsonDeserializationSchema.parsePayload(...), extract the before/after field schema names, and let DebeziumRowConverter.convertDebeziumTimeToNanos(...) map Time / MicroTime / NanoTime explicitly.
  • Simple example: a Debezium MicroTime value of 60000000 now becomes 00:01:00 because it is treated as microseconds instead of being guessed from magnitude alone.

Full runtime path I reviewed

Debezium message
  -> DebeziumJsonDeserializationSchema.deserializeMessage(...) [DebeziumJsonDeserializationSchema.java:112-121]
      -> getPayload(root)
      -> parsePayload(out, tablePath, root, payload) [DebeziumJsonDeserializationSchema.java:133-141]
          -> extractFieldSchemaNames(root, "before"/"after") [DebeziumJsonDeserializationSchema.java:224-247]
          -> debeziumRowConverter.parse(payload.get("after"), afterFieldSchemaNames)
              -> DebeziumRowConverter.getValue(..., TIME, ..., fieldSchemaNames) [DebeziumRowConverter.java:148-155]
                  -> convertDebeziumTimeToNanos(...) [DebeziumRowConverter.java:264-282]
                      -> Time => millis
                      -> MicroTime => micros
                      -> NanoTime => nanos

multi-table dispatcher path
  -> DebeziumJsonDeserializationSchemaDispatcher.deserialize(...) [DebeziumJsonDeserializationSchemaDispatcher.java:79-98]
      -> tableDeserializationMap.get(tablePath).parsePayload(out, root, payload)

Findings

  • The normal CDC path definitely hits this change; it is not a dormant helper.
  • The fix is scoped correctly to TIME handling and does not alter the existing TIMESTAMP or DATE branches.
  • The schema-enabled path now uses the real Debezium schema name instead of magnitude guessing, which closes the original bug at the right layer.
  • The schema-less fallback remains intact for existing millisecond inputs.
  • I did not find a new correctness, compatibility, or test-stability blocker on the current head.

Compatibility / side effects

  • No API/config/protocol break that I can see.
  • CPU and memory impact should be negligible: a couple of small per-message maps and constant-time unit checks.

Test coverage / stability

  • DebeziumJsonSerDeSchemaTest.java:493-549 now covers the schema-less millisecond case and the schema-enabled milli/micro/nano cases.
  • These tests are local and deterministic, with no obvious flaky pattern. I would rate them as stable.

Merge conclusion

Conclusion: can merge

  1. Blockers
  • None on the current head.
  1. Non-blocking suggestions
  • None.

Overall, this looks like the right fix on the real main path, and the current version is good to merge from my side.

@davidzollo

Copy link
Copy Markdown
Contributor

Good job.

Thanks a lot for your valuable contribution to Apache SeaTunnel!

To pass the automated CI checks for this PR, could you please enable CI workflows in your forked repository first? You can refer to the check run link below as a reference for the required CI configuration https://github.com/apache/seatunnel/pull/11140/checks?check_run_id=82482965720
If you run into any trouble enabling CI or encounter related errors during the process, feel free to leave a comment here. Contributors are always ready to offer help~

image

@DanielLeens

Copy link
Copy Markdown
Contributor

Thanks for the CI follow-up. From Daniel's side, the current head is still the same revision I already approved, and I do not have a new source-level concern to add on this version.

At this point the remaining step is just to get the fork workflow enabled so the Build job can actually run. If anything looks unusual while enabling the workflow, feel free to ping again and I can take another look.

@fallintoplace fallintoplace force-pushed the fix/debezium-time-units branch from 4d56b6e to 4029337 Compare June 25, 2026 15:24

@DanielLeens DanielLeens left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the CI follow-up. I re-reviewed the latest head from the Debezium numeric TIME deserialization path again, using the current tree rather than relying on the earlier approval alone.

What changed since the last Daniel pass

  • The current head only adds [Chore] Trigger CI.
  • I rechecked DebeziumJsonDeserializationSchema and DebeziumRowConverter; there is no new source delta on top of the version Daniel already approved.

Runtime path I checked

Debezium JSON payload
  -> DebeziumJsonDeserializationSchemaDispatcher
      -> DebeziumJsonDeserializationSchema
          -> DebeziumRowConverter.convert(...)
              -> numeric TIME values are normalized with the corrected unit handling

Findings

  • The latest head does not reopen the earlier TIME-unit correctness fix.
  • The conversion path still applies the dedicated numeric TIME handling Daniel approved earlier.
  • The regression coverage in DebeziumJsonSerDeSchemaTest still matches the real deserialization path.

CI

  • The current Build is queued on the latest head.
  • From Daniel's side, the remaining gate is CI completion rather than a reopened format-path issue.

Conclusion: can merge after fixes

  1. Blocking items
  • No new source-level blocker from Daniel's side on the current head.
  • Please let the latest queued Build finish cleanly.
  1. Suggested non-blocking follow-up
  • None from this round.

@DanielLeens

Copy link
Copy Markdown
Contributor

Thanks for the update. I rechecked the current head against the latest dev, and this branch is still behind upstream.

  • base branch: dev
  • head SHA: b33553be15182cd9d6c5fd24870b5a8020001302
  • compare status: diverged
  • ahead_by: 2
  • behind_by: 49
  • mergeable_state: unknown
  • CI / merge gate snapshot: failing (failing: Build [failure])

Because the PR is still behind the latest base, any CI result and merge gate signal here is mixed with upstream drift. The lowest-cost next step is to sync with the latest dev and rerun CI first. If anything still fails after the sync, I'm happy to take another look on top of the updated head.

@fallintoplace fallintoplace force-pushed the fix/debezium-time-units branch from b33553b to 08a2ae5 Compare June 30, 2026 16:18
@DanielLeens

Copy link
Copy Markdown
Contributor

Thanks for the update. I rechecked the current head against the latest dev, and the source-level blockers from my previous Daniel review still stand on this branch.

  • base branch: dev
  • head SHA: 08a2ae540606594ed602feae4c87fe0317e62a6d
  • compare status: diverged
  • ahead_by: 2
  • behind_by: 8
  • CI / merge gate snapshot: Build FAILURE

The current failing CI signal does not look actionable on top of this stale head yet, so the lowest-cost next step is to sync with the latest dev and rerun CI first. If anything still fails after the sync, I am happy to take another look on top of the updated head.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants