[Fix][Format] Correct Debezium numeric TIME parsing#11140
[Fix][Format] Correct Debezium numeric TIME parsing#11140fallintoplace wants to merge 2 commits into
Conversation
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for working on this. I reviewed the full current diff from the real Debezium JSON deserialization path, including the dispatcher path for schema-enabled payloads.
What this PR fixes
- Pain point: Debezium
TIMEvalues do not all use the same unit.io.debezium.time.Timeis milliseconds,MicroTimeis microseconds, andNanoTimeis nanoseconds. The old code only inferred the unit from the numeric shape, so schema-enabled micro/nano values could be parsed into the wrongLocalTime. - Approach: pass the schema root down into
DebeziumJsonDeserializationSchema.parsePayload(...), extract the before/after field schema names, and letDebeziumRowConverter.convertDebeziumTimeToNanos(...)mapTime/MicroTime/NanoTimeexplicitly. - Simple example: a Debezium
MicroTimevalue of60000000now becomes00:01:00because it is treated as microseconds instead of being guessed from magnitude alone.
Full runtime path I reviewed
Debezium message
-> DebeziumJsonDeserializationSchema.deserializeMessage(...) [DebeziumJsonDeserializationSchema.java:112-121]
-> getPayload(root)
-> parsePayload(out, tablePath, root, payload) [DebeziumJsonDeserializationSchema.java:133-141]
-> extractFieldSchemaNames(root, "before"/"after") [DebeziumJsonDeserializationSchema.java:224-247]
-> debeziumRowConverter.parse(payload.get("after"), afterFieldSchemaNames)
-> DebeziumRowConverter.getValue(..., TIME, ..., fieldSchemaNames) [DebeziumRowConverter.java:148-155]
-> convertDebeziumTimeToNanos(...) [DebeziumRowConverter.java:264-282]
-> Time => millis
-> MicroTime => micros
-> NanoTime => nanos
multi-table dispatcher path
-> DebeziumJsonDeserializationSchemaDispatcher.deserialize(...) [DebeziumJsonDeserializationSchemaDispatcher.java:79-98]
-> tableDeserializationMap.get(tablePath).parsePayload(out, root, payload)
Findings
- The normal CDC path definitely hits this change; it is not a dormant helper.
- The fix is scoped correctly to
TIMEhandling and does not alter the existingTIMESTAMPorDATEbranches. - The schema-enabled path now uses the real Debezium schema name instead of magnitude guessing, which closes the original bug at the right layer.
- The schema-less fallback remains intact for existing millisecond inputs.
- I did not find a new correctness, compatibility, or test-stability blocker on the current head.
Compatibility / side effects
- No API/config/protocol break that I can see.
- CPU and memory impact should be negligible: a couple of small per-message maps and constant-time unit checks.
Test coverage / stability
DebeziumJsonSerDeSchemaTest.java:493-549now covers the schema-less millisecond case and the schema-enabled milli/micro/nano cases.- These tests are local and deterministic, with no obvious flaky pattern. I would rate them as stable.
Merge conclusion
Conclusion: can merge
- Blockers
- None on the current head.
- Non-blocking suggestions
- None.
Overall, this looks like the right fix on the real main path, and the current version is good to merge from my side.
|
Good job. Thanks a lot for your valuable contribution to Apache SeaTunnel! To pass the automated CI checks for this PR, could you please enable CI workflows in your forked repository first? You can refer to the check run link below as a reference for the required CI configuration https://github.com/apache/seatunnel/pull/11140/checks?check_run_id=82482965720
|
|
Thanks for the CI follow-up. From Daniel's side, the current head is still the same revision I already approved, and I do not have a new source-level concern to add on this version. At this point the remaining step is just to get the fork workflow enabled so the |
4d56b6e to
4029337
Compare
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the CI follow-up. I re-reviewed the latest head from the Debezium numeric TIME deserialization path again, using the current tree rather than relying on the earlier approval alone.
What changed since the last Daniel pass
- The current head only adds
[Chore] Trigger CI. - I rechecked
DebeziumJsonDeserializationSchemaandDebeziumRowConverter; there is no new source delta on top of the version Daniel already approved.
Runtime path I checked
Debezium JSON payload
-> DebeziumJsonDeserializationSchemaDispatcher
-> DebeziumJsonDeserializationSchema
-> DebeziumRowConverter.convert(...)
-> numeric TIME values are normalized with the corrected unit handling
Findings
- The latest head does not reopen the earlier TIME-unit correctness fix.
- The conversion path still applies the dedicated numeric TIME handling Daniel approved earlier.
- The regression coverage in
DebeziumJsonSerDeSchemaTeststill matches the real deserialization path.
CI
- The current
Buildis queued on the latest head. - From Daniel's side, the remaining gate is CI completion rather than a reopened format-path issue.
Conclusion: can merge after fixes
- Blocking items
- No new source-level blocker from Daniel's side on the current head.
- Please let the latest queued
Buildfinish cleanly.
- Suggested non-blocking follow-up
- None from this round.
|
Thanks for the update. I rechecked the current head against the latest
Because the PR is still behind the latest base, any CI result and merge gate signal here is mixed with upstream drift. The lowest-cost next step is to sync with the latest |
b33553b to
08a2ae5
Compare
|
Thanks for the update. I rechecked the current head against the latest
The current failing CI signal does not look actionable on top of this stale head yet, so the lowest-cost next step is to sync with the latest |

Purpose of this pull request
Fix Debezium JSON numeric TIME conversion so millisecond, microsecond, and nanosecond payloads are converted to nanoseconds before calling
LocalTime.ofNanoOfDay.When Debezium schema metadata is present, the converter now uses the logical schema names (
io.debezium.time.Time,io.debezium.time.MicroTime, andio.debezium.time.NanoTime) to choose the unit. Schema-less payloads keep a compatibility fallback, including numeric millisecond values such as60000converting to00:01:00.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit coverage for schema-less millisecond TIME values and schema-included Debezium Time/MicroTime/NanoTime payloads.
./mvnw -pl seatunnel-formats/seatunnel-format-json -am -DskipTests -DskipITs -Dcheckstyle.skip -Dspotless.check.skip install./mvnw -pl seatunnel-formats/seatunnel-format-json -DskipITs -Dcheckstyle.skip -Dspotless.check.skip test./mvnw -pl seatunnel-formats/seatunnel-format-json spotless:checkCheck list