[Feature][Connector-V2][CDC] Add configurable schema change behavior policies#11162
Open
CloverDew wants to merge 12 commits into
Open
Conversation
DanielLeens
suggested changes
Jun 22, 2026
DanielLeens
left a comment
Contributor
There was a problem hiding this comment.
Thanks for pushing this forward. I re-reviewed the latest head from the source-side schema-change resolver through the engine / Flink coordination path and into sink-side application. I found one real blocker plus one test gap that should be fixed before merge.
What this PR fixes
- User pain: Users need explicit schema-change behavior policies so CDC schema evolution can fail fast, ignore safe events, or evolve end to end in a controlled way.
- Fix approach: The PR introduces
SchemaChangeBehavior/SchemaChangePolicy, wires behavior through CDC source, engine, and Flink translation, and lets sinks validate schema-evolution support. - One-line summary: The design direction is good, but the current support check is too permissive for composite column-change events.
Runtime path I traced
CDC source parsing
-> AbstractSchemaChangeResolver batches all column changes from one DDL into one `AlterTableColumnsEvent`
Coordination path
-> Flink `SchemaOperator` validates the composite event
-> Engine `SinkFlowLifeCycle` validates the same composite event again
Policy check
-> `SchemaChangePolicy.isSupported()` returns true for `SCHEMA_CHANGE_UPDATE_COLUMNS` if the sink supports ANY one of ADD / DROP / UPDATE / RENAME / ALTER_COLUMN_COMMENT
Sink example
-> Elasticsearch only advertises `ADD_COLUMN`
-> its writer iterates sub-events and throws on non-ADD changes
Result
-> mixed DDL can pass the policy gate and only fail later inside the sink writer
Key findings
- This PR is on the normal schema-evolution path, not on a dormant edge path.
- The real risk is the composite
AlterTableColumnsEvent, not single add/drop/rename events. - The current policy check can green-light mixed DDL that the sink cannot actually apply.
Blocking items
- Composite
AlterTableColumnsEventis treated as supported when the sink supports only one sub-operation- Location:
seatunnel-api/src/main/java/org/apache/seatunnel/api/table/schema/SchemaChangePolicy.java:85-90; seatunnel-api/src/main/java/org/apache/seatunnel/api/table/schema/event/AlterTableColumnsEvent.java:31-52; seatunnel-connectors-v2/connector-cdc/connector-cdc-base/src/main/java/org/apache/seatunnel/connectors/cdc/base/schema/AbstractSchemaChangeResolver.java:110-145; seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/task/flow/SinkFlowLifeCycle.java:305-310; seatunnel-connectors-v2/connector-elasticsearch/src/main/java/org/apache/seatunnel/connectors/seatunnel/elasticsearch/sink/ElasticsearchSink.java:110-112; seatunnel-connectors-v2/connector-elasticsearch/src/main/java/org/apache/seatunnel/connectors/seatunnel/elasticsearch/sink/ElasticsearchSinkWriter.java:135-180 - Why this is a problem: The source batches multiple column changes from one DDL into a single
AlterTableColumnsEvent, but the policy currently treats that composite event as supported if the sink supports any one column-change capability. - Risk: Mixed DDL can pass the policy gate in
EVOLVEmode and then fail later inside the sink writer. - Suggested fix: Validate composite events against the full set of sub-events, and only pass when the sink supports every sub-operation in the batch.
- Location:
- No regression test covers mixed composite column-change events against a partially supported sink
- Location:
seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/task/SeaTunnelSourceCollectorSchemaChangePolicyTest.java:49-91; seatunnel-translation/seatunnel-translation-flink/seatunnel-translation-flink-common/src/test/java/org/apache/seatunnel/translation/flink/schema/SchemaOperatorTest.java:183-237 - Why this is a problem: Current tests cover strict/ignore and single-event behavior, but not the mixed composite-event case that exposes the policy gap.
- Risk: Without a regression test, this capability mismatch can easily reappear later.
- Suggested fix: Add at least one engine or Flink-level regression test where a partially supported sink receives a mixed
AlterTableColumnsEventand is rejected at the policy gate.
- Location:
Other review notes
- I like the overall architecture direction here. Once the composite-event capability check is tightened, this will be much easier to reason about end to end.
Compatibility and side effects
- Compatibility: The new config is backward compatible, but the capability check is currently too permissive for mixed DDL.
- Side effects: This is an architecture-level behavior risk rather than a simple performance issue: policy, coordination, and sink application can disagree on what “supported” means.
- Error handling / logging: Right now the failure is deferred to sink-application time instead of being rejected clearly at the policy gate.
- Tests: The new tests cover single-event behavior, but I did not find coverage for mixed composite column-change events against a partially supported sink.
- Docs: English and Chinese schema-evolution docs were updated and generally align with the code direction.
- CI: Some checks were still pending. The blocker above is source-level and reproducible from the current code path.
Merge verdict
- I like the overall direction, but the current
SCHEMA_CHANGE_UPDATE_COLUMNSsupport check is too permissive and can green-light mixed DDL that the sink cannot actually apply.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose of this pull request
Close: #11043
Since the current branch is based on #11025, please wait for that PR to be merged; I will perform a rebase after it is merged to maintain a clean commit history and avoid conflicts.
Please begin the review starting from commit f80539d.
This PR introduces a new schema-changes.behavior option with three explicit, well-defined policies:
Does this PR introduce any user-facing change?
Yes.
A new CDC source option schema-changes.behavior is introduced. It accepts STRICT, EVOLVE, or IGNORE.
Backward compatibility is preserved:
Example configuration:
How was this patch tested?
Unit tests added or updated: SchemaChangePolicyTest, SeaTunnelSourceCollectorSchemaChangePolicyTest, SinkFlowLifeCycleSchemaChangePolicyTest, SchemaOperatorTest, FlinkSinkWriterTest
E2E tests added:
MysqlCDCWithSchemaChangeIT.testMysqlCDCWithSchemaChangeIgnore, MysqlCDCWithSchemaChangeIT.testMysqlCDCWithSchemaChangeStrict.
MysqlCDCWithFlinkSchemaChangeIT.testMysqlCDCWithFlinkSchemaChangeIgnore, MysqlCDCWithFlinkSchemaChangeIT.testMysqlCDCWithFlinkSchemaChangeStrict
conf files added:
Check list
New License Guide
incompatible-changes.mdto describe the incompatibility caused by this PR.