Skip to content

[Improve][Transform-V2] Migrate transforms validation to declarative OptionRule#11095

Open
nzw921rx wants to merge 21 commits into
apache:devfrom
nzw921rx:optionrule/transform-v2
Open

[Improve][Transform-V2] Migrate transforms validation to declarative OptionRule#11095
nzw921rx wants to merge 21 commits into
apache:devfrom
nzw921rx:optionrule/transform-v2

Conversation

@nzw921rx

@nzw921rx nzw921rx commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Purpose of this pull request

Related #11007

  1. Migrate 16 Transform plugins imperative validation to declarative OptionRule + Conditions
  2. Add factory-level unit tests covering positive and negative paths for each transform

Transforms migrated:

  • CopyField
  • DataValidator
  • DefineSinkType
  • DynamicCompile
  • FieldEncrypt
  • FieldMapper
  • FilterField
  • JsonPath
  • Metadata
  • RegexExtract
  • Replace
  • RowKindExtractor
  • Split
  • SQL
  • TableFilter
  • TableMerge

Does this PR introduce any user-facing change?

1. Validation errors behavior by transform:

All transforms now use declarative OptionRule for submission-time validation. Error reporting behavior depends on the validation type:

Error behavior Transforms
Aggregated (all errors collected and returned together) Copy, TableFilter, FieldRename, TableRename, Filter, FilterRowKind, FieldMapper, FieldEncrypt, Replace, Split, Metadata, RowKindExtractor, Sql, DynamicCompile, TableMerge, LLM, Embedding
Fail-fast (stops on first error with per-entry detail, e.g. columns[2]: 'type' must not be null) DefineSinkType, DataValidator, JsonPath, RegexExtract

2. Tightened validation (configs that previously passed may now be rejected at submission time):

  • DefineSinkType: rejects duplicate column names
  • FieldEncrypt: requires max_field_length > 0
  • FieldMapper, Metadata, JsonPath, Sql, DynamicCompile: core options now enforced as required

3. Transform option rules are now fully declared and can be exported via REST API and CLI metadata.

How was this patch tested?

./mvnw test -pl seatunnel-transforms-v2

nzw921rx added 18 commits June 13, 2026 18:39
…nsformFactoryTest

Restore the accidentally deleted testFactoryIdentifierAndOptionRule() and
testCreateTransformReturnsMultiCatalogTransform() tests. The new OptionRule
validation tests are appended alongside them.
…tion

Update FilterFieldTransformTest and ReplaceTransformTest to validate
through Factory optionRule instead of expecting constructor-level
validation that was migrated to declarative OptionRule.

@DanielLeens DanielLeens left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I reviewed the current diff against dev and checked the declarative validation path across the updated transform factories and the added factory-level tests.

What this PR solves

  • User pain: many transform config errors are discovered too late or with weak messages.
  • Fix approach: move more validation into declarative OptionRule checks and add focused factory tests around those contracts.
  • One-line summary: I did not find a new blocking regression in the current validation path.

Runtime path rechecked

job config
-> transform factory optionRule()
-> central config validator
-> transform instance creation only after contract checks pass

I spot-checked the updated factories that carry the highest contract risk here (for example DefineSinkType, Copy, JsonPath, RegexExtract, and TableFilter) and the current direction looks coherent: the new rules fail earlier and the added tests cover the new validator behavior directly.

CI

The PR-level Build check is red right now, so I would still wait for the workflow result before merge.

Conclusion: can merge after CI is green

  1. Blocking items
  • No new source-level blocker from me on the current head.
  • Please keep an eye on the current Build failure before merge.
  1. Suggested follow-up
  • If the CI failure turns out to come from one of the newly tightened validation contracts, I would mirror that exact case into a focused factory test rather than only relying on broader integration coverage.

Thanks for pushing the validation cleanup forward.

@nzw921rx nzw921rx self-assigned this Jun 16, 2026
@zhangshenghang

zhangshenghang commented Jun 16, 2026

Copy link
Copy Markdown
Member

Thanks for the cleanup — moving the scattered imperative checks to declarative OptionRule + Conditions is a nice direction, and the per-transform factory tests are a solid addition. I reviewed the diff against dev and found one real regression plus a few things worth addressing before merge.

DynamicCompile lost its conditional-required validation

In DynamicCompileTransformFactory:

// dev: source_code is REQUIRED when compile_pattern = SOURCE_CODE
.conditional(COMPILE_PATTERN, SOURCE_CODE, DynamicCompileTransformConfig.SOURCE_CODE)

// this PR: only a not-blank value constraint, no longer required
.conditional(COMPILE_PATTERN, SOURCE_CODE, Conditions.notBlank(SOURCE_CODE))

These two overloads are not equivalent. The old conditional(opt, value, Option) registers source_code as a conditionally-required option. The new conditional(opt, value, Condition) only adds a value constraint, and ConfigValidator.isConstraintApplicable() skips a constraint whose option key is absent. So:

config dev this PR
source_code = " " (blank) error error ✅
source_code key entirely missing error silently passes

Since compile_pattern defaults to SOURCE_CODE, this is a common misconfig path. The job now passes submission-time validation and fails much later with an NPE / compile error — exactly what this refactor is supposed to prevent. absolute_path has the same issue.

I verified this locally with a probe test (compile_pattern=SOURCE_CODE, source_code omitted) — assertDoesNotThrow passes on this branch. The existing testSourceCodeBlankFails only covers the blank case, not the missing case, which is why CI is green.

Suggested fix — keep it required and add the not-blank check:

.conditional(COMPILE_PATTERN, CompilePattern.SOURCE_CODE, SOURCE_CODE)
.conditional(COMPILE_PATTERN, CompilePattern.SOURCE_CODE, Conditions.notBlank(SOURCE_CODE))
.conditional(COMPILE_PATTERN, CompilePattern.ABSOLUTE_PATH, ABSOLUTE_PATH)
.conditional(COMPILE_PATTERN, CompilePattern.ABSOLUTE_PATH, Conditions.notBlank(ABSOLUTE_PATH))

and please add negative tests for the missing-key case (and for "neither source_code nor absolute_path set").

The "errors are aggregated" claim doesn't hold for extension validators

The PR description says validation errors are now aggregated instead of failing on the first error. But the ConditionExtension implementations used here (DefineSinkType, JsonPath, DataValidator, RegexExtract, TableFilter, CopyField) throw new OptionValidationException(...) on failure, which propagates straight out of collectErrors() and aborts aggregation. So plain Conditions aggregate, but the moment an extension throws it degrades to fail-fast — inconsistent within the same transform. Either return false from the extensions (compose the message from description()) to stay aggregatable, or adjust the PR description to say structural extension checks are fail-fast.

New rules / tightening beyond a pure migration — please document for backward-compat

A few changes go beyond equivalent migration and can reject configs that worked before:

  • DefineSinkType: new duplicate-column-name check (dev only checked column/type non-null).
  • FieldEncrypt: new max_field_length > 0 (dev didn't validate it at all).
  • optional → required tightening: FieldMapper, Metadata, JsonPath, SQL core options, and DynamicCompile's compile_language (no default) are now mandatory.

These are mostly reasonable "fail earlier", but per the project's backward-compatibility policy they should be called out in the PR description and recorded in docs/en/introduction/concepts/incompatible-changes.md with migration notes.

Minor

  • TableFilterTransformFactory uses a fully-qualified java.util.regex.Pattern.compile(...) instead of a top-of-file import — inconsistent with the project style. The near-identical RegexValidator / column-structure validator inner classes across factories could also be pulled into a shared util.
  • FYI (not a problem): removing the "at least one pattern" check in TableFilterConfig.of() is fine — PATTERN_MODE has a default (INCLUDE), so patternMode != null was always true and that checkArgument was effectively dead code. No behavior change.
  • RowKindExtractorTransformFactory adding .optional(TRANSFORM_TYPE) is a genuine fix — that key was usable by the transform but undeclared, so it would have been rejected by unknown-key validation. 👍

Overall: good direction and good test effort — just please fix the DynamicCompile conditional-required regression (with missing-key tests) and document the tightened/added rules before merge.

@DanielLeens

Copy link
Copy Markdown
Contributor

Thanks for adding this detailed follow-up. I checked the DynamicCompileTransformFactory point you called out, and I agree the missing-key path is still a real validation regression on the current head, not just a test gap.

+1 to @zhangshenghang's point here: the contributor should address that conditional-required gap before this PR moves forward. Since there is no new commit yet, I am not starting another full review round on the unchanged head right now, but I am happy to re-review the latest code once an update is pushed.

@nzw921rx nzw921rx requested a review from zhangshenghang June 18, 2026 04:41

@DanielLeens DanielLeens left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I re-reviewed the latest head from scratch.

+1 to @zhangshenghang's earlier DynamicCompile point: the conditional-required regression is fixed on the current head by DynamicCompileTransformFactory.java:42-57, and the new negative tests at DynamicCompileTransformFactoryTest.java:83-104 now cover the missing-key paths that were previously slipping through.

I still see one blocker before merge, plus one follow-up that is worth tightening up.

What this PR solves

  • Pain point: several transform plugins were still discovering config errors too late, with validation split across constructors, runtime logic, and ad hoc checks.
  • Approach: move more of that validation into declarative OptionRule rules and add focused factory-level tests around the migrated contracts.
  • One-line summary: the direction is good, but the current head still leaves a compatibility-documentation gap, and the new "ConditionExtension aggregation" cleanup is only partial.

Full runtime path I rechecked

job config
  -> transform factory optionRule()
  -> ConfigValidator.validate(optionRule)
      -> required / optional / conditional checks
      -> Conditions.notBlank / notEmpty / greaterThan
      -> Conditions.extension(..., ConditionExtension)

if submission-time validation passes
  -> createTransform(context)
      -> transform instance is created and enters the runtime path

current head specifics
  -> DynamicCompileTransformFactory [DynamicCompileTransformFactory.java:38-60]
      -> SOURCE_CODE / ABSOLUTE_PATH are again conditionally required
  -> CopyFieldTransformFactory / TableFilterTransformFactory
      -> now return false from the extension validator, which allows aggregation to continue
  -> DefineSinkType / JsonPath / RegexExtract / DataValidator
      -> still throw OptionValidationException directly from ConditionExtension

Findings

Issue 1: the newly tightened validation contracts are still undocumented as incompatible changes

  • Location:
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/adaptsink/DefineSinkTypeTransformFactory.java:46-55,79-96
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/FieldEncryptTransformFactory.java:40-55
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/dynamiccompile/DynamicCompileTransformFactory.java:38-60
    • docs/en/introduction/concepts/incompatible-changes.md:6-104
    • docs/zh/introduction/concepts/incompatible-changes.md
  • Why this is a problem:
    • This PR is doing more than just moving existing failures earlier. It also tightens several submission-time contracts that can now reject configs that previously passed.
    • Concrete examples on the current head include duplicate-column rejection in DefineSinkType, max_field_length > 0 in FieldEncrypt, and the restored conditional-required checks in DynamicCompile.
    • Per the project's compatibility policy, those user-visible rejections need to be recorded in the incompatible-changes docs with migration guidance. That documentation is still missing.
  • Risk:
    • Users upgrading into this behavior will hit new submission-time failures without release-note coverage or a clear migration path.
    • It also makes it harder for maintainers to distinguish an intentional contract tightening from an accidental regression later on.
  • Best fix:
    • Add entries to both docs/en/introduction/concepts/incompatible-changes.md and docs/zh/introduction/concepts/incompatible-changes.md.
    • Please spell out which transforms changed, which old configs are now rejected, and how users should update those configs.
  • Severity: High
  • Raised by others already: Yes. This is directly related to @zhangshenghang's earlier compatibility note; I am adding the code-level evidence for the current head.

Issue 2: the ConditionExtension aggregation cleanup is still only partial

  • Location:
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/adaptsink/DefineSinkTypeTransformFactory.java:67-99
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/jsonpath/JsonPathTransformFactory.java:65-102
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/regexextract/RegexExtractTransformFactory.java:67-87
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/validator/DataValidatorTransformFactory.java:67-99
    • contract reference: docs/en/introduction/concepts/incompatible-changes.md:40-60
  • Why this matters:
    • The latest follow-up commit clearly tries to improve aggregation by converting CopyField and TableFilter to description() + return false.
    • But the factories above still throw OptionValidationException directly from ConditionExtension, so validation still aborts on the first extension error there.
    • In other words, the current head still has mixed behavior: plain constraints aggregate, but several extension validators remain fail-fast.
  • Risk:
    • The migrated transforms still do not present a consistent validation UX.
    • The commit message / PR scope now reads more complete than the actual behavior on the current head.
  • Best fix:
    • Option A: convert the remaining extension validators to boolean/description-style failures as well, so the aggregation behavior is actually consistent.
    • Option B: if that is outside the intended scope for this PR, narrow the stated scope and description so it does not imply the aggregation problem is solved across the board.
  • Severity: Medium
  • Raised by others already: Yes. This is related to @zhangshenghang's earlier aggregation point; I am confirming that it is still not fully closed on the latest head.

Tests and CI

  • The new DynamicCompileTransformFactoryTest.java:83-104 coverage is valuable and directly closes the missing-key regression proof I was looking for on that path.
  • The visible PR-level Build check is still shown as failed right now, but the check output points to the contributor fork run nzw921rx/seatunnel#27736094115, and that rerun is currently in progress. So I cannot call the current head green yet, but I also cannot attribute a fresh current-head failure to the transform code at this moment.

Merge conclusion

Conclusion: can merge after fixes

  1. Blockers
  • Issue 1 above: please document the newly tightened transform validation behavior as incompatible changes, with migration notes, before merge.
  1. Non-blocking suggestions
  • Issue 2 above: either finish the remaining aggregation cleanup in this PR, or narrow the claimed scope so it matches the actual implementation.

Overall, the DynamicCompile regression that @zhangshenghang flagged is fixed now, so this head is materially better than the one I reviewed before. The remaining blocker is no longer in the code path itself; it is the release / compatibility contract that still needs to be documented explicitly.

… document validation tightening as incompatible changes
@nzw921rx

Copy link
Copy Markdown
Collaborator Author

Thanks for the update. I re-reviewed the latest head from scratch.

+1 to @zhangshenghang's earlier DynamicCompile point: the conditional-required regression is fixed on the current head by DynamicCompileTransformFactory.java:42-57, and the new negative tests at DynamicCompileTransformFactoryTest.java:83-104 now cover the missing-key paths that were previously slipping through.

I still see one blocker before merge, plus one follow-up that is worth tightening up.

What this PR solves

  • Pain point: several transform plugins were still discovering config errors too late, with validation split across constructors, runtime logic, and ad hoc checks.
  • Approach: move more of that validation into declarative OptionRule rules and add focused factory-level tests around the migrated contracts.
  • One-line summary: the direction is good, but the current head still leaves a compatibility-documentation gap, and the new "ConditionExtension aggregation" cleanup is only partial.

Full runtime path I rechecked

job config
  -> transform factory optionRule()
  -> ConfigValidator.validate(optionRule)
      -> required / optional / conditional checks
      -> Conditions.notBlank / notEmpty / greaterThan
      -> Conditions.extension(..., ConditionExtension)

if submission-time validation passes
  -> createTransform(context)
      -> transform instance is created and enters the runtime path

current head specifics
  -> DynamicCompileTransformFactory [DynamicCompileTransformFactory.java:38-60]
      -> SOURCE_CODE / ABSOLUTE_PATH are again conditionally required
  -> CopyFieldTransformFactory / TableFilterTransformFactory
      -> now return false from the extension validator, which allows aggregation to continue
  -> DefineSinkType / JsonPath / RegexExtract / DataValidator
      -> still throw OptionValidationException directly from ConditionExtension

Findings

Issue 1: the newly tightened validation contracts are still undocumented as incompatible changes

  • Location:

    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/adaptsink/DefineSinkTypeTransformFactory.java:46-55,79-96
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/FieldEncryptTransformFactory.java:40-55
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/dynamiccompile/DynamicCompileTransformFactory.java:38-60
    • docs/en/introduction/concepts/incompatible-changes.md:6-104
    • docs/zh/introduction/concepts/incompatible-changes.md
  • Why this is a problem:

    • This PR is doing more than just moving existing failures earlier. It also tightens several submission-time contracts that can now reject configs that previously passed.
    • Concrete examples on the current head include duplicate-column rejection in DefineSinkType, max_field_length > 0 in FieldEncrypt, and the restored conditional-required checks in DynamicCompile.
    • Per the project's compatibility policy, those user-visible rejections need to be recorded in the incompatible-changes docs with migration guidance. That documentation is still missing.
  • Risk:

    • Users upgrading into this behavior will hit new submission-time failures without release-note coverage or a clear migration path.
    • It also makes it harder for maintainers to distinguish an intentional contract tightening from an accidental regression later on.
  • Best fix:

    • Add entries to both docs/en/introduction/concepts/incompatible-changes.md and docs/zh/introduction/concepts/incompatible-changes.md.
    • Please spell out which transforms changed, which old configs are now rejected, and how users should update those configs.
  • Severity: High

  • Raised by others already: Yes. This is directly related to @zhangshenghang's earlier compatibility note; I am adding the code-level evidence for the current head.

Issue 2: the ConditionExtension aggregation cleanup is still only partial

  • Location:

    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/adaptsink/DefineSinkTypeTransformFactory.java:67-99
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/jsonpath/JsonPathTransformFactory.java:65-102
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/regexextract/RegexExtractTransformFactory.java:67-87
    • seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/validator/DataValidatorTransformFactory.java:67-99
    • contract reference: docs/en/introduction/concepts/incompatible-changes.md:40-60
  • Why this matters:

    • The latest follow-up commit clearly tries to improve aggregation by converting CopyField and TableFilter to description() + return false.
    • But the factories above still throw OptionValidationException directly from ConditionExtension, so validation still aborts on the first extension error there.
    • In other words, the current head still has mixed behavior: plain constraints aggregate, but several extension validators remain fail-fast.
  • Risk:

    • The migrated transforms still do not present a consistent validation UX.
    • The commit message / PR scope now reads more complete than the actual behavior on the current head.
  • Best fix:

    • Option A: convert the remaining extension validators to boolean/description-style failures as well, so the aggregation behavior is actually consistent.
    • Option B: if that is outside the intended scope for this PR, narrow the stated scope and description so it does not imply the aggregation problem is solved across the board.
  • Severity: Medium

  • Raised by others already: Yes. This is related to @zhangshenghang's earlier aggregation point; I am confirming that it is still not fully closed on the latest head.

Tests and CI

  • The new DynamicCompileTransformFactoryTest.java:83-104 coverage is valuable and directly closes the missing-key regression proof I was looking for on that path.
  • The visible PR-level Build check is still shown as failed right now, but the check output points to the contributor fork run nzw921rx/seatunnel#27736094115, and that rerun is currently in progress. So I cannot call the current head green yet, but I also cannot attribute a fresh current-head failure to the transform code at this moment.

Merge conclusion

Conclusion: can merge after fixes

  1. Blockers
  • Issue 1 above: please document the newly tightened transform validation behavior as incompatible changes, with migration notes, before merge.
  1. Non-blocking suggestions
  • Issue 2 above: either finish the remaining aggregation cleanup in this PR, or narrow the claimed scope so it matches the actual implementation.

Overall, the DynamicCompile regression that @zhangshenghang flagged is fixed now, so this head is materially better than the one I reviewed before. The remaining blocker is no longer in the code path itself; it is the release / compatibility contract that still needs to be documented explicitly.

@DanielLeens Thanks for the review, The OptionValidationException thrown by extension operators is now handled uniformly at the framework layer in PR #11121.

@DanielLeens DanielLeens left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up. I re-checked the latest head with the docs update and the remaining transform-factory validation path, and I do not see a blocker from the current source state.

What this PR fixes

  • User pain: Transform validation was inconsistent and spread across different imperative paths, and the behavior change also needed explicit incompatible-change documentation.
  • Fix approach: The PR migrates many transform factories to declarative validation and now also adds the missing incompatible-change docs in both languages.
  • One-line summary: The missing documentation from the earlier round is now in place, and the current head looks mergeable from the source side.

Runtime path I traced

Job submission
  -> transform-factory `optionRule()` paths
  -> `RegexExtractTransformFactory.DefaultValuesLengthValidator` returns `false` on mismatch
  -> `ConfigValidator` aggregates the failure during submission

Docs path
  -> incompatible-change docs were added in both English and Chinese

Key findings

  • The missing docs from the earlier round are now in place.
  • The current validator style matches the shared aggregated-validation model.
  • I do not see a new blocker on the current head.

Blocking items
None at the source-code level on the current head.

Other review notes

  • No additional non-blocking source-level comments from this round.

Compatibility and side effects

  • Compatibility: The behavior change is now documented, and I do not see a source-level blocker on the current head.
  • Side effects: No meaningful resource-model concern from the latest head.
  • Error handling / logging: The validation path still uses the shared aggregated-error flow.
  • Tests: The PR has broad validation coverage already, and I did not see a new flaky pattern in the current head.
  • Docs: The incompatible-change docs are now present in both English and Chinese.
  • CI: Checks were still queued when I reviewed.

Merge verdict

  • I do not see a blocker on the current head. The missing incompatible-change documentation from the earlier round is now in place, and the current validation changes look reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants