[Improve][Transform-V2][Embedding]Enhance multimodal embeddings#9996
Conversation
2f8b47b to
bdbd012
Compare
|
@Hisoka-X @corgy-w @xiaochen-zhou help to review if have time, thanks |
| for (Map.Entry<String, Object> fieldConfig : fieldsConfig.entrySet()) { | ||
| VectorFieldSpec vectorFieldSpec = new VectorFieldSpec(fieldConfig); | ||
| log.info("Vector field spec: {}", vectorFieldSpec); | ||
| List<String> srcFieldNames = | ||
| vectorFieldSpec.getSrcFieldSpecs().stream() | ||
| .map(SrcFieldSpec::getFieldName) | ||
| .collect(Collectors.toList()); | ||
| List<Integer> srcFieldIndexes = new ArrayList<>(); | ||
| for (String srcFieldName : srcFieldNames) { | ||
| try { | ||
| srcFieldIndexes.add(inputRowType.indexOf(srcFieldName)); | ||
| } catch (IllegalArgumentException e) { | ||
| throw TransformCommonError.cannotFindInputFieldsError( | ||
| getPluginName(), srcFieldNames); | ||
| } | ||
| } | ||
| fieldSpecMap.put(srcFieldIndex, fieldSpec); | ||
| fieldNames.add(field.getKey()); | ||
| isMultimodalFields = vectorFieldSpec.isMultimodalField(); | ||
| fieldSpecMap.put(vectorFieldSpec, srcFieldIndexes); | ||
| fieldNames.add(vectorFieldSpec.getFieldName()); |
There was a problem hiding this comment.
isMultimodalFields = vectorFieldSpec.isMultimodalField();
There is a logical issue; currently, only the last value can be obtained.
There was a problem hiding this comment.
Thanks for your time, let me fix it
bdbd012 to
d497cba
Compare
0c8a06b to
5f5745e
Compare
|
Thank you for this excellent proposal! The refactoring to use During the review, I found a few data-corruption bugs and missing requirements that must be addressed before this can be merged. 1. Output Field Order Mismatch (
|
5f5745e to
a54db18
Compare
|
@davidzollo Thanks for your patient guidance and valuable suggestions ❤️. Let me fix these problems. |
b67b4d4 to
568fab0
Compare
There was a problem hiding this comment.
+1 if CI passes.
Good job,thanks for the efforts.
By the way, Testing with ThreadLocalRandom causes non-determinism; it is recommended to change to parameterized testing.
You can feel free to add my LinkedIn https://www.linkedin.com/in/davidzollo or WeChat(davidzollo) to build a connection ^_^
568fab0 to
59cecda
Compare
59cecda to
d7bcb01
Compare
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the update. I re-reviewed the latest head against the full current PR diff and re-traced the actual multimodal runtime path from EmbeddingTransform into SrcField, MultimodalFieldValue, and DoubaoModel.
What this PR solves
- User pain: the Embedding transform could not cleanly express or assemble mixed text / image / video inputs into one embedding request, and previous revisions could also let runtime suffix detection override an explicitly configured modality.
- Fix approach: the PR splits the config/runtime responsibilities into
VectorFieldSpec,SrcFieldSpec, and per-rowSrcField, then builds the Doubao multimodal request body from those row-scoped source fields. - One-line summary: the latest head turns multimodal field assembly into a much healthier per-row flow, and I do not see a new source-level blocker on the current diff.
Runtime path rechecked
SeaTunnelRow enters EmbeddingTransform
-> initOutputFields() parses vectorization_fields into VectorFieldSpec + source indexes
-> getOutputFieldValues()
-> build per-row SrcField objects
-> SrcField.determineModalityType()
-> keep explicit modality as source of truth
-> only auto-detect from suffix when modality is unset and format=url
-> wrap multimodal outputs as MultimodalFieldValue
Doubao request assembly
-> DoubaoModel.multimodalVector()
-> multimodalVectorGeneration()
-> multimodalBody()
-> inputRawData()
-> text -> {"type":"text","text":...}
-> image/video url -> {"type":"image_url"/"video_url", ...}
-> binary -> base64 data URL
Key findings
- The earlier explicit-modality blocker is fixed on the current head.
SrcFieldnow clones the config spec per row and returns early indetermineModalityType()when the modality was explicitly configured or the payload is not URL-based. - The normal path does hit this PR:
EmbeddingTransform.getOutputFieldValues()now always builds row-scopedSrcFieldobjects before the request is forwarded toDoubaoModel. - The new regression coverage for explicit-modality preservation and text values ending with image-like suffixes is meaningful and matches the source fix.
- The current PR-level checks are green from GitHub's side (
Build,Notify test workflow,labeler).
Findings
No new blocking source-level issue from me on the latest head.
There is one non-blocking test-quality point:
ThreadLocalRandomis still used in the new test code- Location:
seatunnel-transforms-v2/src/test/java/org/apache/seatunnel/transform/embedding/DoubaoMultimodalModelTest.java:247seatunnel-transforms-v2/src/test/java/org/apache/seatunnel/transform/embedding/MultimodalConfigTest.java:209
- +1 to @davidzollo's earlier point about nondeterministic test construction. The current assertions are likely stable because both branches converge to the same expected result, so I would not block merge on this.
- The downside is that each run only exercises one config shape at random, which makes branch coverage and later failure reproduction harder than necessary.
- Better fix:
- Option A: turn those cases into parameterized tests and cover both config shapes deterministically.
- Option B: split them into two explicit test methods.
- Severity: Medium, non-blocking.
- Location:
Compatibility
- Fully compatible from what I can see.
- Existing single-field text configs still work.
- The PR extends
vectorization_fieldsto richer list/object forms without breaking the old string form. - I do not see an API, protocol, serialization, or historical-behavior break here.
Performance / side effects
- The PR adds small per-row object allocation (
SrcField) but removes the previous shared-state mutation risk. - The network cost increase for multimodal payloads is expected and comes from the feature itself, not from an accidental retry/logging/resource regression.
- I do not see a new concurrency hazard on the latest head; the row-scoped clone is actually safer than the previous shared-spec behavior.
Test coverage and stability
- Coverage: meaningful. The new UTs now cover explicit modality preservation, URL auto-detection, mixed-field text handling, and binary payload shaping.
- Stability rating: some risk, non-blocking.
- Reason: the new tests at the two locations above still choose one config shape randomly with
ThreadLocalRandom, which makes coverage nondeterministic even though the current assertions are not obviously flaky.
Merge conclusion
Conclusion: can merge
- Blocking items
- No new blocking issue from me on the latest head.
- Suggested follow-up
- Replace the
ThreadLocalRandom-driven test branches with deterministic parameterized or split test cases.
Overall, this revision is much cleaner than the earlier heads and the core multimodal runtime path now makes sense end to end. From the source-review side, I am okay with merging this version.
|
@davidzollo I fixed the issue of automatic detection mentioned above
|
|
Thanks for the update here. I checked the current PR head before replying, and it is still the same revision I reviewed previously ( From the description you posted, the direction sounds reasonable. Please push the actual code change to the PR branch, and once a new commit is there I will re-review the latest head end to end. |
bb8677d to
48816d7
Compare
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the update. I re-reviewed the latest head from scratch, focusing on the net changes since Daniel’s previously approved revision.
What this latest head changes
- User pain: provider invocation settings and failure handling were still split across the embedding providers, especially in the Doubao path.
- Fix approach:
EmbeddingTransformnow passesModelInvocationOptionsdown into the provider models, andDoubaoModelroutes both text and multimodal calls through the shared invocation runtime with configurable request timeouts and structured error mapping. - One-line summary: the latest delta is mainly a runtime-hardening follow-up on top of the earlier multimodal work, not a new semantic rewrite of the embedding path.
Runtime chain rechecked
SeaTunnelRow
-> EmbeddingTransform.open() [EmbeddingTransform.java:90-170]
-> ModelInvocationOptions.fromConfig(config)
-> new DoubaoModel(..., invocationOptions)
vector request
-> DoubaoModel.textVector() / multimodalVector()
-> invocationRuntime.invoke(...)
-> requestTimeoutMs comes from ModelInvocationContext
-> HTTP failures -> ModelInvocationException.fromHttpStatus(...)
-> parse failures -> ModelInvocationException.nonRetryable(...)
Key findings
- The normal embedding path definitely hits this change because all provider instances are created from
EmbeddingTransform.open(). DoubaoModel.java:139-328now uses the shared invocation runtime instead of keeping fixed 20-second timeouts and ad-hoc exception handling inside the provider.- I do not see a reopened source-level blocker in the transform path on this latest head.
- The current Build is red, but the failing jobs I checked are connector / JDBC integration lanes rather than the transform runtime path.
transform-v2-it-part-1is green.
Findings
Issue 1: the PR-level Build is red on the latest head
- Location: GitHub CI
Build - Why it matters: even though the transform path itself looks fine to me, the PR still needs a clean Build result before merge.
- CI evidence I checked:
- failing lanes include
all-connectors-it-3,all-connectors-it-7,jdbc-connectors-it-part-5,jdbc-connectors-it-part-6, andjdbc-connectors-it-ddl transform-v2-it-part-1is passing
- failing lanes include
- Severity: Medium
Compatibility / side effects
- I do not see an API, protocol, serialization, or backward-compatibility break here.
- The latest changes make the provider invocation path more uniform and improve resource handling with
try (CloseableHttpResponse ...).
Test coverage / stability
- The latest net delta does not add new test code, so there is no new test-stability blocker to call out in this round.
Merge conclusion
Conclusion: can merge after fixes
- Blocking items
- Issue 1: please get the current
Buildgreen on this latest head.
- Suggested follow-up
- No new source-side change is required from this review.
Overall, I do not see a new code-side blocker in the embedding runtime path. What still blocks merge for me is the red PR-level Build, not the transform logic itself.
48816d7 to
afd0fd7
Compare
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the update. I re-reviewed the latest head from scratch against the full current diff and re-traced the real multimodal runtime path from EmbeddingTransform into SrcField, MultimodalFieldValue, and DoubaoModel.
What this PR solves
- User pain: the transform needed a cleaner way to express mixed text / image / video inputs, and the provider invocation path still had provider-local timeout / error handling that was harder to reason about.
- Fix approach: this revision splits config-time vector specs from row-time source fields, keeps modality detection row-local, and routes the Doubao text / multimodal calls through the shared invocation runtime.
- One-line summary: the current head makes the multimodal embedding path much healthier, and I do not see a new source-level blocker in the latest code.
Runtime path I checked
config parsing
-> EmbeddingTransform.initOutputFields(...) [EmbeddingTransform.java:206-237]
-> VectorFieldSpec(fieldConfig)
-> one or more SrcFieldSpec entries per output vector field
row-time execution
-> EmbeddingTransform.getOutputFieldValues(...) [EmbeddingTransform.java:241-268]
-> new SrcField(spec, rowValue) [SrcField.java:34-44]
-> determineModalityType() [SrcField.java:46-63]
-> keep explicit modality as source of truth
-> only auto-detect from suffix when modality is unset and format=url
-> wrap row-scoped multimodal inputs as MultimodalFieldValue when needed
provider call
-> model.vectorization(...)
-> DoubaoModel.textVector(...) / multimodalVector(...) [DoubaoModel.java:137-183]
-> invocationRuntime.invoke(...)
-> multimodalVectorGeneration(..., requestTimeoutMs) [DoubaoModel.java:286-323]
Key findings
- The normal transform path definitely hits this change, because both
EmbeddingTransform.open()andgetOutputFieldValues(...)are part of the main row-processing path. - +1 to @davidzollo's earlier point from the older rounds about explicit modality ownership. I traced the same path on the current head, and that blocker is fixed now:
SrcField.determineModalityType()returns early when the modality was explicitly configured or the payload is not URL-based, so the runtime suffix no longer overwrites explicit user config. - The row-local
SrcFieldconstruction also keeps runtime modality inference from leaking back into shared config state. - I did not find a new source-level blocker in the current head.
Other reviewer / maintainer input
- I noticed @davidzollo had previously raised the explicit-modality override concern. After rechecking
SrcField.java:46-63and the current tests, I agree that this point was important, and I also agree it is fixed in the latest head.
Testing / stability
- The new source-level regression coverage is meaningful: config parsing, explicit-modality preservation, mixed text/image/video request assembly, and binary payload shaping are all covered.
- There is still one non-blocking test-quality follow-up:
ThreadLocalRandomis still used inDoubaoMultimodalModelTest.java:247andMultimodalConfigTest.java:209, which makes branch coverage nondeterministic even though the current assertions are not obviously flaky. - Stability rating for the new tests: some risk, non-blocking.
CI / merge gate
- The PR-level
Buildcheck currently still showscompleted / failureon headafd0fd7b25fb4fa0fd2e1baa2ecd6e1f9545210e. - I followed the linked workflow metadata from that check run, and it points to
loupipalien/seatunnelworkflow run28330521110, which was stillin_progresswhen I checked. - The already-finished transform-relevant lanes I saw there were green, including
transform-v2-it-part-1,unit-test (11, ubuntu-latest),Dead links, andCode style. - So from the evidence currently visible to me, I do not have a transform-path code failure to point to, but I also would not treat the PR as workflow-clean until that Build state settles.
Merge conclusion: can merge after fixes
- Blocking items
- Please get the current
Buildsignal into a stable green state before merging. If the underlying workflow finishes green but the PR check still stays red, I would treat that as a stale / sync issue and rerun or refresh the aggregate check.
- Suggested follow-up
- Replace the
ThreadLocalRandom-driven test branches with deterministic parameterized or split tests.
Overall, the latest revision is in much better shape from the actual embedding runtime path. I do not see a new source-side blocker on this head; what still stops me from calling it merge-ready is the unsettled PR-level Build status rather than the transform code itself.
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the same head afd0fd7b25fb4fa0fd2e1baa2ecd6e1f9545210e, and the only blocker from my previous pass was the unsettled Build state rather than a transform-path code issue.
The checks on this unchanged head are green now, and my source-level conclusion is unchanged: I still do not see a blocker in the multimodal embedding runtime path I traced (EmbeddingTransform -> SrcField / MultimodalFieldValue -> DoubaoModel).
Approving from my side.
Purpose of this pull request
Multi-field multimodal vectorization, doubao-embedding-vision supports multi-field multimodal mixing as input
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Add new test cases
Check list
New License Guide