orca: fall back for replicated CTE consumed in correlated scalar subqueries #389
Open
Alena0704 wants to merge 1 commit into
Root cause is a blind spot in the #375 slice walker: `CollectCTESlices` delimits slices only at Motion nodes. When a CTE over a `DISTRIBUTED REPLICATED` table is referenced from correlated scalar subqueries, ORCA plans those subqueries as `CPhysicalCorrelated*NLJoin` operators, whose inner side becomes an executor SubPlan running in its own slice. There is no Motion at that boundary, however, so the walker placed the Consumer on the same slice as the Producer. The cross-slice check (`prod->sliceId != cons->sliceId`) never fired, ORCA did not fall back, and the ShareInputScan writer hung forever in `shareinput_writer_waitdone()`, waiting for DONE acks from reader slices that never run.

Teach the walker that the inner (subquery) side of a correlated NL join is a slice boundary too, mirroring the Motion rule. The replicated Consumer in the SubPlan then gets a distinct slice id, the existing check fires, and ORCA falls back to the Postgres optimizer.
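To make the blind spot concrete, here is a toy C++ model of a slice walker. Everything in it (`Kind`, `Node`, `CollectSlices`, `NeedsFallback`, `BuildReproPlan`) is hypothetical and far simpler than ORCA's `CollectCTESlices`; it also assumes that child 1 of a join is its inner side. The `nljInnerIsBoundary` flag toggles the new rule so the behavior before and after the fix can be compared on the same plan shape.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ORCA physical operators (not ORCA's real classes).
enum class Kind { Sequence, Scan, Motion, CorrelatedNLJoin, CTEProducer, CTEConsumer };

struct Node {
	Kind kind = Kind::Scan;
	int cteId = -1;                           // CTE id for producers/consumers
	std::vector<std::unique_ptr<Node>> kids;  // join: kids[0] = outer, kids[1] = inner (assumed)
};

std::unique_ptr<Node> Make(Kind k, int cteId = -1) {
	std::unique_ptr<Node> n(new Node());
	n->kind = k;
	n->cteId = cteId;
	return n;
}

struct CTESlices {
	std::map<int, int> producer;                // cteId -> producer slice id
	std::map<int, std::vector<int>> consumers;  // cteId -> consumer slice ids
};

// Assigns slice ids top-down. A Motion always opens a new slice for its child.
// When nljInnerIsBoundary is true, the inner child of a correlated NL join
// (the SubPlan side) also opens a new slice: the rule this PR adds.
void CollectSlices(const Node &n, int slice, int &nextSlice,
                   bool nljInnerIsBoundary, CTESlices &out) {
	assert(n.kind != Kind::CorrelatedNLJoin || n.kids.size() == 2);
	if (n.kind == Kind::CTEProducer) out.producer[n.cteId] = slice;
	if (n.kind == Kind::CTEConsumer) out.consumers[n.cteId].push_back(slice);
	for (std::size_t i = 0; i < n.kids.size(); ++i) {
		bool boundary = n.kind == Kind::Motion ||
		                (nljInnerIsBoundary && n.kind == Kind::CorrelatedNLJoin && i == 1);
		int childSlice = boundary ? nextSlice++ : slice;
		CollectSlices(*n.kids[i], childSlice, nextSlice, nljInnerIsBoundary, out);
	}
}

// Mirrors the cross-slice check: fall back if any Consumer of a CTE sits on a
// different slice than its Producer.
bool NeedsFallback(const CTESlices &s) {
	for (const auto &entry : s.consumers) {
		auto prod = s.producer.find(entry.first);
		if (prod == s.producer.end()) continue;
		for (int consSlice : entry.second)
			if (consSlice != prod->second) return true;
	}
	return false;
}

// Producer plus a correlated NL join whose inner (SubPlan) side reads the CTE,
// with no Motion in between, as in the replicated-table case.
std::unique_ptr<Node> BuildReproPlan() {
	auto root = Make(Kind::Sequence);
	auto prod = Make(Kind::CTEProducer, 0);
	prod->kids.push_back(Make(Kind::Scan));
	auto nlj = Make(Kind::CorrelatedNLJoin);
	nlj->kids.push_back(Make(Kind::Scan));            // outer side
	nlj->kids.push_back(Make(Kind::CTEConsumer, 0));  // inner side -> SubPlan
	root->kids.push_back(std::move(prod));
	root->kids.push_back(std::move(nlj));
	return root;
}
```

On this plan, running `CollectSlices` with the flag off puts both the Producer and the Consumer on slice 0, so `NeedsFallback` returns false (the hang case). With the flag on, the Consumer lands on slice 1 and the check fires.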
Pull request overview
This PR fixes an ORCA hang by improving the CTE slice walker so it also treats the inner (SubPlan) side of correlated nested-loop joins as a slice boundary, enabling the existing cross-slice replicated-CTE detection to trigger a fallback to the Postgres optimizer in this scenario.
Changes:
- Update ORCA’s CTE slice collection logic to assign a distinct slice id to the inner side of correlated NL joins (SubPlan boundary), not just Motion boundaries.
- Add a regression test reproducer covering replicated CTEs referenced from correlated scalar subqueries (and expected outputs for both optimizer modes).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/backend/gporca/libgpopt/src/base/CUtils.cpp` | Teach the slice walker to treat correlated-NLJ inner/SubPlan as a slice boundary to detect cross-slice replicated CTE consumers and force fallback. |
| `src/test/regress/sql/shared_scan.sql` | Add a regression test query that would previously hang, guarded by `statement_timeout`. |
| `src/test/regress/expected/shared_scan.out` | Expected output for the new shared_scan regression case. |
| `src/test/regress/expected/shared_scan_optimizer.out` | Expected output for the new shared_scan regression case under optimizer settings. |
Comment on lines +925 to +942
```cpp
// True if the operator is a correlated NL join. Its inner side becomes
// an executor SubPlan that runs in its own slice, so a CTE Consumer
// there is cross-slice w.r.t. a Producer outside -- which can deadlock
// the ShareInputScan writer. We treat the inner side as a slice
// boundary so the check below catches it.
//
// These are all of ORCA's SubPlan-producing operators. Add new ones here.
static BOOL
FCorrelatedNLJoin(COperator *pop)
{
	COperator::EOperatorId eopid = pop->Eopid();
	return (COperator::EopPhysicalCorrelatedInnerNLJoin == eopid ||
			COperator::EopPhysicalCorrelatedLeftOuterNLJoin == eopid ||
			COperator::EopPhysicalCorrelatedLeftSemiNLJoin == eopid ||
			COperator::EopPhysicalCorrelatedInLeftSemiNLJoin == eopid ||
			COperator::EopPhysicalCorrelatedLeftAntiSemiNLJoin == eopid ||
			COperator::EopPhysicalCorrelatedNotInLeftAntiSemiNLJoin == eopid);
}
```