orca: fall back for replicated CTE consumed in correlated scalar subqueries #389

Open
Alena0704 wants to merge 1 commit into OPENGPDB_STABLE from fix-cross-slice

Conversation

Alena0704 (Contributor) commented Jun 13, 2026

orca: fall back for replicated CTE consumed in correlated scalar subqueries

Root cause is a blind spot in the #375 slice walker: CollectCTESlices delimits slices only at Motion nodes. A CTE over a DISTRIBUTED REPLICATED table referenced from correlated scalar subqueries is decorrelated by ORCA into CPhysicalCorrelated*NLJoin whose inner side becomes an executor SubPlan running in its own slice -- but there is no Motion at that boundary, so the walker placed the Consumer on the same slice as the Producer. The cross-slice check (prod->sliceId != cons->sliceId) never fired, no fallback happened, and the ShareInputScan writer hung forever in shareinput_writer_waitdone() waiting for DONE acks from reader slices that never run.

Teach the walker that the inner (subquery) side of a correlated NL join is a slice boundary too, mirroring the Motion rule. The replicated Consumer in the SubPlan then gets a distinct slice id, the existing check fires, and ORCA falls back to the Postgres optimizer.
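
To make the boundary rule concrete, here is a minimal toy model of such a walker. It is a sketch under stated assumptions, not the code in CUtils.cpp: `Node`, `Kind`, `Collect`, `HasCrossSliceConsumer`, and `BuildExamplePlan` are hypothetical names, the plan shape is a simplified stand-in for what ORCA generates, and the inner side is assumed to be child index 1 of the join. The `correlatedRule` flag switches between the Motion-only behaviour and the behaviour this PR describes.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified operator kinds; these are not ORCA's operator ids.
enum class Kind { Sequence, Motion, CorrelatedNLJoin, CTEProducer, CTEConsumer, Scan };

struct Node {
    Kind kind = Kind::Scan;
    int cteId = -1;  // meaningful only for CTEProducer / CTEConsumer
    std::vector<std::unique_ptr<Node>> children;
};

struct SliceInfo {
    std::map<int, int> producerSlice;            // cteId -> slice id
    std::vector<std::pair<int, int>> consumers;  // (cteId, slice id)
};

// Walk the tree and hand out a fresh slice id at every slice boundary.
// A Motion is always a boundary. When correlatedRule is set, the inner
// child (assumed to be index 1) of a correlated NL join is one as well.
void Collect(const Node &n, int slice, int &nextSlice, bool correlatedRule,
             SliceInfo &out) {
    if (n.kind == Kind::CTEProducer) out.producerSlice[n.cteId] = slice;
    if (n.kind == Kind::CTEConsumer) out.consumers.push_back({n.cteId, slice});
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        const bool motionBoundary = (n.kind == Kind::Motion);
        const bool subplanBoundary =
            correlatedRule && n.kind == Kind::CorrelatedNLJoin && i == 1;
        const int childSlice =
            (motionBoundary || subplanBoundary) ? nextSlice++ : slice;
        Collect(*n.children[i], childSlice, nextSlice, correlatedRule, out);
    }
}

// The cross-slice check: true if any Consumer sits on a different slice
// than its Producer, which is the condition that should force a fallback.
bool HasCrossSliceConsumer(const Node &root, bool correlatedRule) {
    SliceInfo info;
    int nextSlice = 1;  // slice 0 is the root slice
    Collect(root, 0, nextSlice, correlatedRule, info);
    for (const auto &c : info.consumers) {
        auto it = info.producerSlice.find(c.first);
        if (it != info.producerSlice.end() && it->second != c.second) return true;
    }
    return false;
}

std::unique_ptr<Node> Make(Kind kind, int cteId = -1) {
    auto n = std::make_unique<Node>();
    n->kind = kind;
    n->cteId = cteId;
    return n;
}

// Sequence(Producer(Scan), CorrelatedNLJoin(Scan, Consumer)): no Motion
// separates the Producer from the Consumer on the join's inner side.
std::unique_ptr<Node> BuildExamplePlan() {
    auto producer = Make(Kind::CTEProducer, 0);
    producer->children.push_back(Make(Kind::Scan));
    auto join = Make(Kind::CorrelatedNLJoin);
    join->children.push_back(Make(Kind::Scan));            // outer side
    join->children.push_back(Make(Kind::CTEConsumer, 0));  // inner side
    auto root = Make(Kind::Sequence);
    root->children.push_back(std::move(producer));
    root->children.push_back(std::move(join));
    return root;
}
```

On the example plan, `HasCrossSliceConsumer(*plan, false)` returns false because Producer and Consumer both land on slice 0, which corresponds to the missed case. With `correlatedRule` set to true the Consumer gets slice 1 and the check returns true.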


Copilot AI left a comment

Pull request overview

This PR fixes an ORCA hang by improving the CTE slice walker so it also treats the inner (SubPlan) side of correlated nested-loop joins as a slice boundary, enabling the existing cross-slice replicated-CTE detection to trigger a fallback to the Postgres optimizer in this scenario.

Changes:

  • Update ORCA’s CTE slice collection logic to assign a distinct slice id to the inner side of correlated NL joins (SubPlan boundary), not just Motion boundaries.
  • Add a regression test reproducer covering replicated CTEs referenced from correlated scalar subqueries (and expected outputs for both optimizer modes).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/backend/gporca/libgpopt/src/base/CUtils.cpp | Teach the slice walker to treat correlated-NLJ inner/SubPlan as a slice boundary to detect cross-slice replicated CTE consumers and force fallback. |
| src/test/regress/sql/shared_scan.sql | Add a regression test query that would previously hang, guarded by statement_timeout. |
| src/test/regress/expected/shared_scan.out | Expected output for the new shared_scan regression case. |
| src/test/regress/expected/shared_scan_optimizer.out | Expected output for the new shared_scan regression case under optimizer settings. |

Comment on lines +925 to +942
// True if the operator is a correlated NL join. Its inner side becomes
// an executor SubPlan that runs in its own slice, so a CTE Consumer
// there is cross-slice w.r.t. a Producer outside -- which can deadlock
// the ShareInputScan writer. We treat the inner side as a slice
// boundary so the check below catches it.
//
// These are all of ORCA's SubPlan-producing operators. Add new ones here.
static BOOL
FCorrelatedNLJoin(COperator *pop)
{
    COperator::EOperatorId eopid = pop->Eopid();
    return (COperator::EopPhysicalCorrelatedInnerNLJoin == eopid ||
            COperator::EopPhysicalCorrelatedLeftOuterNLJoin == eopid ||
            COperator::EopPhysicalCorrelatedLeftSemiNLJoin == eopid ||
            COperator::EopPhysicalCorrelatedInLeftSemiNLJoin == eopid ||
            COperator::EopPhysicalCorrelatedLeftAntiSemiNLJoin == eopid ||
            COperator::EopPhysicalCorrelatedNotInLeftAntiSemiNLJoin == eopid);
}