[AutoSparkUT] Add RapidsDataFrameJoinSuite #14654
wjxiz1992 wants to merge 1 commit into NVIDIA:main from
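Migrates Spark DataFrameJoinSuite (19 tests) to RAPIDS using the
minimal-inheritance pattern:

```scala
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.DataFrameJoinSuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsTrait

// All 19 tests are inherited unchanged; the trait supplies the GPU setup.
class RapidsDataFrameJoinSuite
  extends DataFrameJoinSuite with RapidsSQLTestsTrait {}
```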
Adds GPU test coverage for the core join workload, complementing the
existing RapidsJoinSuite.
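As a rough illustration of why the wrapper can be a one-liner, here is a self-contained Scala sketch of the stackable-trait mixin the pattern relies on. All names are hypothetical stand-ins, and it assumes (as the review's class diagram suggests) that the trait's job is to override the session configuration. `spark.plugins=com.nvidia.spark.SQLPlugin` is the usual way to enable the RAPIDS plugin, but nothing here claims to mirror what `RapidsSQLTestsTrait` actually sets.

```scala
// Hypothetical stand-ins: ParentJoinSuite plays the role of an upstream Spark
// suite and GpuTestsTrait the role of a RAPIDS mix-in. Config values are
// illustrative only; they are not taken from RapidsSQLTestsTrait.
class ParentJoinSuite {
  // Configuration the upstream suite runs with on the CPU.
  def sparkConf: Map[String, String] =
    Map("spark.sql.shuffle.partitions" -> "5")

  // Test cases are declared once, in the parent class.
  val testNames: Seq[String] = Seq("join - join using", "join - cross join")
}

// Stackable trait: it can only be mixed into ParentJoinSuite subclasses and
// layers GPU settings on top of whatever the parent returns via super.
trait GpuTestsTrait extends ParentJoinSuite {
  override def sparkConf: Map[String, String] =
    super.sparkConf ++ Map("spark.plugins" -> "com.nvidia.spark.SQLPlugin")
}

// The wrapper declares nothing itself: tests are inherited unchanged,
// and only the configuration differs.
class GpuJoinSuite extends ParentJoinSuite with GpuTestsTrait

object MinimalInheritanceDemo {
  def main(args: Array[String]): Unit = {
    val suite = new GpuJoinSuite
    println(suite.testNames.mkString(", "))
    println(suite.sparkConf)
  }
}
```

The design payoff is that upstream test bodies never need to be copied; when Spark adds or edits a test, the wrapper picks it up automatically.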
Local Maven validation (spark330 shim, GPU allocFraction=0.3):

```
Run starting. Expected test count is: 19
RapidsDataFrameJoinSuite:
Tests: succeeded 18, failed 0, canceled 0, ignored 1, pending 0
```
One test is excluded as KNOWN_ISSUE:
- SPARK-24690 enables star schema detection even if CBO disabled -> NVIDIA#14653
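The validation counts (19 expected, 18 succeeded, 1 ignored) follow from name-based exclusion of inherited tests. The sketch below models that bookkeeping under stated assumptions: `SuiteSettings`, `KnownIssue` and exact-name matching are hypothetical and are not the `RapidsTestSettings` API; only the excluded test name and issue URL come from this PR.

```scala
// Hypothetical model of a per-suite exclusion list. SuiteSettings and
// KnownIssue are illustrative names, not the RapidsTestSettings API.
sealed trait ExcludeReason
final case class KnownIssue(url: String) extends ExcludeReason

final case class SuiteSettings(excluded: Map[String, ExcludeReason] = Map.empty) {
  // Returns a new settings value with one more excluded test (exact name match).
  def exclude(testName: String, reason: ExcludeReason): SuiteSettings =
    copy(excluded = excluded + (testName -> reason))

  // Splits inherited test names into (tests to run, tests reported as ignored).
  def partition(allTests: Seq[String]): (Seq[String], Seq[String]) =
    allTests.partition(name => !excluded.contains(name))
}

object ExclusionDemo {
  val starSchemaTest = "SPARK-24690 enables star schema detection even if CBO disabled"

  // 18 placeholder names plus the real excluded test: 19 tests in total.
  val allTests: Seq[String] = (1 to 18).map(i => s"inherited test $i") :+ starSchemaTest

  val settings: SuiteSettings = SuiteSettings()
    .exclude(starSchemaTest, KnownIssue("https://github.com/NVIDIA/spark-rapids/issues/14653"))

  val (toRun, ignored) = settings.partition(allTests)

  def main(args: Array[String]): Unit =
    println(s"expected ${allTests.size}, run ${toRun.size}, ignored ${ignored.size}")
}
```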
Root cause is in the proprietary JoinReorderRule logical-plan rule
(in rapids-4-spark-private_2.12), which violates the Catalyst
structural-integrity invariant when reordering a 4-way star join
under STARSCHEMA_DETECTION=true + CBO_ENABLED=false +
PLAN_STATS_ENABLED=true. The failure occurs during LogicalPlan
optimization, before any physical planning. Because the plugin's
CPU fallback is decided per operator on the physical plan, it never
gets a chance to recover.
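The PR does not say which part of the integrity invariant JoinReorderRule breaks. Purely as a hypothetical illustration of the kind of thing such a check catches, here is a toy Scala model: a rule-level check that compares the output schema before and after a rewrite and requires unique expression IDs. The plan classes and `isIntegral` are invented for this sketch and are much simpler than Catalyst's real checks.

```scala
// Simplified, hypothetical model of an optimizer integrity check; not
// Catalyst code. It only compares output column names before/after a rule
// and checks that the output expression IDs are unique.
final case class Attr(name: String, exprId: Long)

sealed trait Plan { def output: Seq[Attr] }
final case class Relation(name: String, output: Seq[Attr]) extends Plan
final case class Join(left: Plan, right: Plan) extends Plan {
  def output: Seq[Attr] = left.output ++ right.output
}
final case class Project(projectList: Seq[Attr], child: Plan) extends Plan {
  def output: Seq[Attr] = projectList
}

object IntegrityDemo {
  def isIntegral(before: Plan, after: Plan): Boolean = {
    val ids = after.output.map(_.exprId)
    ids.distinct.size == ids.size &&
      before.output.map(_.name) == after.output.map(_.name)
  }

  val fact = Relation("fact", Seq(Attr("f_key", 1)))
  val d1 = Relation("d1", Seq(Attr("d1_key", 2)))
  val d2 = Relation("d2", Seq(Attr("d2_key", 3)))
  val d3 = Relation("d3", Seq(Attr("d3_key", 4)))

  // Original left-deep 4-way star join: ((fact JOIN d1) JOIN d2) JOIN d3.
  val original: Plan = Join(Join(Join(fact, d1), d2), d3)

  // Joining the dimensions in another order changes the Join output order.
  val reordered: Plan = Join(Join(Join(fact, d3), d2), d1)

  // Restoring the original column order with a Project keeps the schema stable.
  val restored: Plan = Project(original.output, reordered)

  def main(args: Array[String]): Unit = {
    println(isIntegral(original, reordered))
    println(isIntegral(original, restored))
  }
}
```

The point of the model is only that a reorder rule must preserve the plan's observable output, for example by wrapping the reordered joins in a projection; it is not a diagnosis of the actual JoinReorderRule bug.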
Contributes to NVIDIA#14653.
Signed-off-by: Allen Xu <[email protected]>
Pull request overview
Adds a Spark 3.3.0 RAPIDS GPU test wrapper for Spark’s DataFrameJoinSuite, registering it in the spark330 test settings with a single KNOWN_ISSUE exclusion tied to an existing optimizer-rule bug.
Changes:
- Introduce `RapidsDataFrameJoinSuite` as a minimal-inheritance wrapper (`DataFrameJoinSuite` + `RapidsSQLTestsTrait`) for Spark 3.3.0.
- Register the new suite in `RapidsTestSettings` and exclude the failing `SPARK-24690` test via `KNOWN_ISSUE` (https://github.com/NVIDIA/spark-rapids/issues/14653).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala | Enables the new join suite for spark330 and applies a single known-issue exclusion. |
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsDataFrameJoinSuite.scala | Adds the GPU wrapper suite extending Spark’s DataFrameJoinSuite with RapidsSQLTestsTrait. |
Greptile Summary
Confidence Score: 5/5
Safe to merge; the change is purely additive test infrastructure following established project patterns.
Both files are boilerplate wrappers consistent with existing suites. The only finding is a P2 placement style note; there are no correctness, data, or reliability concerns.
No files require special attention.
Class Diagram

```mermaid
%%{init: {'theme': 'neutral'}}%%
classDiagram
class DataFrameJoinSuite {
+test("join - join using")
+test("join - join using multiple columns")
+test("join - cross join")
+test("broadcast join hint")
+test("SPARK-24690 ...") ~~excluded~~
+... 14 more tests
}
class RapidsSQLTestsTrait {
+sparkConf: SparkConf
+GPU config overrides
+beforeAll()
+afterAll()
}
class RapidsDataFrameJoinSuite {
shim: spark330
}
DataFrameJoinSuite <|-- RapidsDataFrameJoinSuite
RapidsSQLTestsTrait <|.. RapidsDataFrameJoinSuite
class RapidsTestSettings {
+enableSuite[RapidsDataFrameJoinSuite]
+exclude("SPARK-24690", KNOWN_ISSUE)
}
RapidsTestSettings --> RapidsDataFrameJoinSuite : registers
```
Reviews (1): Last reviewed commit: "[AutoSparkUT] Add RapidsDataFrameJoinSui..."
Summary
Migrates Spark `DataFrameJoinSuite` (19 tests) to RAPIDS using the minimal-inheritance pattern (`RapidsDataFrameJoinSuite extends DataFrameJoinSuite with RapidsSQLTestsTrait`). Closes Tier-A1 of our local suite-migration queue (core join workload; complements `RapidsJoinSuite`).
- Source: `spark/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala` at Spark 3.3.0 (tag `v3.3.0`), full class (19 `test(...)` blocks).
- New suite: `tests/src/test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsDataFrameJoinSuite.scala`
- Settings: `RapidsTestSettings.scala` with one KNOWN_ISSUE exclusion.

Per-test mapping (Spark 3.3.0)
All 19 tests come from the parent class unchanged. Line ranges below are for `sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala` at tag `v3.3.0`:

Exclusion
One test is excluded as `KNOWN_ISSUE`:
- `SPARK-24690 enables star schema detection even if CBO disabled`: `KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/14653")`

Root cause: the proprietary `com.nvidia.spark.rapids.optimizer.JoinReorderRule` rule (shipped in `rapids-4-spark-private_2.12`) violates the Catalyst `LogicalPlan` structural-integrity invariant when reordering a 4-way star join under `STARSCHEMA_DETECTION=true` + `CBO_ENABLED=false` + `PLAN_STATS_ENABLED=true`. The failure occurs in the "Operator Optimization before Inferring Filters" batch (`LogicalPlan` optimization), so CPU fallback cannot recover: the plan never reaches physical planning.

Contributes to #14653.
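Why "the plan never reaches physical planning" rules out CPU fallback can be sketched as a toy three-phase pipeline. This is a hypothetical model, not Spark or spark-rapids code; it assumes only that GPU-versus-CPU placement is decided per operator on the physical plan, so an exception thrown during logical optimization escapes before any placement logic runs.

```scala
// Toy model of query compilation, not Spark or spark-rapids code. It assumes
// GPU-vs-CPU placement is decided per operator on the physical plan, so an
// error raised earlier, during logical optimization, escapes before any
// fallback logic runs.
object PhasesDemo {
  final case class PhysicalOp(name: String, gpuSupported: Boolean)

  final class PlanIntegrityError(msg: String) extends RuntimeException(msg)

  // Phase 1: logical optimization. A broken rule aborts compilation here.
  def optimizeLogical(brokenRule: Boolean): Seq[String] =
    if (brokenRule) throw new PlanIntegrityError("plan structural integrity broken")
    else Seq("ScanFact", "ScanDim", "Join")

  // Phase 2: physical planning. ScanDim is marked GPU-unsupported purely so
  // the example exercises the fallback path.
  def planPhysical(logical: Seq[String]): Seq[PhysicalOp] =
    logical.map(op => PhysicalOp(op, gpuSupported = op != "ScanDim"))

  // Phase 3: per-operator placement; unsupported operators stay on the CPU.
  def placeOperators(physical: Seq[PhysicalOp]): Seq[String] =
    physical.map(op => if (op.gpuSupported) s"Gpu${op.name}" else s"Cpu${op.name}")

  // Left means the query failed to compile at all; no fallback was possible.
  def compile(brokenRule: Boolean): Either[String, Seq[String]] =
    try Right(placeOperators(planPhysical(optimizeLogical(brokenRule))))
    catch { case e: PlanIntegrityError => Left(e.getMessage) }

  def main(args: Array[String]): Unit = {
    println(compile(brokenRule = false))
    println(compile(brokenRule = true))
  }
}
```

With a healthy optimizer, the unsupported operator simply stays on the CPU; with a broken logical rule, compilation fails outright, which matches why the test has to be excluded rather than left to fall back.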
Local Maven validation
Result (with exclusion applied):
Checklist
Documentation
Testing
Performance