[SPARK-53478] Resolve SparkFiles.get against root directory when job artifact UUID is set in local mode by wilmerdooley · Pull Request #56623 · apache/spark

wilmerdooley · 2026-06-19T21:57:37Z

What changes were proposed in this pull request?

When running in local mode with a non-default job artifact UUID, SparkFiles.get resolved filenames against the per-session artifact directory, while files added via SparkContext.addFile were placed directly under the root directory. As a result, such files were inaccessible from SQL-planned operations even though they were visible to code running outside a SQL session. This change makes SparkFiles.get fall back to the root directory in local mode when the job-specific path does not exist, and applies the same behavior consistently in the Scala SparkFiles, in the JVM-side PythonRunner (which tells Python workers they are running in local mode), and in PySpark's SparkFiles.
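The mismatch can be reproduced with plain directories. The following is a minimal sketch, not Spark code: `add_file_to_root` and `session_scoped_get` are hypothetical stand-ins for SparkContext.addFile and the pre-fix SparkFiles.get. It also assumes, as a simplification, that the per-session artifact directory is a child of the root named after the job artifact UUID. This is consistent with the Python-side fallback to the parent directory described below, and `session-1234` is an invented UUID.

```python
import os
import shutil
import tempfile
from typing import Tuple


def add_file_to_root(root_dir: str, src_path: str) -> str:
    """Hypothetical stand-in for SparkContext.addFile: copy directly under the root."""
    dest = os.path.join(root_dir, os.path.basename(src_path))
    shutil.copy(src_path, dest)
    return dest


def session_scoped_get(root_dir: str, job_uuid: str, filename: str) -> str:
    """Hypothetical stand-in for the pre-fix SparkFiles.get: resolve under the session dir."""
    return os.path.join(root_dir, job_uuid, filename)


def demo() -> Tuple[str, bool, bool]:
    with tempfile.TemporaryDirectory() as workdir:
        root = os.path.join(workdir, "root")
        session_uuid = "session-1234"  # hypothetical non-default job artifact UUID
        os.makedirs(os.path.join(root, session_uuid))  # per-session artifact directory
        src = os.path.join(workdir, "data.txt")
        with open(src, "w") as f:
            f.write("hello")
        written = add_file_to_root(root, src)
        resolved = session_scoped_get(root, session_uuid, "data.txt")
        # The file exists under the root, but the lookup points into the session
        # directory, where nothing was ever written.
        return (
            os.path.relpath(resolved, root),
            os.path.exists(written),
            os.path.exists(resolved),
        )
```

Calling `demo()` shows the file present at the root while the session-scoped path, `session-1234/data.txt`, does not exist.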

core/src/main/scala/org/apache/spark/SparkFiles.scala: in get, if the job-specific path does not exist and the master is local, fall back to the file directly under getRootDirectory().
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: export SPARK_LOCAL_MODE to Python workers so they can apply the same fallback.
python/pyspark/core/files.py: in SparkFiles.get, when running on a worker, fall back to the file under the parent of the current root directory if the file is missing under the current root, the master is local, and the job artifact UUID is non-default.
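The worker-side conditions listed above can be sketched in plain Python. This is not PySpark's actual implementation. `resolve_spark_file` and `demo` are hypothetical names, and `is_local` stands in for whatever the worker derives from the exported SPARK_LOCAL_MODE variable, whose value format is not shown here. The `"default"` sentinel for the default job artifact UUID is also an assumption made for illustration.

```python
import os
import tempfile
from typing import Tuple

# Assumption: the sentinel used for the default job artifact UUID.
DEFAULT_JOB_UUID = "default"


def resolve_spark_file(current_root: str, filename: str, *, is_local: bool, job_uuid: str) -> str:
    """Hypothetical sketch of the local-mode fallback described above."""
    candidate = os.path.join(current_root, filename)
    if not os.path.exists(candidate) and is_local and job_uuid != DEFAULT_JOB_UUID:
        # In local mode, SparkContext.addFile wrote the file one level up,
        # directly under the shared root, so resolve against the parent.
        parent = os.path.dirname(os.path.normpath(current_root))
        return os.path.join(parent, filename)
    # Outside local mode (or with the default UUID) the lookup is unchanged.
    return candidate


def demo() -> Tuple[str, str, str]:
    with tempfile.TemporaryDirectory() as root:
        session_uuid = "session-1234"  # hypothetical non-default UUID
        session_root = os.path.join(root, session_uuid)
        os.makedirs(session_root)
        with open(os.path.join(root, "data.txt"), "w") as f:
            f.write("hello")  # placed under the root, as addFile does
        local = resolve_spark_file(session_root, "data.txt", is_local=True, job_uuid=session_uuid)
        cluster = resolve_spark_file(session_root, "data.txt", is_local=False, job_uuid=session_uuid)
        default = resolve_spark_file(root, "missing.txt", is_local=True, job_uuid=DEFAULT_JOB_UUID)
        return (
            os.path.relpath(local, root),
            os.path.relpath(cluster, root),
            os.path.relpath(default, root),
        )
```

In local mode the lookup resolves to `data.txt` at the root. With `is_local=False` it stays at `session-1234/data.txt`, which is what keeps session isolation intact on real executors. The sketch returns the parent-directory path without checking that it exists, because the description above does not mention such a check.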

JIRA: https://issues.apache.org/jira/browse/SPARK-53478

Why are the changes needed?

In local mode with a non-default job artifact UUID, files added through SparkContext.addFile could not be resolved by SparkFiles.get from inside SQL-planned operations. SparkContext.addFile writes the file directly under the root directory, but SparkFiles.get looked under the per-session artifact directory, so the lookup missed the file and the path was unusable. The fallback restores the ability to read those files from SQL operations in local mode, while keeping the lookup scoped to local mode so session isolation semantics on real executors are unchanged.

Does this PR introduce any user-facing change?

Yes. In local mode with a non-default job artifact UUID, a file added via SparkContext.addFile is now resolvable through SparkFiles.get from SQL-planned operations, whereas before it was not found. There is no change on real executors or when the default artifact UUID is in use.

How was this patch tested?

Added regression tests that add a file via SparkContext.addFile and read it back through SparkFiles.get from a SQL-planned operation, on both the Scala and Python sides:

sql/core/src/test/scala/org/apache/spark/sql/artifact/ArtifactManagerSuite.scala: SPARK-53478: SparkFiles.get resolves files added via SparkContext.addFile in local mode.
python/pyspark/sql/tests/test_artifact.py: test_spark_files_get_with_sc_add_file.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code


test_spark_files_get_with_sc_add_file ran its job on the shared default session of this ReusedSQLTestCase class. When the full test_artifact module runs in CI order, an earlier test that registers a my_pyfile.py artifact leaves a stale copy on the executor, and the task fails. Run the verification on self.spark.newSession(), as the sibling add-file tests do. This keeps the SparkContext.addFile resolution under test while isolating it from artifacts registered by earlier tests' sessions.

[SPARK-53478] Inconsistent file resolution between SparkContext.addFile and SparkFiles.get in local

963a3a2

Signed-off-by: wilmerdooley <wilmerdooley1@gmail.com>

wilmerdooley mentioned this pull request Jun 19, 2026

[SPARK-46166][PS] Implementation of pandas.DataFrame.any with axis=None #53478

Closed

wilmerdooley and others added 2 commits June 19, 2026 17:39

Trigger CI

5d31a63

wilmerdooley wants to merge 3 commits into apache:master from wilmerdooley:oss/spark-53478
