[SPARK-53478] Resolve SparkFiles.get against root directory when job artifact UUID is set in local mode #56623

Open
wilmerdooley wants to merge 3 commits into apache:master from wilmerdooley:oss/spark-53478

wilmerdooley commented Jun 19, 2026

What changes were proposed in this pull request?

When running in local mode with a non-default job artifact UUID, SparkFiles.get resolved filenames against the per-session artifact directory, while files added via SparkContext.addFile were placed directly under the root directory. As a result, such files were inaccessible from SQL-planned operations even though they were visible to code that ran outside a SQL session. This change falls back to the root directory when the job-specific path does not exist in local mode. The same fallback is applied in three places: Scala's SparkFiles, the JVM side that launches Python workers, and PySpark's SparkFiles.

  • core/src/main/scala/org/apache/spark/SparkFiles.scala: in get, if the job-specific path does not exist and the master is local, fall back to the file directly under getRootDirectory().
  • core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: export SPARK_LOCAL_MODE to Python workers so they can apply the same fallback.
  • python/pyspark/core/files.py: in SparkFiles.get, fall back to the file under the parent of the current root directory when all of the following hold: the code is running on a worker, the file is missing under the current root, the master is local, and the job artifact UUID is non-default.
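
The worker-side rule in the last bullet can be sketched as a standalone function. This is a hypothetical model for illustration, not the code in python/pyspark/core/files.py: the name `resolve_worker_file`, its parameters, and the `"default"` sentinel for the default artifact UUID are all assumptions, and the local-mode flag is passed in directly rather than read from SPARK_LOCAL_MODE.

```python
from pathlib import Path


def resolve_worker_file(current_root, filename, is_local, artifact_uuid,
                        default_uuid="default"):
    """Hypothetical model of the worker-side lookup described above.

    current_root  -- the worker's current root directory (assumed to be the
                     per-session artifact directory when a UUID is set)
    is_local      -- whether the master is local (assumed to be derived
                     elsewhere, e.g. from SPARK_LOCAL_MODE)
    default_uuid  -- assumed sentinel for "no session-specific UUID"
    """
    primary = Path(current_root) / filename
    if primary.exists():
        # The file is present under the current root: no fallback needed.
        return str(primary)
    if is_local and artifact_uuid != default_uuid:
        # Local mode with a non-default UUID: look one level up, where
        # SparkContext.addFile is described as placing the file.
        return str(Path(current_root).parent / filename)
    # Otherwise keep the session-scoped path, as before the change.
    return str(primary)
```

With a root directory containing `data.txt` and an empty session subdirectory `abc`, calling `resolve_worker_file(session_dir, "data.txt", True, "abc")` returns the path under the root, while passing `is_local=False` returns the session-scoped path unchanged.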

JIRA: https://issues.apache.org/jira/browse/SPARK-53478

Why are the changes needed?

In local mode with a non-default job artifact UUID, files added through SparkContext.addFile could not be resolved by SparkFiles.get from inside SQL-planned operations. SparkContext.addFile writes the file directly under the root directory, but SparkFiles.get looked under the per-session artifact directory, so the lookup missed the file and the path was unusable. The fallback restores the ability to read those files from SQL operations in local mode, while keeping the lookup scoped to local mode so session isolation semantics on real executors are unchanged.
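
To make the mismatch concrete, here is a small Python model of the lookup before and after the change. It is only a sketch under stated assumptions: the function names are hypothetical, the directory layout (root/<uuid>/<filename> for session artifacts) and the `"default"` sentinel are assumed for illustration, and the real logic lives in SparkFiles.scala.

```python
import os


def resolve_before(root, artifact_uuid, filename, default_uuid="default"):
    """Assumed pre-change behavior: always use the session-scoped path."""
    if artifact_uuid == default_uuid:
        return os.path.join(root, filename)
    return os.path.join(root, artifact_uuid, filename)


def resolve_after(root, artifact_uuid, filename, is_local,
                  default_uuid="default"):
    """Assumed post-change behavior: fall back to the root in local mode."""
    job_path = resolve_before(root, artifact_uuid, filename, default_uuid)
    if not os.path.exists(job_path) and is_local:
        # The job-specific path is missing and the master is local:
        # use the file directly under the root directory instead.
        return os.path.join(root, filename)
    return job_path
```

In this model, a file written directly under `root` is missed by `resolve_before` whenever the UUID is non-default, while `resolve_after` finds it only when `is_local` is true, which matches the intent of leaving non-local executors untouched.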

Does this PR introduce any user-facing change?

Yes. In local mode with a non-default job artifact UUID, a file added via SparkContext.addFile is now resolvable through SparkFiles.get from SQL-planned operations, whereas before it was not found. There is no change on real executors or when the default artifact UUID is in use.

How was this patch tested?

Added regression tests that add a file via SparkContext.addFile and read it back through SparkFiles.get from a SQL-planned operation, on both the Scala and Python sides:

  • sql/core/src/test/scala/org/apache/spark/sql/artifact/ArtifactManagerSuite.scala: SPARK-53478: SparkFiles.get resolves files added via SparkContext.addFile in local mode.
  • python/pyspark/sql/tests/test_artifact.py: test_spark_files_get_with_sc_add_file.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

…le and SparkFiles.get in local

Signed-off-by: wilmerdooley <wilmerdooley1@gmail.com>
wilmerdooley and others added 2 commits June 19, 2026 17:39
test_spark_files_get_with_sc_add_file ran its job on the shared default session of this ReusedSQLTestCase class. When the full test_artifact module runs in CI order, an earlier test that registers a my_pyfile.py artifact leaves the executor with a stale copy, and the task fails. Run the verification on self.spark.newSession() instead, like the sibling add-file tests; this keeps the SparkContext.addFile resolution under test while isolating it from prior tests' session artifacts.