[SPARK-56773][CORE][TESTS][FOLLOWUP] Always allocate INJECT_SHUFFLE_FETCH_FAILURES injection maps#56631
Open
cloud-fan wants to merge 1 commit into
Open
[SPARK-56773][CORE][TESTS][FOLLOWUP] Always allocate INJECT_SHUFFLE_FETCH_FAILURES injection maps#56631cloud-fan wants to merge 1 commit into
cloud-fan wants to merge 1 commit into
Conversation
…ETCH_FAILURES injection maps ### What changes were proposed in this pull request? Followup to apache#55738. The three `INJECT_SHUFFLE_FETCH_FAILURES` injection-state maps in `DAGScheduler` (`injectShuffleFetchFailuresCorruptedAttempt`, `injectShuffleFetchFailuresPendingDelayedCorruption`, `injectShuffleFetchFailuresDownstreamSuccessCount`) are initialized with `if (Utils.isTesting) new ConcurrentHashMap else null`. This patch allocates them unconditionally instead. ### Why are the changes needed? The cleanup site in `cleanupStateForJobAndIndependentStages` (and the other use-sites) re-evaluate `if (Utils.isTesting)` to decide whether to dereference these maps. `Utils.isTesting` reads the mutable `spark.testing` system property, so it can return a different value when the `DAGScheduler` is constructed than at the later use-sites. If it does, a `null` map is dereferenced and the `DAGScheduler` event loop crashes with a `NullPointerException`. Allocating the maps unconditionally makes the construction guard and the use-site guards consistent, so they can never disagree. ### Does this PR introduce _any_ user-facing change? No. The maps are only ever populated inside the config-gated `INJECT_SHUFFLE_FETCH_FAILURES` test paths, so in production they stay empty and there is no behavioral change beyond allocating an empty map. ### How was this patch tested? Existing tests. This is a defensive initialization change; the maps stay empty unless the existing `INJECT_SHUFFLE_FETCH_FAILURES` test paths populate them, so the existing coverage exercises them unchanged. ### Was this patch authored or co-authored using generative AI tooling? Yes, drafted with Claude Code (Anthropic).
uros-b
approved these changes
Jun 20, 2026
Contributor
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Followup to #55738.
The three
INJECT_SHUFFLE_FETCH_FAILURESinjection-state maps inDAGScheduler(injectShuffleFetchFailuresCorruptedAttempt,injectShuffleFetchFailuresPendingDelayedCorruption,injectShuffleFetchFailuresDownstreamSuccessCount) are initialized withif (Utils.isTesting) new ConcurrentHashMap else null. This patch allocates them unconditionally instead.Why are the changes needed?
The cleanup site in
cleanupStateForJobAndIndependentStages(and the other use-sites) re-evaluateif (Utils.isTesting)to decide whether to dereference these maps.Utils.isTestingreads the mutablespark.testingsystem property, so it can return a different value when theDAGScheduleris constructed than at the later use-sites. If it does, anullmap is dereferenced and theDAGSchedulerevent loop crashes with aNullPointerException. Allocating the maps unconditionally makes the construction guard and the use-site guards consistent, so they can never disagree.Does this PR introduce any user-facing change?
No. The maps are only ever populated inside the config-gated
INJECT_SHUFFLE_FETCH_FAILUREStest paths, so in production they stay empty and there is no behavioral change beyond allocating an empty map.How was this patch tested?
Existing tests. This is a defensive initialization change; the maps stay empty unless the existing
INJECT_SHUFFLE_FETCH_FAILUREStest paths populate them, so the existing coverage exercises them unchanged.Was this patch authored or co-authored using generative AI tooling?
Yes, drafted with Claude Code (Anthropic).