Infrastructure to test DaCe's codegen (in)deterministic behavior #2590
Open
kotsaloscv wants to merge 10 commits into
Adds infrastructure to detect non-determinism in gt4py's DaCe codegen.
The checker runs an icon4py test selection through `nox` twice with isolated gt4py build caches, then byte-compares the generated sources under each program's `src/`. Exit code 0 means the two runs produced identical sources; exit code 1 means they differ. Supports the `dace_cpu`, `dace_gpu` (CUDA), and HIP backends.

Layout
- `ci/dace_deterministic_codegen/dace_deterministic_codegen.py`: the comparison script. Valid `--selection`/`--component` values are read from icon4py's `noxfile.py` at runtime, so the script automatically tracks upstream changes.
- `ci/dace_deterministic_codegen/bootstrap_icon4py.py`: patches icon4py's `[tool.uv.sources]` to install the editable gt4py (and optionally a custom dace branch) into the nox session venv.
- `ci/dace_deterministic_codegen/run_in_ci.sh`: env-var-driven driver (clone + bootstrap + check) used by CI and reproducible locally.
- `ci/dace_deterministic_codegen/README.md`: setup, flags, exit codes, examples, and a local-repro recipe.
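Reading the valid `--selection`/`--component` values from the noxfile at runtime could work roughly as follows. This is a minimal sketch, not the script's actual code: it assumes, hypothetically, that the noxfile declares the choices as module-level `Literal[...]` type aliases, and the alias names `Selection`/`Component` used below are illustrative only. Parsing with `ast` instead of importing the file means `nox` does not have to be installed in the checker's own environment. Requires Python 3.9+.

```python
import ast
from typing import Dict, List


def literal_choices(noxfile_source: str) -> Dict[str, List[str]]:
    """Map each module-level `Name = Literal[...]` alias to its string members."""
    choices: Dict[str, List[str]] = {}
    for node in ast.parse(noxfile_source).body:
        # Accept both `X = Literal[...]` and `X: TypeAlias = Literal[...]`.
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target, value = node.targets[0], node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            target, value = node.target, node.value
        else:
            continue
        if not (isinstance(target, ast.Name) and isinstance(value, ast.Subscript)):
            continue
        # Accept both `Literal[...]` and `typing.Literal[...]`.
        base = value.value
        base_name = base.id if isinstance(base, ast.Name) else getattr(base, "attr", None)
        if base_name != "Literal":
            continue
        # Python 3.9+: the subscript is the expression itself (a Tuple for 2+ members).
        members = value.slice.elts if isinstance(value.slice, ast.Tuple) else [value.slice]
        choices[target.id] = [
            m.value for m in members if isinstance(m, ast.Constant) and isinstance(m.value, str)
        ]
    return choices
```

The returned lists could then be passed as `choices=` to `argparse`'s `add_argument`, so an invalid value is rejected before `nox` is ever launched.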
CI
New `dace-determinism` stage in `ci/cscs-ci-dace-determinism.yml`, included from `cscs-ci.yml`. Two jobs run on the santis gh200 runner (CUDA + CPU), each a 4-cell matrix over component `{dycore, advection, diffusion, muphys}` with `selection=stencils` and `grid=icon_regional`. The jobs are marked `allow_failure: true` while the toolchain stabilizes; this is to be dropped once `main` stays green.

Setting `DACE_REPO`/`DACE_REF` points a run at an unmerged dace branch (e.g. the deterministic-codegen work); when both are empty, dace resolves through icon4py's existing pin.
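A sketch of how the bootstrap step might turn `DACE_REPO`/`DACE_REF` into `[tool.uv.sources]` entries. Several points are assumptions of this sketch, not confirmed behavior of `bootstrap_icon4py.py`: `DACE_REF` is treated as a branch name (uv also accepts `tag` and `rev`), setting only one of the two variables is rejected as an error, and the dependency key is assumed to be `dace`. Splicing the rendered table into icon4py's `pyproject.toml` is omitted.

```python
from typing import Dict, Mapping


def _toml_value(value: object) -> str:
    """Render a bool or string as a TOML scalar (the only types used here)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def uv_sources(gt4py_path: str, env: Mapping[str, str]) -> Dict[str, Dict[str, object]]:
    """Build [tool.uv.sources] entries: editable gt4py, plus dace only if overridden."""
    sources: Dict[str, Dict[str, object]] = {
        "gt4py": {"path": gt4py_path, "editable": True},
    }
    repo = env.get("DACE_REPO", "").strip()
    ref = env.get("DACE_REF", "").strip()
    if repo or ref:
        # Assumption of this sketch: a partial override is a configuration error.
        if not (repo and ref):
            raise ValueError("set both DACE_REPO and DACE_REF, or neither")
        # Assumption: DACE_REF names a branch.
        sources["dace"] = {"git": repo, "branch": ref}
    # Both empty: no dace entry is emitted, so icon4py's existing pin applies.
    return sources


def render_sources(sources: Mapping[str, Mapping[str, object]]) -> str:
    """Render the entries as a TOML table of inline tables."""
    lines = ["[tool.uv.sources]"]
    for name, spec in sources.items():
        body = ", ".join(f"{key} = {_toml_value(val)}" for key, val in spec.items())
        lines.append(f"{name} = {{ {body} }}")
    return "\n".join(lines) + "\n"
```

Calling `render_sources(uv_sources(path, os.environ))` with both variables empty yields only the gt4py line, which mirrors the fallback to icon4py's pin described above.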
Per-cell artifacts (`run1`/`run2` caches, `diffs/`, `report.txt`) are retained for 1 month.
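The byte comparison and the per-cell `diffs/` and `report.txt` outputs can be sketched as below. This is an illustrative sketch, not the PR's implementation: it takes two already-located `src/` directories (finding each program's directory inside the gt4py build caches is omitted), and the `report.txt` line format and `.diff` file naming are assumptions. A file that differs in bytes, or exists in only one run, counts as a difference and yields exit code 1.

```python
import difflib
from pathlib import Path
from typing import List, Set


def _files(root: Path) -> Set[Path]:
    """All regular files under `root`, as paths relative to `root`."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def compare_trees(run1: Path, run2: Path, out_dir: Path) -> int:
    """Byte-compare two generated-source trees; return 0 if identical, 1 otherwise."""
    files1, files2 = _files(run1), _files(run2)
    only1 = sorted(files1 - files2)
    only2 = sorted(files2 - files1)
    # Exact verdict: compare raw bytes, not decoded text.
    differing = sorted(
        rel for rel in files1 & files2
        if (run1 / rel).read_bytes() != (run2 / rel).read_bytes()
    )

    # Human-readable unified diff per differing file, mirrored under diffs/.
    for rel in differing:
        a = (run1 / rel).read_text(encoding="utf-8", errors="replace").splitlines(keepends=True)
        b = (run2 / rel).read_text(encoding="utf-8", errors="replace").splitlines(keepends=True)
        target = out_dir / "diffs" / rel.with_name(rel.name + ".diff")
        target.parent.mkdir(parents=True, exist_ok=True)
        patch = difflib.unified_diff(
            a, b, fromfile=f"run1/{rel.as_posix()}", tofile=f"run2/{rel.as_posix()}"
        )
        target.write_text("".join(patch), encoding="utf-8")

    lines: List[str] = (
        [f"only in run1: {r.as_posix()}" for r in only1]
        + [f"only in run2: {r.as_posix()}" for r in only2]
        + [f"differs: {r.as_posix()}" for r in differing]
    )
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "report.txt").write_text("\n".join(lines or ["identical"]) + "\n", encoding="utf-8")
    return 1 if lines else 0
```

Deciding on raw bytes keeps the verdict strict (whitespace or ordering changes still count), while the text diff under `diffs/` exists only to make a failing cell easy to inspect.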