Infrastructure to test DaCe's codegen (in)deterministic behavior #2590

Open
kotsaloscv wants to merge 10 commits into main from dace_deterministic_codegen_test
@kotsaloscv
Contributor

@kotsaloscv kotsaloscv commented May 3, 2026

Adds infrastructure to detect non-determinism in gt4py's DaCe codegen.

The checker runs an icon4py test selection through nox twice with isolated
gt4py build caches, then byte-compares the generated sources under each
program's src/. Exit 0 = identical, exit 1 = different. Supports the
dace_cpu, dace_gpu (CUDA), and HIP backends.
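
The comparison step can be pictured roughly as follows. This is a minimal sketch, not the script in this PR: the function names (`collect_sources`, `compare_runs`, `exit_code`) are hypothetical, and it assumes both runs lay out their caches under the same relative paths, which may not hold if program folder names embed run-specific hashes.

```python
import hashlib
from pathlib import Path


def collect_sources(cache_root: Path) -> dict[str, str]:
    """Map every file located below a `src/` directory to a SHA-256 of its bytes.

    Keys are POSIX-style paths relative to `cache_root`, so two cache roots
    can be compared key by key. Files outside `src/` are ignored.
    """
    digests: dict[str, str] = {}
    for path in sorted(cache_root.rglob("*")):
        rel = path.relative_to(cache_root)
        # Only files with a `src` directory somewhere among their parents.
        if path.is_file() and "src" in rel.parts[:-1]:
            digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_runs(run1: Path, run2: Path) -> list[str]:
    """Return the relative paths that are missing in one run or differ in content."""
    first, second = collect_sources(run1), collect_sources(run2)
    return sorted(
        key for key in first.keys() | second.keys() if first.get(key) != second.get(key)
    )


def exit_code(differences: list[str]) -> int:
    """Mirror the convention described above: 0 = identical, 1 = different."""
    return 1 if differences else 0
```

Hashing the bytes is equivalent to a byte comparison for practical purposes and keeps the report small; a real checker would likely also write unified diffs for the mismatching files.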

Layout

  • ci/dace_deterministic_codegen/dace_deterministic_codegen.py — the
    comparison script. Valid --selection/--component values are read
    from icon4py's noxfile.py at runtime, so it auto-tracks upstream
    changes.
  • ci/dace_deterministic_codegen/bootstrap_icon4py.py — patches icon4py's
    [tool.uv.sources] to install the editable gt4py (and optionally a
    custom dace branch) into the nox session venv.
  • ci/dace_deterministic_codegen/run_in_ci.sh — env-var-driven driver
    (clone + bootstrap + check) used by CI and reproducible locally.
  • ci/dace_deterministic_codegen/README.md — setup, flags, exit codes,
    examples, local-repro recipe.

CI

New dace-determinism stage in ci/cscs-ci-dace-determinism.yml, included
from cscs-ci.yml. Two jobs on the santis gh200 runner (CUDA + CPU), each
a 4-cell matrix over component {dycore, advection, diffusion, muphys}
with selection=stencils and grid=icon_regional. allow_failure: true
while the toolchain stabilizes — to be dropped once main stays green.
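
For orientation, the job matrix described above expands to eight cells in total. The snippet below only enumerates them; it assumes the two jobs correspond to the `dace_cpu` and `dace_gpu` backend names mentioned earlier, and the dictionary keys are illustrative rather than taken from the CI file.

```python
from itertools import product

# Assumed backend names for the two jobs (CPU and CUDA).
BACKENDS = ("dace_cpu", "dace_gpu")
# Components listed in the PR description.
COMPONENTS = ("dycore", "advection", "diffusion", "muphys")


def matrix_cells() -> list[dict[str, str]]:
    """Enumerate one cell per (backend, component) pair with the fixed settings."""
    return [
        {
            "backend": backend,
            "component": component,
            "selection": "stencils",
            "grid": "icon_regional",
        }
        for backend, component in product(BACKENDS, COMPONENTS)
    ]
```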

Setting DACE_REPO/DACE_REF points a run at an unmerged dace branch
(e.g. the deterministic-codegen work); when both are empty, dace resolves
through icon4py's existing pin.
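
The override logic might look like the sketch below. The function name `resolve_dace_override` is hypothetical, and the handling of the case where only one of the two variables is set is an assumption: the description only covers "both set" and "both empty".

```python
import os
from typing import Mapping, Optional


def resolve_dace_override(
    env: Mapping[str, str] = os.environ,
) -> Optional[tuple[str, str]]:
    """Return (repo, ref) for a custom dace branch, or None to keep icon4py's pin."""
    repo = env.get("DACE_REPO", "").strip()
    ref = env.get("DACE_REF", "").strip()
    if not repo and not ref:
        # Both empty: dace resolves through icon4py's existing pin.
        return None
    if not (repo and ref):
        # Assumption: a half-specified override is treated as a configuration error.
        raise ValueError("DACE_REPO and DACE_REF must be set together or both left empty")
    return repo, ref
```

Passing the environment as a parameter keeps the function easy to exercise locally without touching the real process environment.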

Per-cell artifacts (run1/run2 caches, diffs/, report.txt) are
retained for 1 month.

@kotsaloscv kotsaloscv self-assigned this May 3, 2026
@kotsaloscv kotsaloscv marked this pull request as ready for review May 8, 2026 08:22
@kotsaloscv kotsaloscv requested a review from tehrengruber May 8, 2026 08:22