
test_gpu_orbit_bench is flaky: exact loss-step equality between CPU and GPU paths is not a stable criterion #380

@krystophny


Observation

The CI run for commit abf6ca9 passed Build and Test at 2026-06-10T13:30 (run 27279802255) and failed at 2026-06-10T15:00 (run 27285400079) with no change to code or dependencies. The failing test was test_gpu_orbit_bench:

max |z_cpu - z_gpu| (final state) =   5.2859E+00
loss-step mismatches = 1 / 8
CPU lost = 1 / 8   confined frac =  0.8750
GPU lost = 1 / 8   confined frac =  0.8750
lost<->confined flips = 0 / 8

Why it is flaky

The test passes only if its output contains loss-step mismatches = 0 (PASS_REGULAR_EXPRESSION in test/tests/CMakeLists.txt). The CPU reference (procedure-pointer dispatch, OpenMP) and the GPU kernel (separately compiled gpu_timestep_euler) are numerically different code paths, so their trajectories diverge chaotically over the 51-macrostep trace: the final-state difference is O(1) even on passing runs (e.g. 2.76 locally). For a particle that is marginal against the s=1 loss boundary, the macrostep at which it crosses is then effectively a coin flip that depends on the runner's floating-point environment. In the failing run above, both paths lose the same particle (flips = 0) but at different macrosteps.
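To illustrate why an O(1) final-state difference does not by itself indicate a bug, here is a toy sketch in Python. It does not use the orbit integrator: the logistic map stands in for any chaotic system, and the starting point, the perturbation of 1e-12, and the step count are arbitrary assumptions chosen for illustration.

```python
def logistic(x):
    """One step of the logistic map x -> 4 x (1 - x), a standard chaotic toy system."""
    return 4.0 * x * (1.0 - x)


def separation_history(x0, delta, n_steps):
    """Iterate two copies started delta apart.

    Returns |a - b| at the start and after every step (n_steps + 1 values).
    """
    a, b = x0, x0 + delta
    history = [abs(a - b)]
    for _ in range(n_steps):
        a, b = logistic(a), logistic(b)
        history.append(abs(a - b))
    return history


# Two "code paths" that agree to about 12 digits at the start (assumed values).
history = separation_history(x0=0.3, delta=1e-12, n_steps=100)
print(f"initial separation: {history[0]:.3e}")
print(f"largest separation: {max(history):.3e}")
```

The separation starts near rounding level and grows to order one well within the trace, so any event defined by a threshold crossing late in the trace inherits that sensitivity.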

Suggested fix

Keep the strong checks where they are well-posed and drop the ill-posed one:

  • short-horizon equivalence: after a few microsteps, require max |z_cpu - z_gpu| below a tight tolerance (catches genuine kernel bugs before chaos amplifies),
  • long-horizon statistics: require lost<->confined flips = 0 (classification agreement) instead of exact loss-step equality.
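The two proposed checks could be sketched as follows. This is a Python illustration only, not the test's actual code: the sentinel `NEVER_LOST`, the function names, the tolerance `1e-10`, and the loss steps 37 and 38 are all hypothetical. The sample data is merely shaped like the failing run (8 particles, one lost in both paths at different macrosteps).

```python
NEVER_LOST = -1  # hypothetical sentinel: particle stayed confined for the whole trace


def compare_loss_steps(cpu_steps, gpu_steps):
    """Return (loss_step_mismatches, lost_confined_flips) for two runs."""
    if len(cpu_steps) != len(gpu_steps):
        raise ValueError("both runs must trace the same number of particles")
    mismatches = 0
    flips = 0
    for c, g in zip(cpu_steps, gpu_steps):
        if c != g:
            mismatches += 1  # different loss step, or lost in only one path
        if (c == NEVER_LOST) != (g == NEVER_LOST):
            flips += 1  # classification disagrees: lost in exactly one path
    return mismatches, flips


def short_horizon_ok(z_cpu, z_gpu, tol=1e-10):
    """Tight state comparison after a few microsteps (tol is an assumed value)."""
    worst = max(abs(a - b) for pc, pg in zip(z_cpu, z_gpu) for a, b in zip(pc, pg))
    return worst < tol


def long_horizon_ok(cpu_steps, gpu_steps):
    """Require classification agreement only, not equal loss steps."""
    _, flips = compare_loss_steps(cpu_steps, gpu_steps)
    return flips == 0


# Invented data: particle 8 is lost in both paths, one macrostep apart.
cpu_steps = [NEVER_LOST] * 7 + [37]
gpu_steps = [NEVER_LOST] * 7 + [38]
mismatches, flips = compare_loss_steps(cpu_steps, gpu_steps)
print(f"loss-step mismatches = {mismatches} / {len(cpu_steps)}")
print(f"lost<->confined flips = {flips} / {len(cpu_steps)}")
print("long-horizon check passes:", long_horizon_ok(cpu_steps, gpu_steps))
```

With this data the exact-equality criterion fails (one mismatch) while the classification criterion passes (zero flips), which matches the situation in the failing run. A particle lost in only one path would still be caught by the flips check.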

Until then the test will fail sporadically on any marginal particle.
