Observation
The CI run for commit abf6ca9 passed Build and Test at 2026-06-10T13:30 (run 27279802255) and failed at 2026-06-10T15:00 (run 27285400079) with no change to code or dependencies. The failing test was test_gpu_orbit_bench:
max |z_cpu - z_gpu| (final state) = 5.2859E+00
loss-step mismatches = 1 / 8
CPU lost = 1 / 8 confined frac = 0.8750
GPU lost = 1 / 8 confined frac = 0.8750
lost<->confined flips = 0 / 8
Why it is flaky
The test passes only on loss-step mismatches = 0 (PASS_REGULAR_EXPRESSION in test/tests/CMakeLists.txt). The CPU reference (procedure-pointer dispatch, OpenMP) and the GPU kernel (separately compiled gpu_timestep_euler) are numerically different code paths; their trajectories diverge chaotically over the 51-macrostep trace (final-state difference is O(1) even on passing runs, e.g. 2.76 locally). For a particle that is marginal against the s=1 loss boundary, the macrostep at which it crosses is then effectively a coin flip that depends on the runner's FP environment. In the failing run above both paths lose the same particle (flips = 0) but at different macrosteps.
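As a rough illustration of the amplification described above, here is a toy sketch using the logistic map as a stand-in for the orbit integrator (it is not the actual kernel, and the starting value, perturbation size and step count are assumptions chosen for the demo). Two trajectories that start one rounding-level perturbation apart agree closely for the first few steps and then differ at O(1):

```python
# Toy stand-in for a chaotic integrator: the logistic map x -> 4x(1-x).
# This is NOT the orbit code; it only shows how a rounding-level
# difference between two code paths grows to an O(1) difference.

def logistic(x: float) -> float:
    return 4.0 * x * (1.0 - x)

def trajectory(x0: float, n_steps: int) -> list:
    """Return [x0, f(x0), f(f(x0)), ...] with n_steps applications of f."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(logistic(xs[-1]))
    return xs

N_STEPS = 100  # assumed length, longer than the 51-macrostep trace

# "CPU" and "GPU" paths differ only by a perturbation near machine precision.
a = trajectory(0.3, N_STEPS)
b = trajectory(0.3 + 1e-15, N_STEPS)

diffs = [abs(x - y) for x, y in zip(a, b)]
early_diff = max(diffs[:6])   # first five steps: still tightly bounded
late_diff = max(diffs[60:])   # late steps: trajectories are decorrelated

print(f"max diff over steps 0..5   = {early_diff:.3e}")
print(f"max diff over steps 60..{N_STEPS} = {late_diff:.3e}")
```

The early window is where a tolerance-based comparison is still meaningful; in the late window only statistical or classification-level agreement can be expected.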
Suggested fix
Keep the strong checks where they are well-posed and drop the ill-posed one:
- short-horizon equivalence: after a few microsteps, require max |z_cpu - z_gpu| below a tight tolerance (catches genuine kernel bugs before chaos amplifies),
- long-horizon statistics: require lost<->confined flips = 0 (classification agreement) instead of exact loss-step equality.
Until then the test will fail sporadically on any marginal particle.
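The two proposed checks could be sketched roughly as follows. This is a minimal illustration, not the test's actual implementation: the function names, the use of None for "never lost", and the macrostep values 37 and 41 are all hypothetical, chosen to mirror the failing run (1 of 8 particles lost in both paths, at different macrosteps).

```python
from typing import Optional, Sequence

def max_abs_diff(z_cpu: Sequence[Sequence[float]],
                 z_gpu: Sequence[Sequence[float]]) -> float:
    """Largest componentwise |z_cpu - z_gpu| over all particles."""
    return max(abs(c - g)
               for row_c, row_g in zip(z_cpu, z_gpu)
               for c, g in zip(row_c, row_g))

def count_loss_step_mismatches(cpu: Sequence[Optional[int]],
                               gpu: Sequence[Optional[int]]) -> int:
    """Particles whose loss macrostep differs (the current, ill-posed check)."""
    return sum(1 for c, g in zip(cpu, gpu) if c != g)

def count_classification_flips(cpu: Sequence[Optional[int]],
                               gpu: Sequence[Optional[int]]) -> int:
    """Particles lost in one path but confined in the other."""
    return sum(1 for c, g in zip(cpu, gpu) if (c is None) != (g is None))

# Hypothetical loss macrosteps for 8 particles; None means still confined.
loss_cpu = [None, None, 37, None, None, None, None, None]
loss_gpu = [None, None, 41, None, None, None, None, None]

mismatches = count_loss_step_mismatches(loss_cpu, loss_gpu)
flips = count_classification_flips(loss_cpu, loss_gpu)

print(f"loss-step mismatches = {mismatches} / {len(loss_cpu)}")
print(f"lost<->confined flips = {flips} / {len(loss_cpu)}")
```

With this data the exact loss-step comparison reports a mismatch while the classification check reports none, which is the situation in run 27285400079. The short-horizon check would apply max_abs_diff to states captured after a few microsteps and compare the result against a tight tolerance; the tolerance value itself would need to be calibrated against the real kernels.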