Extract shared Gauss implicit-RK/Newton core for GC, CPP, and full orbit by krystophny · Pull Request #407 · itpplasma/SIMPLE

krystophny · 2026-06-19T15:18:48Z

What

Extract the shared implicit Gauss-collocation residual / Jacobian / Newton / linear-solve core now used by the guiding-center (GC) integrator, the Classical Pauli Particle (CPP) pusher, and the curvilinear full-orbit pusher into a new device-callable module orbit_rk_core.

This is the third step of the full-orbit stack and the genuine second/third occurrence: GC (orbit_symplectic) and CPP (orbit_cpp) carried byte-identical degenerate-Lagrangian Gauss residual and Jacobian bodies on field_can_t (the CPP guiding-center motion is the slow manifold of the Pauli particle, so the residuals coincide at fixed mu); CPP and full orbit each carried their own dense LU solver.

How

New orbit_rk_core, all routines !$acc routine seq, fixed-size args, integer model dispatch (MODEL_GC, MODEL_CPP, MODEL_FO_SYMPL), no procedure pointers / class() in the hot path:

rk_gauss_residual / rk_gauss_jacobian -- select case(model) to one inlinable kernel for the field-canonical Gauss step (arithmetic lifted verbatim from f_rk_gauss / jac_rk_gauss).
rk_solve(n, A, b, info) -- fixed-size partial-pivot LU, replaces dgesv on the device and the duplicated lu_solve / lu_solve6.
rk_gauss_newton -- the newton_rk_gauss control flow with rk_solve and a returned hit_maxit flag, so the EVT_RK_GAUSS_MAXIT counter stays a CPU-only concern (mirrors how the GPU newton1 path omits fort.6601).

orbit_symplectic's f_rk_gauss / jac_rk_gauss become thin MODEL_GC wrappers; the GC CPU Newton keeps dgesv so its results do not change. orbit_cpp drops its residual/Jacobian/LU/Newton copies and calls the core with MODEL_CPP. orbit_full's foimpl_step calls rk_solve. coeff_rk_gauss (plain data) stays in orbit_symplectic_base.

Net -40 lines across existing modules; the only addition is the new shared module.

Hard constraints honored

src/simple_gpu.f90 and src/orbit_symplectic_euler1.f90 untouched: the production GPU path is symplectic-Euler + Boozer (EXPL_IMPL_EULER), not the Gauss path.
coeff_rk_gauss data and field_can_t layout unchanged.
GC orbit_timestep_sympl_rk_gauss signature and numerics unchanged.

Verification

Regression gate: GC golden / symplectic tests before and after, plus the GPU bench.

The standard Fast profile uses -ffast-math -ffp-contract=fast, which reassociates floating-point freely and differently across a module boundary, so moving identical arithmetic into a new compilation unit shifts the last bit. Measured drift is round-off only: gauss4 1 ULP (4.4e-16), gauss2 2.3e-13 accumulated over 10000 steps. To prove the extraction is arithmetically exact (same operations, same order), the byte-identity check is run under the repo's deterministic-FP profile (-DSIMPLE_DETERMINISTIC_FP=ON, IEEE strict), where a pure code-motion refactor must be byte-identical.

Deterministic-FP make build-deterministic CONFIG=Fast, test_sympl_stell:

# BEFORE (base feature/full-orbit-cpp-symplectic)
e3699b112bbc84908f68e2aa047fb6fb  gauss2.out
60b6786bc1ff40cda318a8337c2f1c03  gauss4.out
a72edc0dda240a271cca898657c072de  gauss6.out

# AFTER (refactor/shared-symplectic-core)
e3699b112bbc84908f68e2aa047fb6fb  gauss2.out
60b6786bc1ff40cda318a8337c2f1c03  gauss4.out
a72edc0dda240a271cca898657c072de  gauss6.out

$ diff before.md5 after.md5   # (empty) -> BYTE-IDENTICAL

Full suite, standard Fast config, after refactor:

$ ctest --test-dir build -R 'sympl|cpp|fo_symplectic|gpu_orbit|orbit_model|full_orbit'
100% tests passed, 0 tests failed out of 11
  test_sympl_tokamak ........... Passed
  test_sympl_stell ............. Passed
  test_fo_symplectic ........... Passed
  test_orbit_model_dispatch .... Passed
  test_cpp_equals_gc_largestep . Passed
  test_cpp_invariants .......... Passed
  test_gpu_orbit_bench ......... Passed   (loss-step mismatches = 0)

$ ctest --test-dir build -LE regression -E 'test_sympl_stell|golden'
100% tests passed, 0 tests failed out of 65

test_gpu_orbit_bench (the OpenACC EXPL_IMPL_EULER path) reports zero loss-step mismatches before and after, confirming the GPU production path is undisturbed.

Introduce a gyro-resolved Lorentz (Boris) full-orbit integrator alongside the symplectic guiding-center tracer, with a provider seam that carries no libneo symbol on the analytic path. - orbit_full_provider: abstract field_metric_provider_t (eval_field, metric, christoffel, eval_canfield, eval_potential) plus shared FO_* error codes. - orbit_full_mock_cart: Cartesian provider, flat metric, zero Christoffel, uniform / linear-gradient / coils B; canonical-field seam stubbed. - orbit_full_mock_cyl: cylindrical (R,phi,Z) provider with analytic metric, closed-form Christoffel, and axisymmetric 1/R toroidal B. - orbit_full: FullOrbitState, generic init (provider + coils convenience), Boris drift-kick-drift (Cartesian rotation; cylindrical adds the -Gamma^i_mn v^m v^n geodesic term), convert_full_to_gc, compute_energy. ORBIT_PAULI declared and dispatched to a non-fatal not-implemented channel. - params: orbit_model selector in the config namelist. - test_full_orbit: behavioral oracles on the analytic mocks only (uniform-B gyration, Cartesian grad-B drift, cylindrical curvature and grad-B drift, mu invariance, cylindrical Christoffel vs closed form and FD).

Implement two structure-preserving pushers alongside the GC tracer, both GPU-portable (integer model dispatch, fixed-size state, no procedure pointers or class() in the per-step loop): - orbit_cpp: Classical Pauli Particle in the flux-canonical realization (Xiao & Qin 2021). Reuses symplectic_integrator_t z(4), field_can_t mu/ro0 and get_derivatives2 verbatim; the discrete scheme is the degenerate- Lagrangian Euler-Lagrange (Gauss collocation) with mu held fixed, i.e. the GC slow-manifold equations. Own residual/Jacobian/Newton with a device LU solve (no dgesv on device), GAUSS1..4 by stage count. - orbit_full foimpl_step: implicit-midpoint curvilinear full orbit (B1, VENUS-LEVIS geometry), 6D Lorentz + geodesic term on the provider seam, Newton with FD Jacobian and a 6x6 LU solve. New model code ORBIT_FOSYMPL. Wire simple_main macrostep to dispatch on params%orbit_model (GC default, PAULI -> orbit_cpp); the GPU batch path is unaffected (it stays restricted to BOOZER + EXPL_IMPL_EULER). Drop the non-reentrant init_full_orbit_state_coils (module save provider + hardcoded R_AXIS), removing the forced neo_biotsavart dependency from the generic init interface; all callers use the _prov variant. Tests (analytic mocks + cheap BOOZER field on the QA wout): - test_cpp_equals_gc_largestep: CPP reproduces GC to Newton tolerance at GC-sized steps (max |z_CPP - z_GC| = 0 at GAUSS1/GAUSS2). - test_cpp_invariants: mu identically conserved; energy oscillation, secular drift, and p_phi excursion match GC bit-for-bit. - test_fo_symplectic: |v|/energy conserved (uniform B), curvature drift, no secular energy drift over a long run. - test_orbit_model_dispatch: orbit_model parsed from the config namelist and mapped to the right pusher/stage count. Existing GC tests (test_sympl_*, test_axis_crossing, test_gpu_orbit_bench, test_full_orbit) unchanged and green.

GC (orbit_symplectic) and CPP (orbit_cpp) carried byte-identical degenerate-Lagrangian Gauss residual and Jacobian bodies on field_can_t; CPP and full orbit (orbit_full) each carried their own dense LU solver. The CPP guiding-center motion is the slow manifold of the Pauli particle, so its Gauss residual coincides with the GC one at fixed mu. That is the genuine second/third occurrence to extract. New module orbit_rk_core holds the device-callable shared core: - rk_gauss_residual / rk_gauss_jacobian, integer model dispatch (MODEL_GC, MODEL_CPP) by select case to inlinable !$acc routine seq kernels (no procedure pointers, no class() in the hot path) - rk_solve, a fixed-size partial-pivot LU solve replacing dgesv on the device and the duplicated lu_solve / lu_solve6 - rk_gauss_newton, the newton_rk_gauss control flow with rk_solve and a returned hit_maxit flag so the event counter stays a CPU-only concern orbit_symplectic f_rk_gauss/jac_rk_gauss become thin MODEL_GC wrappers; the GC CPU Newton keeps dgesv so its results are unchanged. orbit_cpp drops its residual/Jacobian/LU/Newton copies and calls the core with MODEL_CPP. orbit_full's foimpl_step uses rk_solve. coeff_rk_gauss (plain data) stays in orbit_symplectic_base and is called inside the kernels. The OpenACC symplectic-Euler GPU kernel (simple_gpu.f90, orbit_symplectic_euler1.f90) is on the Boozer EXPL_IMPL_EULER path, not the Gauss path, and is untouched. Pure refactor: same operations, same order. Under deterministic IEEE FP the GC Gauss golden outputs (gauss2/4/6) are byte-identical before and after; the full ctest suite stays green and test_gpu_orbit_bench reports zero loss-step mismatches.

…d symplectic FO (#429) ## Summary Rebase-and-strip of the shelved shared-symplectic-core work (PR #407) onto current `main`. Keeps only the keep-worthy nucleus: the guiding-centre (GC) Gauss implicit-RK / Newton core, extracted into a new device-callable module `orbit_rk_core`. Everything CPP and symplectic-full-orbit is dropped. ### What lands - New `src/orbit_rk_core.f90`: the GC Gauss-collocation residual, analytic Jacobian, a device-portable dense LU solve (`rk_solve`), and a device-callable Newton shell (`rk_gauss_newton`). Reduced to a single GC kernel; the former `MODEL_CPP` / `MODEL_FO_SYMPL` codes and their dispatch shells are gone. - `src/orbit_symplectic.f90`: `f_rk_gauss` / `jac_rk_gauss` become thin wrappers over the shared core (`MODEL_GC`). The arithmetic is moved byte-identically, so GC numerics do not change. - `src/CMakeLists.txt`: compile `orbit_rk_core.f90` before `orbit_symplectic.f90`. ### What is dropped (not merged) - The CPP pusher (`orbit_cpp.f90`, `MODEL_CPP`) and its tests (`test_cpp_equals_gc_largestep`, `test_cpp_invariants`). - The shelved symplectic / mocked-Boris full-orbit prototype (`orbit_full.f90`, `MODEL_FO_SYMPL`), its provider seam (`orbit_full_provider.f90`) and the two mocks, and their tests (`test_fo_symplectic`, `test_full_orbit`, `test_orbit_model_dispatch`). - The branch's `params.f90` / `simple_main.f90` `orbit_model` hunks: `main` already owns `orbit_model` (`ORBIT_GC=0`, `ORBIT_FULL_ORBIT=7`) and the validated explicit Boris full-orbit pusher (PR #426), which this PR leaves untouched. ## Why `orbit_rk_core` is the single home for the GC Gauss step shared by the CPU integrator and (later) the OpenACC GPU kernel; it is written `!$acc routine seq`-ready with its own device-portable LU. CPP is retired; the full-orbit path is the explicit Cartesian Boris (`orbit_model=7`), not the shelved symplectic prototype. ## Verification `make CONFIG=Fast`: build OK. `make test`: **100% tests passed, 0 failed out of 77** (total 677.80 s). All 14 golden-record regression cases (albert, albert_coils, boozer, boozer_ncsx, canonical, classifier, classifier_combined, classifier_fast, collision, meiss, meiss_coils, and the tokamak variants) pass, confirming the GC numerics are byte-identical after the extraction. Before (on `main`): the same 14 golden-record cases pass with the inlined GC residual/Jacobian. After (this branch): identical results with the arithmetic moved into `orbit_rk_core` -- a pure refactor, no numeric change. ## Follow-ups (separate PRs) - Wire `rk_gauss_newton` / `rk_solve` onto the fortnum multiroot core in place of `dgesv` for the CPU GC Newton; must preserve the verified GC numerics. - The device (OpenACC) GC solver path uses fortnum's reverse-communication `multiroot_step` (fortnum PR for the device-capable solver), keeping `orbit_rk_core` as the shared CPU+GPU home.

krystophny · 2026-06-25T08:46:58Z

Superseded by #429, which landed the keep-worthy nucleus (the GC Gauss/Newton core in orbit_rk_core) rebased onto current main. The CPP and symplectic-FO parts this PR also carried are retired: CPP per #420, and the full-orbit path is the explicit Cartesian Boris (orbit_model=7, #426), not the shelved symplectic prototype.

krystophny added 3 commits June 19, 2026 15:59

krystophny added tier/T1 pure refactor size/L review size up to 1000 changed lines labels Jun 19, 2026

krystophny temporarily deployed to github-pages June 19, 2026 15:19 — with GitHub Actions Inactive

krystophny deleted the branch main June 23, 2026 18:58

krystophny closed this Jun 23, 2026

krystophny reopened this Jun 23, 2026

krystophny changed the base branch from feature/full-orbit-cpp-symplectic to main June 23, 2026 19:00

krystophny temporarily deployed to github-pages June 23, 2026 19:00 — with GitHub Actions Inactive

krystophny mentioned this pull request Jun 24, 2026

Extract shared GC Gauss implicit-RK core (orbit_rk_core); drop CPP and symplectic FO #429

Merged

krystophny closed this Jun 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Extract shared Gauss implicit-RK/Newton core for GC, CPP, and full orbit#407

Extract shared Gauss implicit-RK/Newton core for GC, CPP, and full orbit#407
krystophny wants to merge 3 commits into
mainfrom
refactor/shared-symplectic-core

krystophny commented Jun 19, 2026

Uh oh!

krystophny commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

krystophny commented Jun 19, 2026

What

How

Hard constraints honored

Verification

Uh oh!

krystophny commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant