Skip to content

Extract shared Gauss implicit-RK/Newton core for GC, CPP, and full orbit#407

Closed
krystophny wants to merge 3 commits into
mainfrom
refactor/shared-symplectic-core
Closed

Extract shared Gauss implicit-RK/Newton core for GC, CPP, and full orbit#407
krystophny wants to merge 3 commits into
mainfrom
refactor/shared-symplectic-core

Conversation

@krystophny

Copy link
Copy Markdown
Member

What

Extract the shared implicit Gauss-collocation residual / Jacobian / Newton / linear-solve core now used by the guiding-center (GC) integrator, the Classical Pauli Particle (CPP) pusher, and the curvilinear full-orbit pusher into a new device-callable module orbit_rk_core.

This is the third step of the full-orbit stack and the genuine second/third occurrence: GC (orbit_symplectic) and CPP (orbit_cpp) carried byte-identical degenerate-Lagrangian Gauss residual and Jacobian bodies on field_can_t (the CPP guiding-center motion is the slow manifold of the Pauli particle, so the residuals coincide at fixed mu); CPP and full orbit each carried their own dense LU solver.

How

New orbit_rk_core, all routines !$acc routine seq, fixed-size args, integer model dispatch (MODEL_GC, MODEL_CPP, MODEL_FO_SYMPL), no procedure pointers / class() in the hot path:

  • rk_gauss_residual / rk_gauss_jacobian -- select case(model) to one inlinable kernel for the field-canonical Gauss step (arithmetic lifted verbatim from f_rk_gauss / jac_rk_gauss).
  • rk_solve(n, A, b, info) -- fixed-size partial-pivot LU, replaces dgesv on the device and the duplicated lu_solve / lu_solve6.
  • rk_gauss_newton -- the newton_rk_gauss control flow with rk_solve and a returned hit_maxit flag, so the EVT_RK_GAUSS_MAXIT counter stays a CPU-only concern (mirrors how the GPU newton1 path omits fort.6601).

orbit_symplectic's f_rk_gauss / jac_rk_gauss become thin MODEL_GC wrappers; the GC CPU Newton keeps dgesv so its results do not change. orbit_cpp drops its residual/Jacobian/LU/Newton copies and calls the core with MODEL_CPP. orbit_full's foimpl_step calls rk_solve. coeff_rk_gauss (plain data) stays in orbit_symplectic_base.

Net -40 lines across existing modules; the only addition is the new shared module.

Hard constraints honored

  • src/simple_gpu.f90 and src/orbit_symplectic_euler1.f90 untouched: the production GPU path is symplectic-Euler + Boozer (EXPL_IMPL_EULER), not the Gauss path.
  • coeff_rk_gauss data and field_can_t layout unchanged.
  • GC orbit_timestep_sympl_rk_gauss signature and numerics unchanged.

Verification

Regression gate: GC golden / symplectic tests before and after, plus the GPU bench.

The standard Fast profile uses -ffast-math -ffp-contract=fast, which reassociates floating-point freely and differently across a module boundary, so moving identical arithmetic into a new compilation unit shifts the last bit. Measured drift is round-off only: gauss4 1 ULP (4.4e-16), gauss2 2.3e-13 accumulated over 10000 steps. To prove the extraction is arithmetically exact (same operations, same order), the byte-identity check is run under the repo's deterministic-FP profile (-DSIMPLE_DETERMINISTIC_FP=ON, IEEE strict), where a pure code-motion refactor must be byte-identical.

Deterministic-FP make build-deterministic CONFIG=Fast, test_sympl_stell:

# BEFORE (base feature/full-orbit-cpp-symplectic)
e3699b112bbc84908f68e2aa047fb6fb  gauss2.out
60b6786bc1ff40cda318a8337c2f1c03  gauss4.out
a72edc0dda240a271cca898657c072de  gauss6.out

# AFTER (refactor/shared-symplectic-core)
e3699b112bbc84908f68e2aa047fb6fb  gauss2.out
60b6786bc1ff40cda318a8337c2f1c03  gauss4.out
a72edc0dda240a271cca898657c072de  gauss6.out

$ diff before.md5 after.md5   # (empty) -> BYTE-IDENTICAL

Full suite, standard Fast config, after refactor:

$ ctest --test-dir build -R 'sympl|cpp|fo_symplectic|gpu_orbit|orbit_model|full_orbit'
100% tests passed, 0 tests failed out of 11
  test_sympl_tokamak ........... Passed
  test_sympl_stell ............. Passed
  test_fo_symplectic ........... Passed
  test_orbit_model_dispatch .... Passed
  test_cpp_equals_gc_largestep . Passed
  test_cpp_invariants .......... Passed
  test_gpu_orbit_bench ......... Passed   (loss-step mismatches = 0)

$ ctest --test-dir build -LE regression -E 'test_sympl_stell|golden'
100% tests passed, 0 tests failed out of 65

test_gpu_orbit_bench (the OpenACC EXPL_IMPL_EULER path) reports zero loss-step mismatches before and after, confirming the GPU production path is undisturbed.

Introduce a gyro-resolved Lorentz (Boris) full-orbit integrator alongside
the symplectic guiding-center tracer, with a provider seam that carries no
libneo symbol on the analytic path.

- orbit_full_provider: abstract field_metric_provider_t (eval_field, metric,
  christoffel, eval_canfield, eval_potential) plus shared FO_* error codes.
- orbit_full_mock_cart: Cartesian provider, flat metric, zero Christoffel,
  uniform / linear-gradient / coils B; canonical-field seam stubbed.
- orbit_full_mock_cyl: cylindrical (R,phi,Z) provider with analytic metric,
  closed-form Christoffel, and axisymmetric 1/R toroidal B.
- orbit_full: FullOrbitState, generic init (provider + coils convenience),
  Boris drift-kick-drift (Cartesian rotation; cylindrical adds the
  -Gamma^i_mn v^m v^n geodesic term), convert_full_to_gc, compute_energy.
  ORBIT_PAULI declared and dispatched to a non-fatal not-implemented channel.
- params: orbit_model selector in the config namelist.
- test_full_orbit: behavioral oracles on the analytic mocks only (uniform-B
  gyration, Cartesian grad-B drift, cylindrical curvature and grad-B drift,
  mu invariance, cylindrical Christoffel vs closed form and FD).
Implement two structure-preserving pushers alongside the GC tracer, both
GPU-portable (integer model dispatch, fixed-size state, no procedure
pointers or class() in the per-step loop):

- orbit_cpp: Classical Pauli Particle in the flux-canonical realization
  (Xiao & Qin 2021). Reuses symplectic_integrator_t z(4), field_can_t mu/ro0
  and get_derivatives2 verbatim; the discrete scheme is the degenerate-
  Lagrangian Euler-Lagrange (Gauss collocation) with mu held fixed, i.e. the
  GC slow-manifold equations. Own residual/Jacobian/Newton with a device LU
  solve (no dgesv on device), GAUSS1..4 by stage count.
- orbit_full foimpl_step: implicit-midpoint curvilinear full orbit (B1,
  VENUS-LEVIS geometry), 6D Lorentz + geodesic term on the provider seam,
  Newton with FD Jacobian and a 6x6 LU solve. New model code ORBIT_FOSYMPL.

Wire simple_main macrostep to dispatch on params%orbit_model (GC default,
PAULI -> orbit_cpp); the GPU batch path is unaffected (it stays restricted to
BOOZER + EXPL_IMPL_EULER). Drop the non-reentrant init_full_orbit_state_coils
(module save provider + hardcoded R_AXIS), removing the forced neo_biotsavart
dependency from the generic init interface; all callers use the _prov variant.

Tests (analytic mocks + cheap BOOZER field on the QA wout):
- test_cpp_equals_gc_largestep: CPP reproduces GC to Newton tolerance at
  GC-sized steps (max |z_CPP - z_GC| = 0 at GAUSS1/GAUSS2).
- test_cpp_invariants: mu identically conserved; energy oscillation, secular
  drift, and p_phi excursion match GC bit-for-bit.
- test_fo_symplectic: |v|/energy conserved (uniform B), curvature drift,
  no secular energy drift over a long run.
- test_orbit_model_dispatch: orbit_model parsed from the config namelist and
  mapped to the right pusher/stage count.

Existing GC tests (test_sympl_*, test_axis_crossing, test_gpu_orbit_bench,
test_full_orbit) unchanged and green.
GC (orbit_symplectic) and CPP (orbit_cpp) carried byte-identical
degenerate-Lagrangian Gauss residual and Jacobian bodies on field_can_t;
CPP and full orbit (orbit_full) each carried their own dense LU solver.
The CPP guiding-center motion is the slow manifold of the Pauli particle,
so its Gauss residual coincides with the GC one at fixed mu. That is the
genuine second/third occurrence to extract.

New module orbit_rk_core holds the device-callable shared core:
  - rk_gauss_residual / rk_gauss_jacobian, integer model dispatch
    (MODEL_GC, MODEL_CPP) by select case to inlinable !$acc routine seq
    kernels (no procedure pointers, no class() in the hot path)
  - rk_solve, a fixed-size partial-pivot LU solve replacing dgesv on the
    device and the duplicated lu_solve / lu_solve6
  - rk_gauss_newton, the newton_rk_gauss control flow with rk_solve and a
    returned hit_maxit flag so the event counter stays a CPU-only concern

orbit_symplectic f_rk_gauss/jac_rk_gauss become thin MODEL_GC wrappers;
the GC CPU Newton keeps dgesv so its results are unchanged. orbit_cpp
drops its residual/Jacobian/LU/Newton copies and calls the core with
MODEL_CPP. orbit_full's foimpl_step uses rk_solve.

coeff_rk_gauss (plain data) stays in orbit_symplectic_base and is called
inside the kernels. The OpenACC symplectic-Euler GPU kernel
(simple_gpu.f90, orbit_symplectic_euler1.f90) is on the Boozer
EXPL_IMPL_EULER path, not the Gauss path, and is untouched.

Pure refactor: same operations, same order. Under deterministic IEEE FP
the GC Gauss golden outputs (gauss2/4/6) are byte-identical before and
after; the full ctest suite stays green and test_gpu_orbit_bench reports
zero loss-step mismatches.
@krystophny krystophny added tier/T1 pure refactor size/L review size up to 1000 changed lines labels Jun 19, 2026
@krystophny krystophny deleted the branch main June 23, 2026 18:58
@krystophny krystophny closed this Jun 23, 2026
@krystophny krystophny reopened this Jun 23, 2026
@krystophny krystophny changed the base branch from feature/full-orbit-cpp-symplectic to main June 23, 2026 19:00
krystophny added a commit that referenced this pull request Jun 25, 2026
…d symplectic FO (#429)

## Summary

Rebase-and-strip of the shelved shared-symplectic-core work (PR #407)
onto
current `main`. Keeps only the keep-worthy nucleus: the guiding-centre
(GC)
Gauss implicit-RK / Newton core, extracted into a new device-callable
module
`orbit_rk_core`. Everything CPP and symplectic-full-orbit is dropped.

### What lands
- New `src/orbit_rk_core.f90`: the GC Gauss-collocation residual,
analytic
Jacobian, a device-portable dense LU solve (`rk_solve`), and a
device-callable
Newton shell (`rk_gauss_newton`). Reduced to a single GC kernel; the
former
`MODEL_CPP` / `MODEL_FO_SYMPL` codes and their dispatch shells are gone.
- `src/orbit_symplectic.f90`: `f_rk_gauss` / `jac_rk_gauss` become thin
wrappers
over the shared core (`MODEL_GC`). The arithmetic is moved
byte-identically,
  so GC numerics do not change.
- `src/CMakeLists.txt`: compile `orbit_rk_core.f90` before
`orbit_symplectic.f90`.

### What is dropped (not merged)
- The CPP pusher (`orbit_cpp.f90`, `MODEL_CPP`) and its tests
  (`test_cpp_equals_gc_largestep`, `test_cpp_invariants`).
- The shelved symplectic / mocked-Boris full-orbit prototype
(`orbit_full.f90`,
`MODEL_FO_SYMPL`), its provider seam (`orbit_full_provider.f90`) and the
two
  mocks, and their tests (`test_fo_symplectic`, `test_full_orbit`,
  `test_orbit_model_dispatch`).
- The branch's `params.f90` / `simple_main.f90` `orbit_model` hunks:
`main`
already owns `orbit_model` (`ORBIT_GC=0`, `ORBIT_FULL_ORBIT=7`) and the
validated explicit Boris full-orbit pusher (PR #426), which this PR
leaves
  untouched.

## Why

`orbit_rk_core` is the single home for the GC Gauss step shared by the
CPU
integrator and (later) the OpenACC GPU kernel; it is written `!$acc
routine
seq`-ready with its own device-portable LU. CPP is retired; the
full-orbit path
is the explicit Cartesian Boris (`orbit_model=7`), not the shelved
symplectic
prototype.

## Verification

`make CONFIG=Fast`: build OK. `make test`: **100% tests passed, 0 failed
out of
77** (total 677.80 s). All 14 golden-record regression cases (albert,
albert_coils, boozer, boozer_ncsx, canonical, classifier,
classifier_combined,
classifier_fast, collision, meiss, meiss_coils, and the tokamak
variants)
pass, confirming the GC numerics are byte-identical after the
extraction.

Before (on `main`): the same 14 golden-record cases pass with the
inlined GC
residual/Jacobian. After (this branch): identical results with the
arithmetic
moved into `orbit_rk_core` -- a pure refactor, no numeric change.

## Follow-ups (separate PRs)

- Wire `rk_gauss_newton` / `rk_solve` onto the fortnum multiroot core in
place of
  `dgesv` for the CPU GC Newton; must preserve the verified GC numerics.
- The device (OpenACC) GC solver path uses fortnum's
reverse-communication
  `multiroot_step` (fortnum PR for the device-capable solver), keeping
  `orbit_rk_core` as the shared CPU+GPU home.
@krystophny

Copy link
Copy Markdown
Member Author

Superseded by #429, which landed the keep-worthy nucleus (the GC Gauss/Newton core in orbit_rk_core) rebased onto current main. The CPP and symplectic-FO parts this PR also carried are retired: CPP per #420, and the full-orbit path is the explicit Cartesian Boris (orbit_model=7, #426), not the shelved symplectic prototype.

@krystophny krystophny closed this Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L review size up to 1000 changed lines tier/T1 pure refactor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant