Extract shared Gauss implicit-RK/Newton core for GC, CPP, and full orbit#407
Closed
krystophny wants to merge 3 commits into
Closed
Extract shared Gauss implicit-RK/Newton core for GC, CPP, and full orbit#407krystophny wants to merge 3 commits into
krystophny wants to merge 3 commits into
Conversation
Introduce a gyro-resolved Lorentz (Boris) full-orbit integrator alongside the symplectic guiding-center tracer, with a provider seam that carries no libneo symbol on the analytic path. - orbit_full_provider: abstract field_metric_provider_t (eval_field, metric, christoffel, eval_canfield, eval_potential) plus shared FO_* error codes. - orbit_full_mock_cart: Cartesian provider, flat metric, zero Christoffel, uniform / linear-gradient / coils B; canonical-field seam stubbed. - orbit_full_mock_cyl: cylindrical (R,phi,Z) provider with analytic metric, closed-form Christoffel, and axisymmetric 1/R toroidal B. - orbit_full: FullOrbitState, generic init (provider + coils convenience), Boris drift-kick-drift (Cartesian rotation; cylindrical adds the -Gamma^i_mn v^m v^n geodesic term), convert_full_to_gc, compute_energy. ORBIT_PAULI declared and dispatched to a non-fatal not-implemented channel. - params: orbit_model selector in the config namelist. - test_full_orbit: behavioral oracles on the analytic mocks only (uniform-B gyration, Cartesian grad-B drift, cylindrical curvature and grad-B drift, mu invariance, cylindrical Christoffel vs closed form and FD).
Implement two structure-preserving pushers alongside the GC tracer, both GPU-portable (integer model dispatch, fixed-size state, no procedure pointers or class() in the per-step loop): - orbit_cpp: Classical Pauli Particle in the flux-canonical realization (Xiao & Qin 2021). Reuses symplectic_integrator_t z(4), field_can_t mu/ro0 and get_derivatives2 verbatim; the discrete scheme is the degenerate- Lagrangian Euler-Lagrange (Gauss collocation) with mu held fixed, i.e. the GC slow-manifold equations. Own residual/Jacobian/Newton with a device LU solve (no dgesv on device), GAUSS1..4 by stage count. - orbit_full foimpl_step: implicit-midpoint curvilinear full orbit (B1, VENUS-LEVIS geometry), 6D Lorentz + geodesic term on the provider seam, Newton with FD Jacobian and a 6x6 LU solve. New model code ORBIT_FOSYMPL. Wire simple_main macrostep to dispatch on params%orbit_model (GC default, PAULI -> orbit_cpp); the GPU batch path is unaffected (it stays restricted to BOOZER + EXPL_IMPL_EULER). Drop the non-reentrant init_full_orbit_state_coils (module save provider + hardcoded R_AXIS), removing the forced neo_biotsavart dependency from the generic init interface; all callers use the _prov variant. Tests (analytic mocks + cheap BOOZER field on the QA wout): - test_cpp_equals_gc_largestep: CPP reproduces GC to Newton tolerance at GC-sized steps (max |z_CPP - z_GC| = 0 at GAUSS1/GAUSS2). - test_cpp_invariants: mu identically conserved; energy oscillation, secular drift, and p_phi excursion match GC bit-for-bit. - test_fo_symplectic: |v|/energy conserved (uniform B), curvature drift, no secular energy drift over a long run. - test_orbit_model_dispatch: orbit_model parsed from the config namelist and mapped to the right pusher/stage count. Existing GC tests (test_sympl_*, test_axis_crossing, test_gpu_orbit_bench, test_full_orbit) unchanged and green.
GC (orbit_symplectic) and CPP (orbit_cpp) carried byte-identical
degenerate-Lagrangian Gauss residual and Jacobian bodies on field_can_t;
CPP and full orbit (orbit_full) each carried their own dense LU solver.
The CPP guiding-center motion is the slow manifold of the Pauli particle,
so its Gauss residual coincides with the GC one at fixed mu. That is the
genuine second/third occurrence to extract.
New module orbit_rk_core holds the device-callable shared core:
- rk_gauss_residual / rk_gauss_jacobian, integer model dispatch
(MODEL_GC, MODEL_CPP) by select case to inlinable !$acc routine seq
kernels (no procedure pointers, no class() in the hot path)
- rk_solve, a fixed-size partial-pivot LU solve replacing dgesv on the
device and the duplicated lu_solve / lu_solve6
- rk_gauss_newton, the newton_rk_gauss control flow with rk_solve and a
returned hit_maxit flag so the event counter stays a CPU-only concern
orbit_symplectic f_rk_gauss/jac_rk_gauss become thin MODEL_GC wrappers;
the GC CPU Newton keeps dgesv so its results are unchanged. orbit_cpp
drops its residual/Jacobian/LU/Newton copies and calls the core with
MODEL_CPP. orbit_full's foimpl_step uses rk_solve.
coeff_rk_gauss (plain data) stays in orbit_symplectic_base and is called
inside the kernels. The OpenACC symplectic-Euler GPU kernel
(simple_gpu.f90, orbit_symplectic_euler1.f90) is on the Boozer
EXPL_IMPL_EULER path, not the Gauss path, and is untouched.
Pure refactor: same operations, same order. Under deterministic IEEE FP
the GC Gauss golden outputs (gauss2/4/6) are byte-identical before and
after; the full ctest suite stays green and test_gpu_orbit_bench reports
zero loss-step mismatches.
krystophny
added a commit
that referenced
this pull request
Jun 25, 2026
…d symplectic FO (#429) ## Summary Rebase-and-strip of the shelved shared-symplectic-core work (PR #407) onto current `main`. Keeps only the keep-worthy nucleus: the guiding-centre (GC) Gauss implicit-RK / Newton core, extracted into a new device-callable module `orbit_rk_core`. Everything CPP and symplectic-full-orbit is dropped. ### What lands - New `src/orbit_rk_core.f90`: the GC Gauss-collocation residual, analytic Jacobian, a device-portable dense LU solve (`rk_solve`), and a device-callable Newton shell (`rk_gauss_newton`). Reduced to a single GC kernel; the former `MODEL_CPP` / `MODEL_FO_SYMPL` codes and their dispatch shells are gone. - `src/orbit_symplectic.f90`: `f_rk_gauss` / `jac_rk_gauss` become thin wrappers over the shared core (`MODEL_GC`). The arithmetic is moved byte-identically, so GC numerics do not change. - `src/CMakeLists.txt`: compile `orbit_rk_core.f90` before `orbit_symplectic.f90`. ### What is dropped (not merged) - The CPP pusher (`orbit_cpp.f90`, `MODEL_CPP`) and its tests (`test_cpp_equals_gc_largestep`, `test_cpp_invariants`). - The shelved symplectic / mocked-Boris full-orbit prototype (`orbit_full.f90`, `MODEL_FO_SYMPL`), its provider seam (`orbit_full_provider.f90`) and the two mocks, and their tests (`test_fo_symplectic`, `test_full_orbit`, `test_orbit_model_dispatch`). - The branch's `params.f90` / `simple_main.f90` `orbit_model` hunks: `main` already owns `orbit_model` (`ORBIT_GC=0`, `ORBIT_FULL_ORBIT=7`) and the validated explicit Boris full-orbit pusher (PR #426), which this PR leaves untouched. ## Why `orbit_rk_core` is the single home for the GC Gauss step shared by the CPU integrator and (later) the OpenACC GPU kernel; it is written `!$acc routine seq`-ready with its own device-portable LU. CPP is retired; the full-orbit path is the explicit Cartesian Boris (`orbit_model=7`), not the shelved symplectic prototype. ## Verification `make CONFIG=Fast`: build OK. `make test`: **100% tests passed, 0 failed out of 77** (total 677.80 s). All 14 golden-record regression cases (albert, albert_coils, boozer, boozer_ncsx, canonical, classifier, classifier_combined, classifier_fast, collision, meiss, meiss_coils, and the tokamak variants) pass, confirming the GC numerics are byte-identical after the extraction. Before (on `main`): the same 14 golden-record cases pass with the inlined GC residual/Jacobian. After (this branch): identical results with the arithmetic moved into `orbit_rk_core` -- a pure refactor, no numeric change. ## Follow-ups (separate PRs) - Wire `rk_gauss_newton` / `rk_solve` onto the fortnum multiroot core in place of `dgesv` for the CPU GC Newton; must preserve the verified GC numerics. - The device (OpenACC) GC solver path uses fortnum's reverse-communication `multiroot_step` (fortnum PR for the device-capable solver), keeping `orbit_rk_core` as the shared CPU+GPU home.
Member
Author
|
Superseded by #429, which landed the keep-worthy nucleus (the GC Gauss/Newton core in orbit_rk_core) rebased onto current main. The CPP and symplectic-FO parts this PR also carried are retired: CPP per #420, and the full-orbit path is the explicit Cartesian Boris (orbit_model=7, #426), not the shelved symplectic prototype. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Extract the shared implicit Gauss-collocation residual / Jacobian / Newton / linear-solve core now used by the guiding-center (GC) integrator, the Classical Pauli Particle (CPP) pusher, and the curvilinear full-orbit pusher into a new device-callable module
orbit_rk_core.This is the third step of the full-orbit stack and the genuine second/third occurrence: GC (
orbit_symplectic) and CPP (orbit_cpp) carried byte-identical degenerate-Lagrangian Gauss residual and Jacobian bodies onfield_can_t(the CPP guiding-center motion is the slow manifold of the Pauli particle, so the residuals coincide at fixed mu); CPP and full orbit each carried their own dense LU solver.How
New
orbit_rk_core, all routines!$acc routine seq, fixed-size args, integer model dispatch (MODEL_GC,MODEL_CPP,MODEL_FO_SYMPL), no procedure pointers /class()in the hot path:rk_gauss_residual/rk_gauss_jacobian--select case(model)to one inlinable kernel for the field-canonical Gauss step (arithmetic lifted verbatim fromf_rk_gauss/jac_rk_gauss).rk_solve(n, A, b, info)-- fixed-size partial-pivot LU, replacesdgesvon the device and the duplicatedlu_solve/lu_solve6.rk_gauss_newton-- thenewton_rk_gausscontrol flow withrk_solveand a returnedhit_maxitflag, so theEVT_RK_GAUSS_MAXITcounter stays a CPU-only concern (mirrors how the GPU newton1 path omits fort.6601).orbit_symplectic'sf_rk_gauss/jac_rk_gaussbecome thinMODEL_GCwrappers; the GC CPU Newton keepsdgesvso its results do not change.orbit_cppdrops its residual/Jacobian/LU/Newton copies and calls the core withMODEL_CPP.orbit_full'sfoimpl_stepcallsrk_solve.coeff_rk_gauss(plain data) stays inorbit_symplectic_base.Net -40 lines across existing modules; the only addition is the new shared module.
Hard constraints honored
src/simple_gpu.f90andsrc/orbit_symplectic_euler1.f90untouched: the production GPU path is symplectic-Euler + Boozer (EXPL_IMPL_EULER), not the Gauss path.coeff_rk_gaussdata andfield_can_tlayout unchanged.orbit_timestep_sympl_rk_gausssignature and numerics unchanged.Verification
Regression gate: GC golden / symplectic tests before and after, plus the GPU bench.
The standard
Fastprofile uses-ffast-math -ffp-contract=fast, which reassociates floating-point freely and differently across a module boundary, so moving identical arithmetic into a new compilation unit shifts the last bit. Measured drift is round-off only: gauss4 1 ULP (4.4e-16), gauss2 2.3e-13 accumulated over 10000 steps. To prove the extraction is arithmetically exact (same operations, same order), the byte-identity check is run under the repo's deterministic-FP profile (-DSIMPLE_DETERMINISTIC_FP=ON, IEEE strict), where a pure code-motion refactor must be byte-identical.Deterministic-FP
make build-deterministic CONFIG=Fast,test_sympl_stell:Full suite, standard
Fastconfig, after refactor:test_gpu_orbit_bench(the OpenACC EXPL_IMPL_EULER path) reports zero loss-step mismatches before and after, confirming the GPU production path is undisturbed.