ideal_mhd_model: share the constraint-force kernels by krystophny · Pull Request #575 · proximafusion/vmecpp

krystophny · 2026-06-14T14:34:53Z

Stacked PR — part 12/19 of the differentiable-VMEC++ series. merge after #574 (lambda-force-kernel).
Diff is cumulative (includes ancestor commits) because the branches are stacked on the fork; review the net change described below.

What

Extract the two local (non-transform) pieces of VMEC's spectral-condensation
constraint force into constraint_force_kernel.h, shared between the solver and
the Enzyme autodiff path:

ComputeEffectiveConstraintForce: gConEff = (rCon-rCon0)*ru + (zCon-zCon0)*zu
(effectiveConstraintForce), skipping the axis surface (no poloidal angle).
AddConstraintForces: add the bandpass-filtered gCon back into the MHD R/Z
forces (brmn, bzmn) and write the constraint outputs (frcon, fzcon)
(the constraint part of assembleTotalForces).

The Fourier-space bandpass between them stays the existing shared free function
deAliasConstraintForce; the conditional free-boundary rBSq contribution
stays in assembleTotalForces. Both kernels are allocation-free over flat
buffers.

With the lambda force (#20) and the six MHD kernels (#13-#18), this completes
the local force-density terms of the augmented functional. The nonlinear core
of the exact Hessian is J = J_mhd + J_lambda + J_con; the exact HVP is then
Tᵀ·J·T with the linear spectral transforms.

Verification

Bit-exact against the pre-refactor solver on both cases at 1 and 4 threads
(4 threads exercises the ghost-cell radial partitioning):

solovev            nt=1 : MHD Energy = 2.548352e+00
solovev            nt=4 : MHD Energy = 2.548352e+00
cth_like_fixed_bdy nt=1 : MHD Energy = 5.057191e-02
cth_like_fixed_bdy nt=4 : MHD Energy = 5.057191e-02

Build clean with clang-21 (0 errors).

Stacked on #20.

The CMake FetchContent abseil pin (2024-08) fails to compile under Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the numbers.cc nullability annotations are rejected by the newer frontend. Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8 and GCC. Clang is the compiler required for the Enzyme autodiff build. The Bazel build keeps its own (BCR) abseil pin and is unaffected.

Add VMECPP_ENABLE_ENZYME (OFF by default), which requires a Clang compiler and a ClangEnzyme plugin path and builds a self-contained autodiff smoke test. The test differentiates a scalar objective written over Eigen::Map'd caller buffers and checks reverse- and forward-mode Enzyme gradients against the closed form and central finite differences. enzyme.h documents the intrinsic ABI and the allocation constraint that shapes the differentiable kernels: Enzyme cannot track Eigen's aligned allocator, so differentiable paths use Eigen::Map over caller-owned buffers and avoid heap expression temporaries. With the option off the build is unchanged.

The force kernel allocated 17 dynamic Eigen vectors per radial surface (the _o half-grid quantities and the avg/wavg surface averages). Move them to preallocated per-thread ThreadLocalStorage scratch and assign in place, so the radial loop allocates nothing. Two benefits: it removes per-surface heap churn from the hot force loop, and it makes the kernel differentiable by Enzyme, which cannot trace dynamic Eigen temporaries (forward and reverse mode both abort on them). This is the allocation-free prerequisite for an exact autodiff Hessian. Pure refactor, identical arithmetic. Verified bit-for-bit: vmec_standalone MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

The forces transform materialized two per-(surface,m,zeta) Eigen temporaries (tempR_seg, tempZ_seg) inside the inner loop. Reuse per-thread scratch instead, so the whole FFTX-off force path (geometryFromFourier, computeJacobian/Metric/BContra/BCo, pressureAndEnergies, computeMHDForces, forcesToFourier) is now allocation-free end to end. Same arithmetic as the previous .eval(); verified bit-for-bit: solovev 2.548352e+00, cth_like_fixed_bdy 5.057191e-02.

Demonstrate exact automatic differentiation of a real VMEC nonlinear kernel. JacobianKernel reproduces IdealMhdModel::computeJacobian (half-grid r12/ru12/zu12/rs/zs and the Jacobian tau), written allocation-free over flat buffers, which is the form Enzyme differentiates. For L = 0.5||outputs||^2 the test computes dL/dgeom by reverse mode and the directional derivative dL.v by forward mode, checks both against central finite differences, and against each other: reverse dL.v vs FD : 1.9e-9 forward dL.v vs FD : 1.9e-9 forward vs reverse : 2.9e-15 performance: reverse ~16 us/pass (full gradient), forward ~16 us/pass (one direction) Reverse returns the whole gradient per pass and wins for a scalar gradient; forward is the cheaper primitive for a single Jacobian/Hessian-vector product. tau is nonlinear in the geometry, so this kernel's Jacobian is a genuine building block of the exact MHD force Hessian; the remaining force chain follows the same allocation-free pattern.

Move the half-grid Jacobian arithmetic into jacobian_kernel.h (ComputeHalfGridJacobian), allocation-free over flat buffers. Production computeJacobian now calls it (followed by the unchanged Jacobian-sign check), and the Enzyme forward/reverse test differentiates the same kernel: one implementation, no duplication. Bit-exact: vmec_standalone MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02). Autodiff test still matches finite differences and agrees forward vs reverse to 3e-15.

Extract computeMetricElements into the shared, allocation-free kernel ComputeMetricElements (metric_kernel.h), over flat buffers, and call it from the solver. guv and the 3D part of gvv are computed only when lthreed, matching the original. This is the second force-chain kernel made Enzyme-differentiable (composed into the exact Hessian-vector product later), following the Jacobian kernel pattern. Bit-exact: vmec_standalone MHD energy unchanged on solovev (2.548352e+00, 2D) and cth_like_fixed_bdy (5.057191e-02, 3D path with guv/gvv).

Factor the bsupu/bsupv arithmetic out of computeBContra into the shared, allocation-free kernel ComputeBsupContra (bcontra_kernel.h). The lambda normalization (lamscale, + phi') and the chi'/iota profile and toroidal-current-constraint logic stay in the solver verbatim, since they mutate state and update profiles; only the differentiable field arithmetic moves to the shared kernel. Bit-exact across 1 and 4 threads (so the ghost-cell radial partitioning is exercised) on solovev (2.548352e+00, 2D) and cth_like_fixed_bdy (5.057191e-02, 3D).

Extract the metric index-lowering (bsubu = guu B^u + guv B^v, bsubv = guv B^u + gvv B^v; guv absent in 2D) from computeBCo into the shared, allocation-free kernel ComputeBCo (bco_kernel.h). Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

Extract the field-dependent magnetic pressure |B|^2/2 = 0.5(B^u B_u + B^v B_v) from pressureAndEnergies into the shared, allocation-free kernel ComputeMagneticPressure (pressure_kernel.h). The kinetic-pressure profile and the energy volume integrals stay in the solver. Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02). Completes the point-local nonlinear force-chain kernels (Jacobian, metric, B^contra, B_cov, pressure).

Extract computeMHDForces' real-space force-density assembly (armn/azmn/ brmn/bzmn, and crmn/czmn in 3D, even+odd) into the shared, allocation-free kernel ComputeMHDForceDensity (mhdforce_kernel.h). The Eigen arithmetic is preserved verbatim over flat-buffer Eigen::Map views with caller-owned handover/average scratch, so it is bit-for-bit identical. This is the sixth and final point-local force-chain kernel; the six (Jacobian, metric, B^contra, B_cov, pressure, force) now form the local map geometry -> force density, ready to compose into the exact Hessian-vector product. (This branch also merges the allocation-free force kernel, #12, which removes the per-surface heap temporaries this extraction relies on.) Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

Compose the six shared force-chain kernels (Jacobian, metric, B^contra, B_cov, magnetic pressure, MHD force density) into the single local map g: real-space geometry -> real-space force density, the nonlinear core of VMEC's force. The full MHD force is T^T . g . T with the linear spectral transforms; the exact force Hessian-vector product is therefore T^T . J_g . T . v, and this provides J_g by autodiff. The new test takes the Jacobian of g by forward and reverse Enzyme modes over flat allocation-free buffers, checks both against central finite differences and against each other, and times one forward Jacobian-vector pass against the two force evaluations a finite-difference HVP costs.

Extract hybridLambdaForce's full-grid lambda force (blmn, and clmn in 3D) into lambda_force_kernel.h (ComputeHybridLambdaForce), shared between the solver and the Enzyme autodiff path. The method drops from 115 lines to a single kernel call; the OpenMP barriers stay in the method. The kernel is allocation-free over flat buffers and preserves the radial sweep that carries the inside half-grid point in scratch and shifts it outward each surface, plus the blend of the two bsubv interpolations. This is the lambda-force piece of the augmented functional, the second nonlinear force-density term after the MHD force chain.

Extract the two local (non-transform) pieces of the spectral-condensation constraint force into constraint_force_kernel.h, shared between the solver and the Enzyme autodiff path: - ComputeEffectiveConstraintForce: gConEff = (rCon-rCon0) ru + (zCon-zCon0) zu (effectiveConstraintForce), skipping the axis surface. - AddConstraintForces: add the bandpass-filtered gCon back into the MHD R/Z forces and write frcon/fzcon (the constraint part of assembleTotalForces). The Fourier-space bandpass between them stays the shared free function deAliasConstraintForce; the free-boundary rBSq contribution stays in assembleTotalForces. Allocation-free over flat buffers. This completes the local force-density terms of the augmented functional (MHD + lambda + constraint), the nonlinear core of the exact Hessian.

… fix)

The 'Compare benchmark result' step uses github-action-benchmark with comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from forks -> 'Resource not accessible by integration'. Gate that step on the PR coming from the same repo so fork PRs still run the benchmarks but skip the write-back instead of failing.

The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under the numpy 2.x that the test env now resolves, importing it dies in the f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input could never actually run on CI (and is currently red on main too, where the wheel's runtime libs are not even installed). Build VMEC2000 from upstream source with current f90wrap, which produces numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI (hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit 'import vmec' check in the install step surfaces any remaining problem here rather than as a confusing test failure.

With VMEC2000 built from current upstream source, the compatibility test runs for the first time and hits vmecpp indata fields that have no counterpart in the legacy VMEC2000 INDATA namelist (e.g. free_boundary_method), which raised AttributeError. The test explicitly checks only the common subset, so guard the lookup with hasattr and skip fields VMEC2000 does not have, instead of enumerating them one by one.

…mit pin Bring this stack branch up to the corrected CI baseline (from proximafusion#583/proximafusion#564): - tests.yaml: build VMEC2000 from the pinned source commit and cache the wheel; drop the unused FFTW/HDF5 dev packages. - benchmarks.yaml: skip the result upload on fork PRs (read-only token). - test_simsopt_compat.py: skip vmecpp-only INDATA fields. - CMakeLists: pin abseil to the 20260107.1 commit hash, not the tag.

krystophny added 16 commits June 14, 2026 08:11

Merge branch 'allocation-free-forces' into mhdforce-kernel

e2a667d

apply pre-commit formatting (ruff, docformatter, clang-format)

f2b548b

krystophny requested review from jons-pf and jurasic-pf as code owners June 14, 2026 14:34

This was referenced Jun 14, 2026

enzyme: extend the composed-force Hessian test with the lambda force #576

Open

Exact autodiff Hessian-vector product for the VMEC force #582

Open

krystophny added 4 commits June 14, 2026 17:04

bazel: declare force-chain kernel headers in ideal_mhd_model (sandbox…

486df79

… fix)

krystophny marked this pull request as draft June 15, 2026 04:48

krystophny added 2 commits June 15, 2026 07:22

Merge remote-tracking branch 'upstream/main' into HEAD

99de0f7

krystophny marked this pull request as ready for review June 15, 2026 10:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ideal_mhd_model: share the constraint-force kernels#575

ideal_mhd_model: share the constraint-force kernels#575
krystophny wants to merge 22 commits into
proximafusion:mainfrom
itpplasma:constraint-force-kernel

krystophny commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

krystophny commented Jun 14, 2026

What

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant