Skip to content

ideal_mhd_model: share the magnetic-pressure kernel#571

Draft
krystophny wants to merge 13 commits into
proximafusion:mainfrom
itpplasma:pressure-kernel
Draft

ideal_mhd_model: share the magnetic-pressure kernel#571
krystophny wants to merge 13 commits into
proximafusion:mainfrom
itpplasma:pressure-kernel

Conversation

@krystophny

Copy link
Copy Markdown
Contributor

Stacked PR — part 8/19 of the differentiable-VMEC++ series. merge after #570 (bco-kernel).
Diff is cumulative (includes ancestor commits) because the branches are stacked on the fork; review the net change described below.


What

Extract the field-dependent magnetic pressure |B|^2/2 = 0.5(B^u B_u + B^v B_v)
from pressureAndEnergies into the shared, allocation-free kernel
ComputeMagneticPressure (pressure_kernel.h). The kinetic-pressure profile and
the energy volume integrals (with their OpenMP reductions) stay in the solver.

Why

Fifth and last of the point-local nonlinear force-chain kernels (Jacobian #13,
metric #14, B^contra #15, B_cov #16, pressure here). With all five in
allocation-free flat-buffer form, the exact Hessian-vector product can compose
them (between the analytic spectral transforms) via one Enzyme pass.

Verification

Bit-for-bit vmec_standalone MHD energy, 1 and 4 threads:

            1 thread        4 threads
solovev     2.548352e+00    2.548352e+00
cth_like    5.057191e-02    5.057191e-02

Stacked on #16.

The CMake FetchContent abseil pin (2024-08) fails to compile under
Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the
numbers.cc nullability annotations are rejected by the newer frontend.
Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8
and GCC. Clang is the compiler required for the Enzyme autodiff build.

The Bazel build keeps its own (BCR) abseil pin and is unaffected.
Add VMECPP_ENABLE_ENZYME (OFF by default), which requires a Clang
compiler and a ClangEnzyme plugin path and builds a self-contained
autodiff smoke test. The test differentiates a scalar objective written
over Eigen::Map'd caller buffers and checks reverse- and forward-mode
Enzyme gradients against the closed form and central finite differences.

enzyme.h documents the intrinsic ABI and the allocation constraint that
shapes the differentiable kernels: Enzyme cannot track Eigen's aligned
allocator, so differentiable paths use Eigen::Map over caller-owned
buffers and avoid heap expression temporaries.

With the option off the build is unchanged.
Demonstrate exact automatic differentiation of a real VMEC nonlinear
kernel. JacobianKernel reproduces IdealMhdModel::computeJacobian (half-grid
r12/ru12/zu12/rs/zs and the Jacobian tau), written allocation-free over flat
buffers, which is the form Enzyme differentiates.

For L = 0.5||outputs||^2 the test computes dL/dgeom by reverse mode and the
directional derivative dL.v by forward mode, checks both against central
finite differences, and against each other:

  reverse dL.v vs FD : 1.9e-9
  forward dL.v vs FD : 1.9e-9
  forward vs reverse : 2.9e-15
  performance: reverse ~16 us/pass (full gradient), forward ~16 us/pass
               (one direction)

Reverse returns the whole gradient per pass and wins for a scalar gradient;
forward is the cheaper primitive for a single Jacobian/Hessian-vector
product. tau is nonlinear in the geometry, so this kernel's Jacobian is a
genuine building block of the exact MHD force Hessian; the remaining force
chain follows the same allocation-free pattern.
Move the half-grid Jacobian arithmetic into jacobian_kernel.h
(ComputeHalfGridJacobian), allocation-free over flat buffers. Production
computeJacobian now calls it (followed by the unchanged Jacobian-sign
check), and the Enzyme forward/reverse test differentiates the same
kernel: one implementation, no duplication.

Bit-exact: vmec_standalone MHD energy unchanged on solovev
(2.548352e+00) and cth_like_fixed_bdy (5.057191e-02). Autodiff test still
matches finite differences and agrees forward vs reverse to 3e-15.
Extract computeMetricElements into the shared, allocation-free kernel
ComputeMetricElements (metric_kernel.h), over flat buffers, and call it
from the solver. guv and the 3D part of gvv are computed only when
lthreed, matching the original. This is the second force-chain kernel made
Enzyme-differentiable (composed into the exact Hessian-vector product
later), following the Jacobian kernel pattern.

Bit-exact: vmec_standalone MHD energy unchanged on solovev (2.548352e+00,
2D) and cth_like_fixed_bdy (5.057191e-02, 3D path with guv/gvv).
Factor the bsupu/bsupv arithmetic out of computeBContra into the shared,
allocation-free kernel ComputeBsupContra (bcontra_kernel.h). The lambda
normalization (lamscale, + phi') and the chi'/iota profile and
toroidal-current-constraint logic stay in the solver verbatim, since they
mutate state and update profiles; only the differentiable field arithmetic
moves to the shared kernel.

Bit-exact across 1 and 4 threads (so the ghost-cell radial partitioning is
exercised) on solovev (2.548352e+00, 2D) and cth_like_fixed_bdy
(5.057191e-02, 3D).
Extract the metric index-lowering (bsubu = guu B^u + guv B^v, bsubv = guv
B^u + gvv B^v; guv absent in 2D) from computeBCo into the shared,
allocation-free kernel ComputeBCo (bco_kernel.h).

Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and
cth_like_fixed_bdy (5.057191e-02).
Extract the field-dependent magnetic pressure |B|^2/2 = 0.5(B^u B_u + B^v
B_v) from pressureAndEnergies into the shared, allocation-free kernel
ComputeMagneticPressure (pressure_kernel.h). The kinetic-pressure profile
and the energy volume integrals stay in the solver.

Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and
cth_like_fixed_bdy (5.057191e-02). Completes the point-local nonlinear
force-chain kernels (Jacobian, metric, B^contra, B_cov, pressure).
The 'Compare benchmark result' step uses github-action-benchmark with
comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from
forks -> 'Resource not accessible by integration'. Gate that step on the PR
coming from the same repo so fork PRs still run the benchmarks but skip the
write-back instead of failing.
The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under
the numpy 2.x that the test env now resolves, importing it dies in the
f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension
must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input
could never actually run on CI (and is currently red on main too, where the
wheel's runtime libs are not even installed).

Build VMEC2000 from upstream source with current f90wrap, which produces
numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI
(hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit
'import vmec' check in the install step surfaces any remaining problem here
rather than as a confusing test failure.
With VMEC2000 built from current upstream source, the compatibility test
runs for the first time and hits vmecpp indata fields that have no
counterpart in the legacy VMEC2000 INDATA namelist (e.g.
free_boundary_method), which raised AttributeError. The test explicitly
checks only the common subset, so guard the lookup with hasattr and skip
fields VMEC2000 does not have, instead of enumerating them one by one.
@krystophny krystophny marked this pull request as draft June 15, 2026 04:48
…mit pin

Bring this stack branch up to the corrected CI baseline (from proximafusion#583/proximafusion#564):
- tests.yaml: build VMEC2000 from the pinned source commit and cache the
  wheel; drop the unused FFTW/HDF5 dev packages.
- benchmarks.yaml: skip the result upload on fork PRs (read-only token).
- test_simsopt_compat.py: skip vmecpp-only INDATA fields.
- CMakeLists: pin abseil to the 20260107.1 commit hash, not the tag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant