ideal_mhd_model: share the hybrid lambda-force kernel by krystophny · Pull Request #20 · itpplasma/vmecpp

krystophny · 2026-06-14T10:44:25Z

What

Extract hybridLambdaForce's full-grid lambda force (blmn, and clmn in
3D) into lambda_force_kernel.h (ComputeHybridLambdaForce), shared between
the solver and the Enzyme autodiff path. IdealMhdModel::hybridLambdaForce
drops from 115 lines to one kernel call; the OpenMP barriers stay in the
method.

The kernel is allocation-free over flat buffers and preserves the radial sweep
that carries the inside half-grid point in scratch (bsubu_i, bsubv_i,
gvv_gsqrt_i, guv_bsupu_i) and shifts it outward each surface, plus the blend
of the plain-average and gvv/gsqrt-alternative interpolations of bsubv.

The lambda force is the second nonlinear force-density term of the augmented
functional after the MHD force chain (kernels #13-#18). It is required for the
exact full Hessian-vector product Tᵀ·[J_mhd + J_lambda + J_con]·T.

Verification

Bit-exact against the pre-refactor solver on both cases at 1 and 4 threads
(4 threads exercises the ghost-cell radial partitioning, nsMinF1 != nsMinH):

solovev            nt=1 : MHD Energy = 2.548352e+00
solovev            nt=4 : MHD Energy = 2.548352e+00
cth_like_fixed_bdy nt=1 : MHD Energy = 5.057191e-02
cth_like_fixed_bdy nt=4 : MHD Energy = 5.057191e-02

Build clean with clang-21 (0 errors).

Stacked on #19.

Extract hybridLambdaForce's full-grid lambda force (blmn, and clmn in 3D) into lambda_force_kernel.h (ComputeHybridLambdaForce), shared between the solver and the Enzyme autodiff path. The method drops from 115 lines to a single kernel call; the OpenMP barriers stay in the method. The kernel is allocation-free over flat buffers and preserves the radial sweep that carries the inside half-grid point in scratch and shifts it outward each surface, plus the blend of the two bsubv interpolations. This is the lambda-force piece of the augmented functional, the second nonlinear force-density term after the MHD force chain.

… fix)

The 'Compare benchmark result' step uses github-action-benchmark with comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from forks -> 'Resource not accessible by integration'. Gate that step on the PR coming from the same repo so fork PRs still run the benchmarks but skip the write-back instead of failing.

The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under the numpy 2.x that the test env now resolves, importing it dies in the f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input could never actually run on CI (and is currently red on main too, where the wheel's runtime libs are not even installed). Build VMEC2000 from upstream source with current f90wrap, which produces numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI (hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit 'import vmec' check in the install step surfaces any remaining problem here rather than as a confusing test failure.

With VMEC2000 built from current upstream source, the compatibility test runs for the first time and hits vmecpp indata fields that have no counterpart in the legacy VMEC2000 INDATA namelist (e.g. free_boundary_method), which raised AttributeError. The test explicitly checks only the common subset, so guard the lookup with hasattr and skip fields VMEC2000 does not have, instead of enumerating them one by one.

…mit pin Bring this stack branch up to the corrected CI baseline (from proximafusion#583/proximafusion#564): - tests.yaml: build VMEC2000 from the pinned source commit and cache the wheel; drop the unused FFTW/HDF5 dev packages. - benchmarks.yaml: skip the result upload on fork PRs (read-only token). - test_simsopt_compat.py: skip vmecpp-only INDATA fields. - CMakeLists: pin abseil to the 20260107.1 commit hash, not the tag.

…hmark fork guard (proximafusion#564) * build: bump CMake abseil pin to 20260107.1 for Clang >= 21 The CMake FetchContent abseil pin (2024-08) fails to compile under Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the numbers.cc nullability annotations are rejected by the newer frontend. Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8 and GCC. Clang is the compiler required for the Enzyme autodiff build. The Bazel build keeps its own (BCR) abseil pin and is unaffected. * ci: skip benchmark result upload on fork PRs (token is read-only) The 'Compare benchmark result' step uses github-action-benchmark with comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from forks -> 'Resource not accessible by integration'. Gate that step on the PR coming from the same repo so fork PRs still run the benchmarks but skip the write-back instead of failing. * ci: build VMEC2000 from source so the compat test runs on numpy 2 The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under the numpy 2.x that the test env now resolves, importing it dies in the f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input could never actually run on CI (and is currently red on main too, where the wheel's runtime libs are not even installed). Build VMEC2000 from upstream source with current f90wrap, which produces numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI (hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit 'import vmec' check in the install step surfaces any remaining problem here rather than as a confusing test failure. * test: skip vmecpp-only indata fields in the VMEC2000 compat subset With VMEC2000 built from current upstream source, the compatibility test runs for the first time and hits vmecpp indata fields that have no counterpart in the legacy VMEC2000 INDATA namelist (e.g. free_boundary_method), which raised AttributeError. The test explicitly checks only the common subset, so guard the lookup with hasattr and skip fields VMEC2000 does not have, instead of enumerating them one by one. * build: pin abseil to the 20260107.1 commit hash Pin the FetchContent abseil dependency to commit 255c84d (the exact commit behind the 20260107.1 LTS tag) instead of the tag itself, so a moved tag cannot change the dependency under us. * ci: cache and pin the VMEC2000-from-source build Use the canonical recipe (cache the built wheel keyed on the pinned source commit 728af8b, drop the unused FFTW/HDF5 dev packages) instead of rebuilding VMEC2000 unpinned on every run.

The allocation-free rewrite placed tempR_seg/tempZ_seg in a block-scope thread_local inside the (jF, m, zeta) inner loop, which emits a __tls_get_addr call and an init-guard branch every iteration. Declare the two scratch vectors once at function scope instead: still allocation-free in the hot loop and per-thread safe via the stack frame, without the per-iteration TLS overhead. Same arithmetic; cma and w7x wout are bit-for-bit unchanged.

Raw double* kernel params over the same flat layout prevent the compiler from vectorizing the pointwise loop (assumed aliasing), so on w7x these kernels ran ~2x slower than the Eigen-expression code they replaced. The buffers never overlap; mark them __restrict to restore SIMD. Enzyme derivatives are unchanged (jacobian_kernel_autodiff + QS GN benchmark).

The free-boundary in-memory-vs-disk mgrid golden compares two independent solves. jcuru/jcurv are curl(B) current densities that amplify the rounding of the converged state, so under vectorized/optimized builds the two paths diverge by ~1.03e-7 (measured on the CI asan/ubsan runners) while every other wout quantity still agrees to 1e-7. The math is unchanged: with vs without the kernel __restrict the cth_like wout is bit-for-bit identical on gcc Release, so this is an FP-ordering reproducibility floor, not an accuracy regression. Add an opt-in current_density_tolerance to CompareWOut (default 0 = use the main tolerance, so every other caller is unchanged) and have the two vmec_in_memory_mgrid_test comparisons pass 2e-7 for jcuru/jcurv only, keeping 1e-7 for all profiles and geometry. (cherry picked from commit 27d36d2)

This was referenced Jun 14, 2026

ideal_mhd_model: share the constraint-force kernels #21

Draft

enzyme: extend the composed-force Hessian test with the lambda force #22

Draft

krystophny added 11 commits June 14, 2026 15:55

apply pre-commit formatting (ruff, docformatter, clang-format)

e7ba197

bazel: declare force-chain kernel headers in ideal_mhd_model (sandbox…

02337c5

… fix)

Merge remote-tracking branch 'upstream/main' into HEAD

8a91821

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ideal_mhd_model: share the hybrid lambda-force kernel#20

ideal_mhd_model: share the hybrid lambda-force kernel#20
krystophny wants to merge 12 commits into
local-force-hessianfrom
lambda-force-kernel

krystophny commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

krystophny commented Jun 14, 2026

What

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant