Skip to content

WIP: Add s2 manifold to conservative_2d (stacks on #70)#71

Draft
thodson-usgs wants to merge 16 commits intoxarray-contrib:mainfrom
thodson-usgs:spherely-integration
Draft

WIP: Add s2 manifold to conservative_2d (stacks on #70)#71
thodson-usgs wants to merge 16 commits intoxarray-contrib:mainfrom
thodson-usgs:spherely-integration

Conversation

@thodson-usgs
Copy link
Copy Markdown

Summary

Adds a manifold kwarg to ConservativeRegridder / .regrid.conservative_2d with three values:

  • "planar" (default, existing behavior): raw planar shapely intersection
  • "cea" (existing): analytic Lambert cylindrical equal-area projection before planar intersection — correct spherical areas with the planar fast-path cost, straight-line edges in the projected plane
  • "s2" (new): true great-circle edges via the optional spherely package (Google's s2geometry bindings). Full spherical geometry — correct for cells at any latitude or longitude extent.

Install with pip install xarray-regrid[spherical]. spherely ships pre-built wheels on PyPI for macOS / Linux x86_64 / Windows (Python 3.10–3.14), so no C library build needed for typical users. If spherely isn't installed, manifold="s2" raises ImportError with a clear install hint; the rest of the module keeps working.

Stacks on #70

This PR targets main because GitHub doesn't support cross-fork PRs with non-main bases, but the real delta vs #70 is small. Useful views:

Design notes

  • spherely bridging: _Grid gets an optional s2_polys: np.ndarray | None field. When both source and target carry s2_polys, _build_intersection_areas routes through spherely.intersection + spherely.area (both vectorized ufuncs) instead of planar shapely. The shapely STRtree on planar bboxes remains as the candidate-pair filter — necessary because spherely has no spatial index yet (tracked upstream at benbovy/spherely#72). Planar bboxes are a lossless over-estimate of spherical overlap, so the coarse filter is safe.
  • Manifold = Literal["planar", "cea", "s2"] type alias on public signatures; runtime _check_manifold retained for defense on deserialization paths (e.g. loading a netCDF file with a corrupted attribute).
  • Vectorized polygon construction fast-path: _s2_cell_polys checks hasattr(spherely, "polygons") and uses the batched ufunc when available. Upstream PR benbovy/spherely#52 (draft on my fork) adds this — measured ~2× speedup on construction, not blocking.
  • Schema version bump 1 → 2 on the persisted netCDF format (since spherical: bool became manifold: str on RegridderMetadata). Old files raise a clear schema-mismatch error.

Optional dependency layout

The spherical extra builds on top of conservative_2d:

conservative_2d = ["shapely>=2.0", "h5netcdf"]
spherical = ["xarray-regrid[conservative_2d]", "spherely>=0.1.1"]

Users can install either independently:

  • pip install xarray-regrid[conservative_2d] — 2D regrid without s2
  • pip install xarray-regrid[spherical] — full s2 support (transitively pulls the 2d stack)

Performance

s2 grid construction dominates one-time build cost. Measurements on benbovy/spherely vectorized-polygons branch (not yet released):

cells scalar (0.1.1) vectorized (polygons())
16,200 0.85s 0.56s
1,036,800 ~50s (extrapolated) ~30s

Smaller than my original 10-50× forecast because the s2geometry polygon construction itself is the floor; the vectorized ufunc only eliminates pybind11 per-call overhead. Details in docs/spherical_performance_plan.md (not included in this PR — in the repo as uncommitted scratch).

Apply-path cost is negligible; spherical weight matrix is ~ same size as planar.

Test plan

  • Correctness: manifold="s2" on a constant field roundtrips to the constant at machine precision.
  • Conservation: out · a_dst = A · s to 1e-12 on a random field with the s2 area matrix.
  • Pole-touching cells (outer lat edges at ±90°): the oriented=True s2 path picks the correct cap orientation — regression-tested.
  • Manifold enum rejection: manifold="bogus" raises ValueError at construction and again in _from_state for loaded files.
  • Install hint: manifold="s2" without spherely raises ImportError with the pip install xarray-regrid[spherical] hint.
  • netCDF save/load preserves the manifold choice; manifold="s2" survives roundtrip.
  • 105 tests pass on the branch (100 from conservative + 5 new s2-specific tests).

Docs

Fourth demo notebook docs/notebooks/demos/demo_conservative_2d_spherical.ipynb, structured to match the other three in the conservative_2d family:

  • Intro + three-manifold menu + install hint
  • cos²(lat) source on a 1° grid
  • All three manifolds integrated against analytic truth (8π/3)
  • Spatial s2 − cea diff map on a coarse grid (where great-circle vs straight-line edges diverge most)
  • .to_netcdf / .from_netcdf roundtrip preserving the manifold choice

Follow-ups (out of scope)

  • Curvilinear s2 (2D lat/lon coord arrays): not supported yet. Trivial extension once spherely ships a spatial index (so we can drop the shapely STRtree bridge that currently requires planar bboxes).
  • Antimeridian-crossing rectilinear cells: s2 handles these correctly (the oriented=True orientation is unambiguous), but the shapely planar bbox used for candidate filtering doesn't. A small lon-wrap fix would plug the gap; not critical for typical global grids.
  • Upstream contributions:
    • benbovy/spherely#52 — vectorized spherely.polygons(...) constructor. Draft branch exists; would give ~2× on build.
    • benbovy/spherely#72 — s2 spatial index. Larger contribution; would let us drop the shapely bridge entirely.

🤖 Generated with Claude Code

@thodson-usgs thodson-usgs force-pushed the spherely-integration branch 2 times, most recently from 3577fd0 to 79b68ce Compare April 20, 2026 18:51
@thodson-usgs thodson-usgs force-pushed the spherely-integration branch from 79b68ce to c79ae58 Compare April 21, 2026 20:41
@thodson-usgs thodson-usgs changed the title Add s2 manifold to conservative_2d (stacks on #70) WIP: Add s2 manifold to conservative_2d (stacks on #70) Apr 22, 2026
thodson-usgs and others added 13 commits April 27, 2026 09:55
Adds ConservativeRegridder and the .regrid.conservative_2d accessor for grids
the existing 1D-factored conservative path can't express: curvilinear (2D
lat/lon coords), unstructured meshes, and arbitrary polygon-to-polygon
aggregation. Cell intersections go through shapely with sparse-COO weight
storage, threaded above ~1k pairs, with optional spherical=True for analytic
equal-area weights on lat/lon grids, antimeridian handling, and netCDF
save/load for caching weight matrices. Demo notebooks under
docs/notebooks/demos/ cover curvilinear, unstructured, and grid->regions
aggregation. Optional dependency: install with the conservative-2d extra.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Source on [0, 360] vs target on [-180, 180] (and reverse) yielded NaN
on half the cells: a uniform shift can't reconcile the two conventions
since the mean diff is exactly 180° and round() banker's-rounds to 0.
Per-value wrap source longitudes into the target's 360° window, mirror-
ing format_lon, and remap area-matrix columns when wrap breaks source
monotonicity. 2D / curvilinear coords keep the existing uniform-shift
fallback. Adds regression test covering both directions, planar and
spherical.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Swaps the synthetic polygon demo for xarray's air_temperature tutorial
dataset aggregated onto US states dissolved from geodatasets.ncovr.
Same workflow end-to-end (from_polygons, map+bar plot, conservation
check, save/reload) but with recognizable inputs.

Adds a [notebooks] extras group covering the runtime deps the demos
need (matplotlib, pooch, geopandas, geodatasets); shapely/h5netcdf/
sparse/dask stay in [conservative-2d] and [accel] so the two can be
combined.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The map's vmin/vmax span the full source-grid temperature range, but
the bar chart was normalizing to the bar values' own min/max — so a
state at the cool end was deeper blue on the bar than on the map.
Reuse the map's vmin/vmax for the bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Each notebook now opens with motivation (what conservative regridding
is, when to use it), what makes the case 1D-non-separable (curvilinear
coords / arbitrary polygons / unstructured mesh), and a numbered summary
of what the notebook will do. Each step's markdown picks up a sentence
or two of "why this step" narrative — what's expensive vs. cheap, what
to look for in the result, what choices reflect domain conventions —
without changing any of the code or expected outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ooks

The unnormalized ``(n_dst, n_src)`` cell-intersection area matrix was
stored as ``self._areas`` but read externally by tests and the demo
notebooks for conservation diagnostics. Promote to ``self.areas`` as a
plain public attribute (consistent with the existing ``self.spherical``,
``self.x_coord``, ``self.y_coord`` style on the same class) and update
the four external callers. Internal storage is the same — no behavior
or perf change (verified against 3ed200b: build/apply timings within
run-to-run noise on rect, curvilinear, and polygon cases).

Also runs ``ruff format`` (4-line collapses in conservative_2d.py and
test_conservative_2d.py; normalizes notebook source from single-string
to list-of-strings) and clears notebook outputs via nbformat-equivalent
JSON manipulation.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The existing axis-factored `.conservative` path is tight for rectilinear
grids but can't express curvilinear or unstructured meshes. This adds a
`ConservativeRegridder` that computes 2D cell-polygon intersections via
shapely + STRtree, producing a sparse weight matrix applied as a single
matmul. Unlike xESMF, it installs with a single `pip install shapely`,
works on Windows, and composes with dask.distributed.

Core:
- `ConservativeRegridder(src, tgt, x_coord, y_coord, spherical=False)`:
  reusable regridder; builds weights once, applies to many fields; `.T`
  returns the transpose (target→source) sharing cached state.
- `.from_polygons(src_polys, tgt_polys, ...)`: unstructured-mesh factory
  (MPAS/ICON/regions).
- `spherical=True`: analytic Lambert cylindrical equal-area projection
  of cell edges — matches the factored path's sin-weighted accuracy on
  lat/lon grids without pulling in pyproj.
- `.to_netcdf(path)` / `.from_netcdf(path)`: persist the sparse matrix
  plus `RegridderMetadata` (dim names, shapes, grid ranges, spherical
  flag, version, timestamp, schema version) and the source/target coord
  groups, for reproducibility and fast rebuild on subsequent runs.
- Also exposes `.regrid.conservative_polygon(...)` on the xarray
  accessor, and a `polygons_from_coords` helper for mixing structured
  and unstructured grids.

Correctness/perf wins verified by tests + experiments (not committed):
- Rectilinear fast path (analytic box-overlap, no GEOS) → ~30x vs
  curvilinear at 1M cells.
- ThreadPool around GEOS intersection → 3-4x on curvilinear.
- Pre-transposed, index-sorted apply matrix cached on the regridder.
- Module-level `warnings.filterwarnings` for sparse's spurious NaN
  warning (thread-safe under dask).
- Input dtype preserved end-to-end (float32 in → float32 out).
- Empty target coverage → NaN (regression-tested for polygon holes +
  out-of-domain cells).

100 tests added in `tests/test_conservative_polygon.py` covering
reusability, transpose roundtrip, spherical vs planar, curvilinear,
NaN thresholds, dtype preservation, from_polygons, mass conservation,
and netCDF save/load.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Three short notebooks under docs/notebooks/demos/ walking through the
main workflows:

- basics: structured lat/lon source → coarser target, with the
  spherical=True switch; mass-integral comparison against 8π/3 truth.
- curvilinear: regular source → 30°-rotated 2D target, plus a
  machine-precision mass-conservation check on the covered subset.
- unstructured: scipy Voronoi mesh target via from_polygons, plus a
  to_netcdf / from_netcdf roundtrip demonstrating weight persistence.

Each uses analytic / synthetic data so they're reproducible without
external downloads, and each includes a concrete diagnostic (mass
integral, error comparison table, or reload bit-exactness) rather
than just plots.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Clears cell outputs and execution counts so the PR diff shows only the
narrative + code. Reviewers (and readthedocs, if it rebuilds on merge)
run the cells themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Adds a `manifold` kwarg to ConservativeRegridder, conservative_2d_regrid,
and the `.regrid.conservative_2d` accessor. Values: "planar" (default,
unchanged), "cea" (the former `spherical=True` path — cylindrical
equal-area projection, correct areas on the planar fast path), and "s2"
(new — great-circle cell polygons + intersections via the optional
spherely dependency, exact spherical geometry including near the poles).

Key additions:
- `Manifold = Literal["planar", "cea", "s2"]` with matching `_MANIFOLDS`
  runtime tuple (kept in sync via get_args).
- Optional spherely import, `_check_manifold` dispatched at construction
  and again in `_from_state` as defense-in-depth for netCDF reloads.
- `_build_s2_grid` / `_s2_cell_polys` builders; `_Grid` gains
  `s2_polys: np.ndarray | None`; `_build_intersection_areas` dispatches
  to spherely when both grids carry s2_polys, keeping the shapely
  STRtree as the (lossless) candidate-pair filter since spherely has
  no spatial index yet (benbovy/spherely#72).
- Metadata on-disk schema bumped to 2: stores `manifold` string instead
  of the old `spherical` bool. `polygons_from_coords` narrows to
  Literal["planar", "cea"] — s2 isn't representable outside the
  regridder since spherely Geographies aren't shapely polygons.

Squash of the 6 spherely-integration commits, rebased onto the
conservative branch (which dropped the RegridderMetadata dataclass in
favor of kwargs-dict helpers, switched the per-direction cache to
cached_property, and flipped predicate_filter to bbox-only by default).
All manifold plumbing is wired through the new shape; netCDF metadata
now emits "manifold" via the helpers and parses it back.

Tests: 108 pass (up from 103), including 5 s2-specific tests for mass
conservation, constant-field preservation, netCDF roundtrip,
pole-touching cells, and invalid-manifold rejection.
_s2_intersection_areas now chunks the candidate pair array across
n_threads and calls spherely.intersection+area per chunk in parallel.
This depends on spherely releasing the GIL inside its vectorized
boolean ops (pending upstream at benbovy/spherely); with the patched
spherely the serial-path wall time goes from ~1.5s to ~0.79s on a
180x360 -> 60x120 s2 build (1.9x) on a 12-core machine.

Falls back to serial (single spherely.intersection call) when the
workload is under 50k pairs or n_threads<=1; older spherely builds
that don't release the GIL silently run serial in each worker.
_build_intersection_areas now threads n_threads through to the s2
branch, matching the shapely path.
@thodson-usgs thodson-usgs force-pushed the spherely-integration branch from ec830bb to 2aa3c88 Compare April 28, 2026 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant