WIP: Add s2 manifold to conservative_2d (stacks on #70)#71
Draft
thodson-usgs wants to merge 16 commits intoxarray-contrib:mainfrom
Draft
WIP: Add s2 manifold to conservative_2d (stacks on #70)#71thodson-usgs wants to merge 16 commits intoxarray-contrib:mainfrom
thodson-usgs wants to merge 16 commits intoxarray-contrib:mainfrom
Conversation
3577fd0 to
79b68ce
Compare
2 tasks
79b68ce to
c79ae58
Compare
13 tasks
Adds ConservativeRegridder and the .regrid.conservative_2d accessor for grids the existing 1D-factored conservative path can't express: curvilinear (2D lat/lon coords), unstructured meshes, and arbitrary polygon-to-polygon aggregation. Cell intersections go through shapely with sparse-COO weight storage, threaded above ~1k pairs, with optional spherical=True for analytic equal-area weights on lat/lon grids, antimeridian handling, and netCDF save/load for caching weight matrices. Demo notebooks under docs/notebooks/demos/ cover curvilinear, unstructured, and grid->regions aggregation. Optional dependency: install with the conservative-2d extra. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Source on [0, 360] vs target on [-180, 180] (and reverse) yielded NaN on half the cells: a uniform shift can't reconcile the two conventions since the mean diff is exactly 180° and round() banker's-rounds to 0. Per-value wrap source longitudes into the target's 360° window, mirror- ing format_lon, and remap area-matrix columns when wrap breaks source monotonicity. 2D / curvilinear coords keep the existing uniform-shift fallback. Adds regression test covering both directions, planar and spherical. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Swaps the synthetic polygon demo for xarray's air_temperature tutorial dataset aggregated onto US states dissolved from geodatasets.ncovr. Same workflow end-to-end (from_polygons, map+bar plot, conservation check, save/reload) but with recognizable inputs. Adds a [notebooks] extras group covering the runtime deps the demos need (matplotlib, pooch, geopandas, geodatasets); shapely/h5netcdf/ sparse/dask stay in [conservative-2d] and [accel] so the two can be combined. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The map's vmin/vmax span the full source-grid temperature range, but the bar chart was normalizing to the bar values' own min/max — so a state at the cool end was deeper blue on the bar than on the map. Reuse the map's vmin/vmax for the bar. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Each notebook now opens with motivation (what conservative regridding is, when to use it), what makes the case 1D-non-separable (curvilinear coords / arbitrary polygons / unstructured mesh), and a numbered summary of what the notebook will do. Each step's markdown picks up a sentence or two of "why this step" narrative — what's expensive vs. cheap, what to look for in the result, what choices reflect domain conventions — without changing any of the code or expected outputs. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ooks The unnormalized ``(n_dst, n_src)`` cell-intersection area matrix was stored as ``self._areas`` but read externally by tests and the demo notebooks for conservation diagnostics. Promote to ``self.areas`` as a plain public attribute (consistent with the existing ``self.spherical``, ``self.x_coord``, ``self.y_coord`` style on the same class) and update the four external callers. Internal storage is the same — no behavior or perf change (verified against 3ed200b: build/apply timings within run-to-run noise on rect, curvilinear, and polygon cases). Also runs ``ruff format`` (4-line collapses in conservative_2d.py and test_conservative_2d.py; normalizes notebook source from single-string to list-of-strings) and clears notebook outputs via nbformat-equivalent JSON manipulation. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The existing axis-factored `.conservative` path is tight for rectilinear grids but can't express curvilinear or unstructured meshes. This adds a `ConservativeRegridder` that computes 2D cell-polygon intersections via shapely + STRtree, producing a sparse weight matrix applied as a single matmul. Unlike xESMF, it installs with a single `pip install shapely`, works on Windows, and composes with dask.distributed. Core: - `ConservativeRegridder(src, tgt, x_coord, y_coord, spherical=False)`: reusable regridder; builds weights once, applies to many fields; `.T` returns the transpose (target→source) sharing cached state. - `.from_polygons(src_polys, tgt_polys, ...)`: unstructured-mesh factory (MPAS/ICON/regions). - `spherical=True`: analytic Lambert cylindrical equal-area projection of cell edges — matches the factored path's sin-weighted accuracy on lat/lon grids without pulling in pyproj. - `.to_netcdf(path)` / `.from_netcdf(path)`: persist the sparse matrix plus `RegridderMetadata` (dim names, shapes, grid ranges, spherical flag, version, timestamp, schema version) and the source/target coord groups, for reproducibility and fast rebuild on subsequent runs. - Also exposes `.regrid.conservative_polygon(...)` on the xarray accessor, and a `polygons_from_coords` helper for mixing structured and unstructured grids. Correctness/perf wins verified by tests + experiments (not committed): - Rectilinear fast path (analytic box-overlap, no GEOS) → ~30x vs curvilinear at 1M cells. - ThreadPool around GEOS intersection → 3-4x on curvilinear. - Pre-transposed, index-sorted apply matrix cached on the regridder. - Module-level `warnings.filterwarnings` for sparse's spurious NaN warning (thread-safe under dask). - Input dtype preserved end-to-end (float32 in → float32 out). - Empty target coverage → NaN (regression-tested for polygon holes + out-of-domain cells). 100 tests added in `tests/test_conservative_polygon.py` covering reusability, transpose roundtrip, spherical vs planar, curvilinear, NaN thresholds, dtype preservation, from_polygons, mass conservation, and netCDF save/load. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Three short notebooks under docs/notebooks/demos/ walking through the main workflows: - basics: structured lat/lon source → coarser target, with the spherical=True switch; mass-integral comparison against 8π/3 truth. - curvilinear: regular source → 30°-rotated 2D target, plus a machine-precision mass-conservation check on the covered subset. - unstructured: scipy Voronoi mesh target via from_polygons, plus a to_netcdf / from_netcdf roundtrip demonstrating weight persistence. Each uses analytic / synthetic data so they're reproducible without external downloads, and each includes a concrete diagnostic (mass integral, error comparison table, or reload bit-exactness) rather than just plots. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Clears cell outputs and execution counts so the PR diff shows only the narrative + code. Reviewers (and readthedocs, if it rebuilds on merge) run the cells themselves. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Adds a `manifold` kwarg to ConservativeRegridder, conservative_2d_regrid, and the `.regrid.conservative_2d` accessor. Values: "planar" (default, unchanged), "cea" (the former `spherical=True` path — cylindrical equal-area projection, correct areas on the planar fast path), and "s2" (new — great-circle cell polygons + intersections via the optional spherely dependency, exact spherical geometry including near the poles). Key additions: - `Manifold = Literal["planar", "cea", "s2"]` with matching `_MANIFOLDS` runtime tuple (kept in sync via get_args). - Optional spherely import, `_check_manifold` dispatched at construction and again in `_from_state` as defense-in-depth for netCDF reloads. - `_build_s2_grid` / `_s2_cell_polys` builders; `_Grid` gains `s2_polys: np.ndarray | None`; `_build_intersection_areas` dispatches to spherely when both grids carry s2_polys, keeping the shapely STRtree as the (lossless) candidate-pair filter since spherely has no spatial index yet (benbovy/spherely#72). - Metadata on-disk schema bumped to 2: stores `manifold` string instead of the old `spherical` bool. `polygons_from_coords` narrows to Literal["planar", "cea"] — s2 isn't representable outside the regridder since spherely Geographies aren't shapely polygons. Squash of the 6 spherely-integration commits, rebased onto the conservative branch (which dropped the RegridderMetadata dataclass in favor of kwargs-dict helpers, switched the per-direction cache to cached_property, and flipped predicate_filter to bbox-only by default). All manifold plumbing is wired through the new shape; netCDF metadata now emits "manifold" via the helpers and parses it back. Tests: 108 pass (up from 103), including 5 s2-specific tests for mass conservation, constant-field preservation, netCDF roundtrip, pole-touching cells, and invalid-manifold rejection.
_s2_intersection_areas now chunks the candidate pair array across n_threads and calls spherely.intersection+area per chunk in parallel. This depends on spherely releasing the GIL inside its vectorized boolean ops (pending upstream at benbovy/spherely); with the patched spherely the serial-path wall time goes from ~1.5s to ~0.79s on a 180x360 -> 60x120 s2 build (1.9x) on a 12-core machine. Falls back to serial (single spherely.intersection call) when the workload is under 50k pairs or n_threads<=1; older spherely builds that don't release the GIL silently run serial in each worker. _build_intersection_areas now threads n_threads through to the s2 branch, matching the shapely path.
ec830bb to
2aa3c88
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a
manifoldkwarg toConservativeRegridder/.regrid.conservative_2dwith three values:"planar"(default, existing behavior): raw planar shapely intersection"cea"(existing): analytic Lambert cylindrical equal-area projection before planar intersection — correct spherical areas with the planar fast-path cost, straight-line edges in the projected plane"s2"(new): true great-circle edges via the optionalspherelypackage (Google's s2geometry bindings). Full spherical geometry — correct for cells at any latitude or longitude extent.Install with
pip install xarray-regrid[spherical]. spherely ships pre-built wheels on PyPI for macOS / Linux x86_64 / Windows (Python 3.10–3.14), so no C library build needed for typical users. If spherely isn't installed,manifold="s2"raisesImportErrorwith a clear install hint; the rest of the module keeps working.Stacks on #70
This PR targets
mainbecause GitHub doesn't support cross-fork PRs with non-mainbases, but the real delta vs #70 is small. Useful views:— 5 files changed, ~550 additions, ~100 deletions
main: this PRDesign notes
_Gridgets an optionals2_polys: np.ndarray | Nonefield. When both source and target carrys2_polys,_build_intersection_areasroutes throughspherely.intersection+spherely.area(both vectorized ufuncs) instead of planar shapely. The shapely STRtree on planar bboxes remains as the candidate-pair filter — necessary because spherely has no spatial index yet (tracked upstream at benbovy/spherely#72). Planar bboxes are a lossless over-estimate of spherical overlap, so the coarse filter is safe.Manifold = Literal["planar", "cea", "s2"]type alias on public signatures; runtime_check_manifoldretained for defense on deserialization paths (e.g. loading a netCDF file with a corrupted attribute)._s2_cell_polyscheckshasattr(spherely, "polygons")and uses the batched ufunc when available. Upstream PR benbovy/spherely#52 (draft on my fork) adds this — measured ~2× speedup on construction, not blocking.spherical: boolbecamemanifold: stronRegridderMetadata). Old files raise a clear schema-mismatch error.Optional dependency layout
The
sphericalextra builds on top ofconservative_2d:Users can install either independently:
pip install xarray-regrid[conservative_2d]— 2D regrid without s2pip install xarray-regrid[spherical]— full s2 support (transitively pulls the 2d stack)Performance
s2 grid construction dominates one-time build cost. Measurements on
benbovy/spherelyvectorized-polygonsbranch (not yet released):polygons())Smaller than my original 10-50× forecast because the
s2geometrypolygon construction itself is the floor; the vectorized ufunc only eliminates pybind11 per-call overhead. Details indocs/spherical_performance_plan.md(not included in this PR — in the repo as uncommitted scratch).Apply-path cost is negligible; spherical weight matrix is ~ same size as planar.
Test plan
manifold="s2"on a constant field roundtrips to the constant at machine precision.out · a_dst = A · sto 1e-12 on a random field with the s2 area matrix.oriented=Trues2 path picks the correct cap orientation — regression-tested.Manifoldenum rejection:manifold="bogus"raisesValueErrorat construction and again in_from_statefor loaded files.manifold="s2"without spherely raisesImportErrorwith thepip install xarray-regrid[spherical]hint.manifoldchoice;manifold="s2"survives roundtrip.Docs
Fourth demo notebook
docs/notebooks/demos/demo_conservative_2d_spherical.ipynb, structured to match the other three in theconservative_2dfamily:cos²(lat)source on a 1° grid8π/3)s2 − ceadiff map on a coarse grid (where great-circle vs straight-line edges diverge most).to_netcdf/.from_netcdfroundtrip preserving the manifold choiceFollow-ups (out of scope)
oriented=Trueorientation is unambiguous), but the shapely planar bbox used for candidate filtering doesn't. A small lon-wrap fix would plug the gap; not critical for typical global grids.spherely.polygons(...)constructor. Draft branch exists; would give ~2× on build.🤖 Generated with Claude Code