Add conservative_2d: conservative regrid for grids that aren't 1D-separable#70
Add conservative_2d: conservative regrid for grids that aren't 1D-separable#70thodson-usgs wants to merge 8 commits intoxarray-contrib:mainfrom
Conversation
|
Hi Timothy, Thanks for contributing. Nice to see that it was possible to add regridding for non-rectilinear grids. I had thought before of using shapely and intersecting polygons, however I could not get it to perform well enough (didn't know about I do think the LLM went a bit wild and the source file seems a bit overengineered and does not fit the syntax/structuring of the rest of the code. This does not make it very easy to review and see the logic flow. The LLM also thinks that polygons crossing the antimeridian is a "fringe case", I would disagree 😅 |
|
Honestly, I haven't done much yet :), but I thought a draft PR would be a good way to get the ball rolling. The bot also identified a few other small bugfixes and upstream optimizations that I'm working through. As those progress, I'm hoping we can pair on this PR, though I want to be respectful of your time. Cloud friendly regridding has long been on our wishlist at USGS (and other groups), so we were excited when Even if it's never merged, I think this will be a useful datapoint on a longstanding problem. |
Ah, this might be in part because I fed it a Julia implementation for reference (which should be acknowledged). I'll see if the LLM can clean things up, before I make a pass. |
Yeah it's a good starting point and at a basic level it seems to perform well. So as a source of inspiration it seems promising. |
|
Nice, a more generalized conservative regridding approach that bypasses ESMF would be a great addition. Since xESMF is the current way to achieve this in python, adding some direct comparisons there would be nice as we've done with some of the other methods. Broadly agree with @BSchilperoort though. Since this package is very lean and focused as is (only ~2600 lines in src), we should be cautious about dumping in a bunch of code without consideration for maintainability. |
|
Thanks @slevang, |
|
I don't think it is out of scope, and certainly don't mean to discourage you! Only that we should make sure some humans understand the code and agree that it is clean and maintainable before merging. The current focus of this package is basically "tricks for rectilinear grids" since those can be highly optimized. But I don't see why we can't expand to cover optimizations for other regridding problems, as long as it all stays semi-coherent. For example I've worked on a much more efficient version of sparse point-wise interpolation for chunked data that I could probably add as a feature here. |
a2910ce to
613cdcb
Compare
Adds ConservativeRegridder and the .regrid.conservative_2d accessor for grids the existing 1D-factored conservative path can't express: curvilinear (2D lat/lon coords), unstructured meshes, and arbitrary polygon-to-polygon aggregation. Cell intersections go through shapely with sparse-COO weight storage, threaded above ~1k pairs, with optional spherical=True for analytic equal-area weights on lat/lon grids, antimeridian handling, and netCDF save/load for caching weight matrices. Demo notebooks under docs/notebooks/demos/ cover curvilinear, unstructured, and grid->regions aggregation. Optional dependency: install with the conservative-2d extra. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
613cdcb to
3ed200b
Compare
Source on [0, 360] vs target on [-180, 180] (and reverse) yielded NaN on half the cells: a uniform shift can't reconcile the two conventions since the mean diff is exactly 180° and round() banker's-rounds to 0. Per-value wrap source longitudes into the target's 360° window, mirror- ing format_lon, and remap area-matrix columns when wrap breaks source monotonicity. 2D / curvilinear coords keep the existing uniform-shift fallback. Adds regression test covering both directions, planar and spherical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swaps the synthetic polygon demo for xarray's air_temperature tutorial dataset aggregated onto US states dissolved from geodatasets.ncovr. Same workflow end-to-end (from_polygons, map+bar plot, conservation check, save/reload) but with recognizable inputs. Adds a [notebooks] extras group covering the runtime deps the demos need (matplotlib, pooch, geopandas, geodatasets); shapely/h5netcdf/ sparse/dask stay in [conservative-2d] and [accel] so the two can be combined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The map's vmin/vmax span the full source-grid temperature range, but the bar chart was normalizing to the bar values' own min/max — so a state at the cool end was deeper blue on the bar than on the map. Reuse the map's vmin/vmax for the bar. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each notebook now opens with motivation (what conservative regridding is, when to use it), what makes the case 1D-non-separable (curvilinear coords / arbitrary polygons / unstructured mesh), and a numbered summary of what the notebook will do. Each step's markdown picks up a sentence or two of "why this step" narrative — what's expensive vs. cheap, what to look for in the result, what choices reflect domain conventions — without changing any of the code or expected outputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ooks The unnormalized ``(n_dst, n_src)`` cell-intersection area matrix was stored as ``self._areas`` but read externally by tests and the demo notebooks for conservation diagnostics. Promote to ``self.areas`` as a plain public attribute (consistent with the existing ``self.spherical``, ``self.x_coord``, ``self.y_coord`` style on the same class) and update the four external callers. Internal storage is the same — no behavior or perf change (verified against 3ed200b: build/apply timings within run-to-run noise on rect, curvilinear, and polygon cases). Also runs ``ruff format`` (4-line collapses in conservative_2d.py and test_conservative_2d.py; normalizes notebook source from single-string to list-of-strings) and clears notebook outputs via nbformat-equivalent JSON manipulation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Are you willing to share that code? Even in a rough form? I propose that we prompt a bot to integrate your optimizations into a new branch. Then write a prompt to benchmark the performance against xESMF. Then a final prompt to profile the code to explain any performance gaps. This much should all entail minimal human effort. If the results are interesting, we can take it further. If you prefer not to share the code, I still encourage you to try it. |
Summary
Adds conservative regridding for grids that aren't 1D-separable — a capability the existing
.conservativemethod cannot express:lat[i, j]/lon[i, j]coordinate variables (ORCA, tripolar, rotated regional).ConservativeRegridder.from_polygons(...)accepts arbitrary shapely polygon arrays (MPAS, ICON, finite-element).from_polygonspath. Auxiliary target coordinates (region_id, labels, scalar metadata) ride along with the output.spherical=Truekwarg applies an analytic Lambert cylindrical equal-area projection of cell edges before intersecting. Matches the factored path's sin-weighted accuracy at the same fast-path cost.from_polygons(..., periodic=True)unwraps polygon rings that cross ±180°.Internals:
ConservativeRegridderclass — reusable regridder; builds the sparse area-intersection matrix once, applies to many fields..T/.transpose()returns the backward regridder sharing cached state..to_netcdf/.from_netcdf— persist the weight matrix plus reproducibility metadata (dim names, shapes, grid ranges, spherical flag, xarray-regrid version, timestamp, schema version) in root attributes, with source / target coord-only groups alongside. Schema-versioned for forward compat..regrid.conservative_2d(...)on xarray DataArrays / Datasets.pip install xarray-regrid[conservative-2d]pullsshapely>=2.0+h5netcdf. Existing users unaffected.Motivation
.conservativeuses Stephan Hoyer's axis-factored 1D overlap — fast and elegant but strictly rectilinear (1D-separable). Users with curvilinear ocean models, unstructured climate meshes, or grid-to-region aggregation currently reach for xESMF, which:dask.distributed.conservative_2dfills the gap with pure shapely + sparse. Works on Windows, composes with distributed dask, installs with a singlepip install shapely.Naming follows xESMF's short-method-name convention. "
_2d" distinguishes full 2D cell-polygon intersection from the 1D axis-factored approach and doesn't prejudge whether the grid is curvilinear or unstructured.Design notes
intersectspredicate, by default — much cheaper and the subsequentarea > 0filter drops any false positives.from_polygons(..., predicate_filter=True)restores the predicate filter for pathological inputs (thin diagonals with loose bboxes).shapely.intersectionover candidate pairs is parallelized across aThreadPoolExecutor. ~3-4× on 8 cores.sparse.COO; apply goes throughsparse.matmulinsidexr.apply_ufunc(dask=\"parallelized\"). Cached pre-sorted transpose avoids a per-call re-sort..conservative: two-pass value + mask matmul,nan_thresholdwith the same interpretation.skipnaso uncovered target cells (domain boundaries, polygon holes) always produce NaN._apply_core.Test plan
conservative_2dexactly reproduces.conservative(planar) to 1e-12..Ttranspose on rectilinear and curvilinear; constant field round-trip exact on aligned grids.from_polygons(periodic=True)correctly routes samples across the seam.from_polygonsoutput.test_conservative_2d.pymodule passing on this branch.Docs
Three demo notebooks under
docs/notebooks/demos/:demo_conservative_2d_curvilinear.ipynb— rotated 2D target.demo_conservative_2d_regions.ipynb— structured-grid → region-polygon aggregation.demo_conservative_2d_unstructured.ipynb— Voronoi mesh viafrom_polygons+.to_netcdf/.from_netcdf.Notebooks are output-stripped — reviewers / docs build execute them fresh.
Follow-ups (out of scope for this PR)
spherely-integrationbranch (PR WIP: Add s2 manifold to conservative_2d (stacks on #70) #71) — drops inmanifold=\"s2\"backed by the optionalspherelydependency.from_polygonsbut wants a dedicated API surface for loading region polygons.🤖 Generated with Claude Code