Migrate to earthkit v1.0 release candidate by frazane · Pull Request #162 · MeteoSwiss/evalml

frazane · 2026-05-26T12:25:12Z

Migrates evalml to the first earthkit v1.0 release candidate. This meant moving data loading off meteodata-lab and onto earthkit-data, adapting to a GRIB encoding change in the new eccodes, and aligning coordinate and dimension names with what the new stack produces.

What changed

Updated dependencies in pyproject.toml and uv.lock: removed meteodata-lab and the old earthkit-plots, pinned the earthkit family (earthkit-data, earthkit-utils, earthkit-plots, earthkit-meteo, earthkit-geo) to release candidate versions, and bumped eccodes and eccodes-cosmo-resources-python. Also moved snakefmt to v2.0.
Reworked GRIB reading in src/data_input on top of earthkit-data, replacing the meteodata-lab decoder. Includes fieldlist-to-xarray conversion driven by an xarray engine profile and de-accumulation handling for TOT_PREC.
Standardized coordinate and dimension names across the codebase: latitude/longitude (previously lat/lon), step (previously lead_time), and valid_time (previously time). This touches data loading, spatial mapping in verification, plotting, the plot scripts, and the related tests.
Adapted plotting to the new earthkit: the GRIB compatibility loader now goes through the shared data_input loader, and unit conversions and styles use earthkit-meteo and earthkit-plots.
Fixed inference configs after a breaking change in the new eccodes: with the COSMO definitions active we can no longer encode both ICON and IFS GRIB files via shortName alone, so a modifiers/patches section now maps each variable to its paramId and shortName in the global inference configs.
Removed legacy regional and trimedge inference configs that are no longer used.
Added a README section on the migration, including a manual workaround to download and cache the eckit geo grid files for the ICON-CH grids, which earthkit cannot fetch automatically yet.
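Two of the items above (the name standardization and the TOT_PREC de-accumulation) can be illustrated with a small sketch. This is not the code in src/data_input: it works on plain Python lists instead of xarray objects, and it assumes TOT_PREC arrives as a running total since forecast start. The names LEGACY_TO_EARTHKIT, rename_legacy and deaccumulate are hypothetical.

```python
# Hypothetical mapping from the old names to the ones listed in this PR.
LEGACY_TO_EARTHKIT = {
    "lat": "latitude",
    "lon": "longitude",
    "lead_time": "step",
    "time": "valid_time",
}


def rename_legacy(names):
    """Return the names with legacy ones replaced; unknown names pass through."""
    return [LEGACY_TO_EARTHKIT.get(name, name) for name in names]


def deaccumulate(accumulated):
    """Turn running totals (one value per step) into per-step amounts."""
    if not accumulated:
        return []
    # Assumption: the first value is the amount since forecast start, so it is kept.
    amounts = [accumulated[0]]
    for previous, current in zip(accumulated, accumulated[1:]):
        amounts.append(current - previous)
    return amounts


dims = rename_legacy(["lead_time", "lat", "lon", "member"])
per_step = deaccumulate([0.0, 1.5, 1.5, 4.0])
```

With xarray, a mapping like this would typically be passed to Dataset.rename after filtering it to the names actually present in the dataset.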

Notes

earthkit v1.0rc is not final, so some rough edges remain. The ICON-CH grid cache step in the README is a manual workaround until the upstream download is fixed.
The grib file globbing in plot_forecast_frame.py is a temporary fix (marked with a TODO) for anemoi-inference writing output filenames with unexpected formatting.
The plotting and data loading code urgently needs some care. In this PR I focused on the minimal changes to make the code work with the new earthkit, but we need to do a larger refactor.

frazane · 2026-05-26T15:56:36Z

Note: we might be able to get rid of the paramId patches in the inference configs. I am working on something here: ecmwf/anemoi-inference@main...feature/use-grib-paramid-encoding. Even if it works, it might take some time to merge because there are likely unwanted side effects on the ECMWF side.

frazane · 2026-05-27T09:41:07Z

Tested evalml experiment ... on all example configs. All green.

frazane · 2026-05-27T16:49:38Z

Still working on the showcase command.

Resolve conflicts:
- inference_extract_requirements.py: keep branch's newer eccodes pins (eccodes>=2.44.0,<2.48.0 / eccodes-cosmo-resources-python==2.44.0.1).
- plot_forecast_frame.py / plot_meteogram.py: main refactored these from marimo notebooks (.mo.py) into plain scripts with new CLI (regions_json/stations/outdir). Took main's structure and re-applied the branch's earthkit-v1 data-model renames: step (not lead_time), valid_time (not time) for forecasts/baselines, latitude/longitude (not lat/lon) station coords, plus the grib-glob workaround for forecast frames.

frazane · 2026-05-28T13:00:06Z

showcase command is working

jonasbhend

Thanks @frazane for putting this together. Looking good. I have a few comments and need to run some examples to make sure everything works. Will approve once I have checked these.

jonasbhend · 2026-05-28T12:42:29Z

      config: resources/inference/configs/sgm-multidataset-forecaster-global-ich1-oper.yaml
      extra_requirements:
-        - git+https://github.com/ecmwf/anemoi-inference.git@b9aaee5df86614cad9d8d08b76876a4be4e980db
+        - git+https://github.com/ecmwf/anemoi-inference.git@main


Is this intended, that instead of pinning, we use the head of main with all the known issues (i.e. it will break with potentially every update of anemoi-inference)?

jonasbhend · 2026-05-28T12:44:52Z

+                sdor: {"param": 160, "shortName": "sdor"}
+                TOT_PREC: {"param": 228, "shortName": "tp"}
+                tp: {"param": 228, "shortName": "tp"}
+                z: {"param": 129, "shortName": "z"}


is there no COSMO name for z (H_SURF) or are we not using this?

For some parameters, the patch does not need to contain both the IFS and the COSMO name.
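To make the question about missing COSMO names easier to check, here is a minimal sketch that groups the patch entries quoted above by paramId and lists the parameters reachable under only one name. The dict mirrors the four entries in the diff excerpt; the helper names are hypothetical, and a single name is not necessarily a gap, as the reply above points out.

```python
# The four entries shown in the diff excerpt, as a plain dict.
patches = {
    "sdor": {"param": 160, "shortName": "sdor"},
    "TOT_PREC": {"param": 228, "shortName": "tp"},
    "tp": {"param": 228, "shortName": "tp"},
    "z": {"param": 129, "shortName": "z"},
}


def names_by_param(patches):
    """Group the variable names by the paramId they are patched to."""
    grouped = {}
    for name, patch in patches.items():
        grouped.setdefault(patch["param"], []).append(name)
    return grouped


def single_name_params(patches):
    """Return the paramIds that are listed under exactly one variable name."""
    grouped = names_by_param(patches)
    return sorted(param for param, names in grouped.items() if len(names) == 1)


grouped = names_by_param(patches)
candidates = single_name_params(patches)
```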

jonasbhend · 2026-05-28T12:47:50Z

+                sdor: {"param": 160, "shortName": "sdor"}
+                TOT_PREC: {"param": 228, "shortName": "tp"}
+                tp: {"param": 228, "shortName": "tp"}
+                z: {"param": 129, "shortName": "z"}


Also here, H_SURF?

jonasbhend · 2026-05-28T15:03:28Z

As a general comment, isn't there a way to reuse parts of a config, such that we wouldn't need to repeat the modifiers section in each of the configs?

Unfortunately there is not, but it could be reimplemented in anemoi-inference

jonasbhend · 2026-05-28T15:11:03Z

+    lons = ds["longitude"].values.flatten()
+    state["forecast_reference_time"] = datetime.fromtimestamp(
+        ds["forecast_reference_time"].values.item() / 1e9
+    )
+    state["valid_time"] = datetime.fromtimestamp(ds["valid_time"].values.item() / 1e9)
+    state["longitudes"] = ds["longitude"].values.flatten()
+    state["latitudes"] = ds["latitude"].values.flatten()
    # Add the limited-area model envelope polygon (convex hull) before global coords are added
-    lam_hull = MultiPoint(list(zip(lons.tolist(), lats.tolist()))).convex_hull
+    lam_hull = MultiPoint(
+        list(zip(lons.tolist(), state["latitudes"].tolist()))
+    ).convex_hull


Suggested change

-    lons = ds["longitude"].values.flatten()
     state["forecast_reference_time"] = datetime.fromtimestamp(
         ds["forecast_reference_time"].values.item() / 1e9
     )
     state["valid_time"] = datetime.fromtimestamp(ds["valid_time"].values.item() / 1e9)
     state["longitudes"] = ds["longitude"].values.flatten()
     state["latitudes"] = ds["latitude"].values.flatten()
     # Add the limited-area model envelope polygon (convex hull) before global coords are added
     lam_hull = MultiPoint(
-        list(zip(lons.tolist(), state["latitudes"].tolist()))
+        list(zip(state["longitudes"].tolist(), state["latitudes"].tolist()))
     ).convex_hull
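A side note on the timestamp handling in this hunk, as a sketch only: calling .item() on a nanosecond datetime64 value yields an integer count of nanoseconds since the epoch, and datetime.fromtimestamp without a tz argument returns a naive datetime in the machine's local time zone. If the times are meant to be UTC (an assumption here), a conversion could look like the following. The helper name ns_to_utc is hypothetical.

```python
from datetime import datetime, timezone


def ns_to_utc(ns_since_epoch):
    """Convert integer nanoseconds since the Unix epoch to an aware UTC datetime."""
    # Integer arithmetic avoids the float rounding of `ns / 1e9`.
    seconds, nanos = divmod(ns_since_epoch, 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only resolves microseconds, so finer detail is dropped.
    return moment.replace(microsecond=nanos // 1000)


reference_time = ns_to_utc(1_779_796_800_000_000_000)
```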

jonasbhend · 2026-05-28T15:13:50Z

+        mask = ~np.isnan(ds[_paramlist_ecmwf[0]].values.squeeze())
+        global_lons = ds["longitude"].values.flatten()
+        if np.max(global_lons) > 180:
+            global_lons = ((global_lons + 180) % 360) - 180
+        state["longitudes"] = np.concatenate([state["longitudes"], global_lons[mask]])
+        state["latitudes"] = np.concatenate(
+            [state["latitudes"], ds["latitude"].values.flatten()[mask]]
+        )
+        for param in _paramlist_ecmwf:
+            if param in ds:
+                state["fields"][PARAMS_MAP_INV[param]] = np.concatenate(
+                    [
+                        state["fields"][PARAMS_MAP_INV[param]],
+                        ds[param].values.flatten()[mask],
+                    ]


If I understand correctly, this is 'expanding' the global fields to avoid seams in the plots, right? Could we maybe factor this out in a separate function to clarify what this is for?

oh, I didn't quite catch all the lines I guess....

This code reproduces what the cutout operation from anemoi does: it concatenates regional data with global data, where the global data has a region of missing values (the mask) that is replaced by the regional data. Since this code will likely disappear in the future refactoring, would it be okay to leave it as is for now?
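For readers unfamiliar with the cutout idea, a minimal sketch on plain lists follows. It is not the code under review: it assumes the global field holds NaN wherever the regional domain takes over, and the function names are hypothetical. It also always wraps longitudes, whereas the hunk above only does so when the maximum exceeds 180.

```python
import math


def wrap_longitude(lon):
    """Map a longitude in degrees to the half-open range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def cutout(regional, global_values, keep):
    """Append the global points flagged in `keep` after the regional points."""
    kept = [value for value, ok in zip(global_values, keep) if ok]
    return list(regional) + kept


regional_lons = [7.5, 8.0]
global_lons = [0.0, 8.0, 190.0, 350.0]
global_field = [1.0, math.nan, 2.0, 3.0]

# Keep a global point only where the global field has data (outside the region).
keep = [not math.isnan(value) for value in global_field]
merged_lons = cutout(regional_lons, [wrap_longitude(lon) for lon in global_lons], keep)
```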

jonasbhend · 2026-05-28T15:20:22Z

    # Load grib once — shared across all region plots
-    grib_file = grib_dir / f"{init_time}_{lead_time}.grib"
+    # TODO: fix file pattern & globbing
+    grib_file = Path(list(grib_dir.glob(f"2*_{lead_time}.grib"))[0])


wait, isn't lead_time here now a varying length number (e.g. 1, 10, 100)? I don't think that would work for all our forecasts, right?
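To make the concern concrete, here is a sketch of how the pattern behaves against a few made-up filenames. The real output naming of anemoi-inference is not shown in this thread, so the names below are purely hypothetical. fnmatchcase is used as a stand-in for matching a single path component, and find_single_grib is a hypothetical helper that fails loudly instead of indexing [0].

```python
from fnmatch import fnmatchcase


def find_single_grib(filenames, lead_time):
    """Return the one filename matching the lead time, or raise ValueError."""
    pattern = f"2*_{lead_time}.grib"
    matches = sorted(name for name in filenames if fnmatchcase(name, pattern))
    if len(matches) != 1:
        raise ValueError(f"expected exactly one file for {pattern!r}, found {matches}")
    return matches[0]


# Hypothetical unpadded names: each lead time matches exactly one file.
unpadded = ["202605260000_1.grib", "202605260000_10.grib", "202605260000_100.grib"]
first = find_single_grib(unpadded, 1)

# Hypothetical zero-padded name: lead_time=1 no longer matches anything.
padded = ["202605260000_001.grib"]
```

Two init times in the same directory would also produce more than one match, which the helper reports rather than silently picking one.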

jonasbhend · 2026-05-28T15:24:30Z

    # save results to NetCDF
    args.output.parent.mkdir(parents=True, exist_ok=True)
-    results.to_netcdf(args.output)
+    results.earthkit.to_netcdf(args.output)


why are we using earthkit here? I guess the results from verify are still just a plain xarray dataset, no?

Because xarray objects created with earthkit contain non-serializable attributes. Writing through the earthkit accessor removes them automatically.
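As an illustration of what removing non-serializable attributes can mean, here is a sketch with a conservative allow-list. This is not how earthkit implements it, and the function name serializable_attrs and the example attribute names are hypothetical.

```python
def serializable_attrs(attrs):
    """Keep only attributes that are plain text, numbers, or flat lists of those."""

    def is_scalar(value):
        # bool is skipped on purpose in this conservative sketch.
        return isinstance(value, (str, int, float)) and not isinstance(value, bool)

    def is_ok(value):
        if is_scalar(value):
            return True
        return isinstance(value, (list, tuple)) and all(is_scalar(v) for v in value)

    return {key: value for key, value in attrs.items() if is_ok(value)}


attrs = {"units": "m", "levels": [1, 2], "handle": object()}
clean = serializable_attrs(attrs)
```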

jonasbhend · 2026-05-28T15:44:53Z

Ok the first issue I stumble upon is that we now have conflicting pins and requirements in the inference environment:

Using Python 3.12.12 environment at: output/data/runs/interpolator-tmp-569a-on-forecaster-c304-0ee3/.venv
  × No solution found when resolving dependencies:
  ╰─▶ Because you require eccodes==2.39.1 and eccodes>=2.44.0,<2.48.0, we can conclude that your requirements are unsatisfiable.

(using interpolators-ich1.yaml from the branch)... I guess we need to adjust the example configs accordingly @frazane, right?
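The resolver message boils down to an exact pin that falls outside the allowed range. A simplified check follows, assuming plain dotted numeric versions. It is neither uv's resolver nor a full PEP 440 implementation, and the helper names are hypothetical.

```python
def parse_version(text):
    """Split a dotted numeric version such as '2.44.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower, upper):
    """True if lower <= version < upper, compared component by component."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# The exact pin from the example config against the range from the branch.
compatible = satisfies("2.39.1", "2.44.0", "2.48.0")
```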

frazane · 2026-05-28T17:19:30Z

Ah yes, I didn't see this. It got into the branch during the merge with main (4a211c2). We don't need those pins anymore.

Co-authored-by: Jonas Bhend <[email protected]>

frazane added 10 commits May 26, 2026 14:07

chore: run snakefmt

0c71de5

update main project dependencies

4475d1f

update dependencies for inference environment

b440427

renaming dimensions

d2c4f2a

refactor source files for new earthkit

6107544

add earthkit workaround instructions

f6c0fce

wip: update inference configs

b7e547b

remove unused inference configs

f4a972d

update configs with patches

563d695

Merge branch 'main' into feat/earthkit-v1-migration

45e562b

frazane marked this pull request as ready for review May 26, 2026 15:55

frazane added 3 commits May 26, 2026 18:03

standardize lat/lon coordinate names to latitude/longitude

ad54000

rename lead time dimension

9663648

update config

f3d033e

frazane requested review from clairemerker and dnerini May 27, 2026 09:43

frazane added 3 commits May 28, 2026 09:35

bugfix, use grib store

b5014ed

update showcase code

b0d0818

fix colormap tests for list-of-colors loader contract

d894d86

frazane requested a review from jonasbhend May 28, 2026 08:56

frazane mentioned this pull request May 28, 2026

Unify regions stations #154

Merged

frazane added 2 commits May 28, 2026 14:05

bugfixes after merge conflicts

d77bead

jonasbhend reviewed May 28, 2026

frazane and others added 2 commits May 28, 2026 19:25

fix wrong paramId for dewpoint temperature

d568c40

Co-authored-by: Jonas Bhend <[email protected]>

more fixes to wrong paramId

a186804

Co-authored-by: Jonas Bhend <[email protected]>
