Migrate to earthkit v1.0 release candidate#162
Conversation
|
Note: we might be able to get rid of the |
|
Tested |
|
Still working on the |
Resolve conflicts: - inference_extract_requirements.py: keep branch's newer eccodes pins (eccodes>=2.44.0,<2.48.0 / eccodes-cosmo-resources-python==2.44.0.1). - plot_forecast_frame.py / plot_meteogram.py: main refactored these from marimo notebooks (.mo.py) into plain scripts with new CLI (regions_json/ stations/outdir). Took main's structure and re-applied the branch's earthkit-v1 data-model renames: step (not lead_time), valid_time (not time) for forecasts/baselines, latitude/longitude (not lat/lon) station coords, plus the grib-glob workaround for forecast frames.
|
|
jonasbhend
left a comment
There was a problem hiding this comment.
Thanks @frazane for putting this together. Looking good. I have a few comments and need to run some examples to make sure everything works. Will approve once I have checked these.
| config: resources/inference/configs/sgm-multidataset-forecaster-global-ich1-oper.yaml | ||
| extra_requirements: | ||
| - git+https://github.com/ecmwf/anemoi-inference.git@b9aaee5df86614cad9d8d08b76876a4be4e980db | ||
| - git+https://github.com/ecmwf/anemoi-inference.git@main |
There was a problem hiding this comment.
Is this intended, that instead of pinning, we use the head of main with all the known issues (i.e. it will break with potentially every update of anemoi-inference)?
| sdor: {"param": 160, "shortName": "sdor"} | ||
| TOT_PREC: {"param": 228, "shortName": "tp"} | ||
| tp: {"param": 228, "shortName": "tp"} | ||
| z: {"param": 129, "shortName": "z"} |
There was a problem hiding this comment.
is there no COSMO name for z (H_SURF) or are we not using this?
There was a problem hiding this comment.
For some parameters the patch does not require to contain both IFS and COSMO names.
| sdor: {"param": 160, "shortName": "sdor"} | ||
| TOT_PREC: {"param": 228, "shortName": "tp"} | ||
| tp: {"param": 228, "shortName": "tp"} | ||
| z: {"param": 129, "shortName": "z"} |
There was a problem hiding this comment.
As a general comment, isn't there a way to reuse parts of a config, such that we wouldn't need to repeat the modifiers section in each of the configs?
There was a problem hiding this comment.
Unfortunately there is not, but it could be reimplemented in anemoi-inference
| lons = ds["longitude"].values.flatten() | ||
| state["forecast_reference_time"] = datetime.fromtimestamp( | ||
| ds["forecast_reference_time"].values.item() / 1e9 | ||
| ) | ||
| state["valid_time"] = datetime.fromtimestamp(ds["valid_time"].values.item() / 1e9) | ||
| state["longitudes"] = ds["longitude"].values.flatten() | ||
| state["latitudes"] = ds["latitude"].values.flatten() | ||
| # Add the limited-area model envelope polygon (convex hull) before global coords are added | ||
| lam_hull = MultiPoint(list(zip(lons.tolist(), lats.tolist()))).convex_hull | ||
| lam_hull = MultiPoint( | ||
| list(zip(lons.tolist(), state["latitudes"].tolist())) | ||
| ).convex_hull |
There was a problem hiding this comment.
| lons = ds["longitude"].values.flatten() | |
| state["forecast_reference_time"] = datetime.fromtimestamp( | |
| ds["forecast_reference_time"].values.item() / 1e9 | |
| ) | |
| state["valid_time"] = datetime.fromtimestamp(ds["valid_time"].values.item() / 1e9) | |
| state["longitudes"] = ds["longitude"].values.flatten() | |
| state["latitudes"] = ds["latitude"].values.flatten() | |
| # Add the limited-area model envelope polygon (convex hull) before global coords are added | |
| lam_hull = MultiPoint(list(zip(lons.tolist(), lats.tolist()))).convex_hull | |
| lam_hull = MultiPoint( | |
| list(zip(lons.tolist(), state["latitudes"].tolist())) | |
| ).convex_hull | |
| state["forecast_reference_time"] = datetime.fromtimestamp( | |
| ds["forecast_reference_time"].values.item() / 1e9 | |
| ) | |
| state["valid_time"] = datetime.fromtimestamp(ds["valid_time"].values.item() / 1e9) | |
| state["longitudes"] = ds["longitude"].values.flatten() | |
| state["latitudes"] = ds["latitude"].values.flatten() | |
| # Add the limited-area model envelope polygon (convex hull) before global coords are added | |
| lam_hull = MultiPoint( | |
| list(zip(state["longitudes"].tolist(), state["latitudes"].tolist())) | |
| ).convex_hull |
| mask = ~np.isnan(ds[_paramlist_ecmwf[0]].values.squeeze()) | ||
| global_lons = ds["longitude"].values.flatten() | ||
| if np.max(global_lons) > 180: | ||
| global_lons = ((global_lons + 180) % 360) - 180 | ||
| state["longitudes"] = np.concatenate([state["longitudes"], global_lons[mask]]) | ||
| state["latitudes"] = np.concatenate( | ||
| [state["latitudes"], ds["latitude"].values.flatten()[mask]] | ||
| ) | ||
| for param in _paramlist_ecmwf: | ||
| if param in ds: | ||
| state["fields"][PARAMS_MAP_INV[param]] = np.concatenate( | ||
| [ | ||
| state["fields"][PARAMS_MAP_INV[param]], | ||
| ds[param].values.flatten()[mask], | ||
| ] |
There was a problem hiding this comment.
If I understand correctly, this is 'expanding' the global fields to avoid seams in the plots, right? Could we maybe factor this out in a separate function to clarify what this is for?
There was a problem hiding this comment.
oh, I didn't quite catch all the lines I guess....
There was a problem hiding this comment.
This code is reproducing what the cutout operation from anemoi does: concatenates regional data with global data, where global data has a region of missing values (the mask) that is replaced by the regional data. Since this code will likely disappears in the future refactoring, would it be okay to leave it as is for now?
| # Load grib once — shared across all region plots | ||
| grib_file = grib_dir / f"{init_time}_{lead_time}.grib" | ||
| # TODO: fix file pattern & globbing | ||
| grib_file = Path(list(grib_dir.glob(f"2*_{lead_time}.grib"))[0]) |
There was a problem hiding this comment.
wait, isn't lead_time here now a varying length number (e.g. 1, 10, 100)? I don't think that would work for all our forecasts, right?
| # save results to NetCDF | ||
| args.output.parent.mkdir(parents=True, exist_ok=True) | ||
| results.to_netcdf(args.output) | ||
| results.earthkit.to_netcdf(args.output) |
There was a problem hiding this comment.
why are we using earthkit here? I guess the results from verify is still just a plain xarray dataset, no?
There was a problem hiding this comment.
Because xarray objects created with earthkit contain non-serializable attributes. This automatically removes them.
|
Ok the first issue I stumble upon is that we now have conflicting pins and requirements in the inference environment: (using interpolators-ich1.yaml from the branch)... I guess we need to adjust the example configs accordingly @frazane, right? |
Ah yes I didn't see this, it got into the branch during the merge with main 4a211c2. We don't need those pins anymore. |
Co-authored-by: Jonas Bhend <[email protected]>
Co-authored-by: Jonas Bhend <[email protected]>
Migrates evalml to the first earthkit v1.0 release candidate. This meant moving data loading off meteodata-lab and onto earthkit-data, adapting to a GRIB encoding change in the new eccodes, and aligning coordinate and dimension names with what the new stack produces.
What changed
Notes