Integration of MEC workflow by andreaspauling · Pull Request #110 · MeteoSwiss/evalml

andreaspauling · 2026-02-12T14:42:42Z

Add the MEC workflow. The new parts are in green in the DAG: snakemake_dag.pdf

For each valid date a MEC case is set up and run. This includes:

- creating the directory structure
- adding the observations
- organizing the model input, including past runs depending on the config
- rendering the MEC namelist
- executing MEC for all dates with complete data for all leadtimes (excludes the first ones of the period)
- storing the final feedback file in a separate place

All MEC cases can be removed once the final feedback file is produced (removal not yet implemented).

Topics already raised by Francesco:
- put folder mec/ in data/mec in order not to mix up init and valid time (MEC is valid time oriented)
- check globbing options in the MEC namelist with DWD (not documented; as far as I know only FCR_TIME is supported, wildcards such as * are not). The aim is to avoid copying data.

* Adopt forecast intervals including the end point * Fix parsing * Experiments work * Update config/forecasters.yaml * Align init times to availabiliy of COE * run pre-commit * Change README to COSMO-E availability --------- Co-authored-by: Jonas Bhend <jonasbhend@users.noreply.github.com> Co-authored-by: Jonas Bhend <jonas.bhend@meteoswiss.ch>

* add region averages * add regions to config * Add regions to verification module, scripts, and rules * add stratification to forecaster config and fix typo * fix dict indexing * fix append error * read lon/lat from obs dataset * Add inner verification domain * Add missing dependency * add plots by region * Add regions to dashboard * Fix dashboard * Add region name and initializations to plot title (and remove header div) * Add support for multiple regions * Fix legend

andreaspauling · 2026-05-21T08:34:57Z

FFV2 in this PR as well
- evalml options --mec --ffv2 added (default: no mec/ffv2)
- support of lists in config
- mec running outside forecast run directories
- support ver-files as source for observations
- paths moved to config
- minor fixes / cleanup

frazane · 2026-05-21T09:30:07Z

Are these changes to the accumulation logic for total precipitation needed here? If not, I would remove these.

Exactly. MEC needs precip accumulated from the beginning of the run

There is a global postprocessor defined that is applied to all outputs just above. So the specific postprocessors for accumulation of precipitation introduced here are redundant.

frazane · 2026-05-21T09:33:57Z

        config=Path(OUT_ROOT / "data/runs/{run_id}/{init_time}/config.yaml"),
        resources=directory(OUT_ROOT / "data/runs/{run_id}/{init_time}/resources"),
        grib_out_dir=directory(OUT_ROOT / "data/runs/{run_id}/{init_time}/grib"),
-        okfile=touch(


Why was this change made?

With Claude's help: The okfile is necessary. inference_execute needs to depend on whichever of the two prepare rules ran, but it can't reference them directly by output path because both produce the same three outputs (config.yaml, resources/, grib/). The _inference_routing_fn function selects the correct prepare rule by model type, but to do so it must reference a path that is unique per rule. The okfile provides that.

That okfile is used in _inference_routing_fn. The routing function returns the okfile path of whichever prepare rule ran (forecaster or interpolator), and inference_execute declares it as its input. This is how Snakemake knows to wait for the correct prepare rule to finish before launching inference.

Sounds plausible to me.

What I mean is that touch("/some/file") already automatically generates the file when the rule succeeds.

I could use Snakemake's touch() on line 199 in inference.smk and then remove those three lines from the script in each function. Would that address your point? I could do that and test it.

I tried to use touch(.../ok-file) in inference.smk instead of touching it in inference_prepare.py. I found no solution that worked. May we leave it with the current working solution or have a look at it together?
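To make the routing idea discussed in this thread concrete, here is a minimal sketch in plain Python. It is not the code of _inference_routing_fn: the root directory, the rule name for interpolators, the marker file naming and the function name okfile_for are all hypothetical and only illustrate how a per-rule marker path can disambiguate two rules that otherwise share the same outputs.

```python
from pathlib import Path

# Hypothetical root; the real OUT_ROOT is defined by the workflow.
OUT_ROOT = Path("output")

# Assumed mapping from model type to the prepare rule that handles it.
# "inference_prepare_interpolator" is a guessed name for illustration.
PREPARE_RULE_BY_MODEL_TYPE = {
    "forecaster": "inference_prepare_forecaster",
    "interpolator": "inference_prepare_interpolator",
}


def okfile_for(model_type: str, run_id: str, init_time: str) -> Path:
    """Return a marker path that only one prepare rule produces."""
    try:
        rule_name = PREPARE_RULE_BY_MODEL_TYPE[model_type]
    except KeyError:
        raise ValueError(f"unknown model type: {model_type!r}") from None
    run_dir = OUT_ROOT / "data/runs" / run_id / init_time
    # The marker name embeds the rule name, so the path is unique per rule
    # even though config.yaml, resources/ and grib/ are shared.
    return run_dir / f"{rule_name}.ok"
```

A downstream rule that takes this path as its input would then depend on exactly one of the two prepare rules.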

frazane · 2026-05-21T09:35:35Z

+from datetime import timedelta
+
+
+def _parse_steps(steps: str) -> list[int]:


Isn't this a duplicate of get_leadtimes(wc) in evalml/workflow/rules/plot.smk (line 119 in e4af0a6)?

They have different inputs and outputs. It may be possible to merge them, but that would take some time and result in one more complicated function.
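For readers unfamiliar with the helper under discussion, the following is a small sketch of what a string-to-list step parser could look like. The body of _parse_steps is not shown in this thread, so the "start/end/step" input format and the name parse_steps_sketch are assumptions; the inclusive end point follows the commit note about adopting forecast intervals that include the end point.

```python
def parse_steps_sketch(steps: str) -> list[int]:
    """Expand an assumed 'start/end/step' string (hours) into lead times.

    The end point is included, so '0/24/6' covers 0, 6, 12, 18 and 24.
    """
    start, end, step = (int(part) for part in steps.split("/"))
    if step <= 0:
        raise ValueError("step must be a positive number of hours")
    return list(range(start, end + 1, step))
```

A function taking Snakemake wildcards, such as get_leadtimes(wc), could call a parser like this internally, which is one way the two could share logic if a merge is attempted later.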

frazane · 2026-05-21T10:12:10Z

+        """
+
+
+# link_mec_input: create the input_mod dir with symlinks to all fc files from all source inits


This rule is not creating symlinks, but copies. Didn't we want to avoid this?

I implemented a version with only symlinks. However, this did not work because all fields needed to calculate precipitation must be in one file. This is a consequence of the basic way MEC works: it reads a grib file, does all the processing and then reads the next file. The current version now copies only the data that is really needed, which reduces the amount of data considerably. In the first version all inference output was copied.

If we want to save disk space we could simply remove the mec directory. This could be done once this workflow is consolidated. Then no disk space is used unnecessarily at the end of the workflow. The feedback files are stored separately.

If the grib output were written to a single file, that would solve this as well.

I added a docstring explaining what this rule does.

jonasbhend

Hi @andreaspauling. Thanks for putting this together, it has grown quite big indeed so I only managed to have a broad look. The thing that worries me the most, is that we seem to limit the mec ffv2 pipeline (inadvertently) to evaluations of consecutive initializations spaced at most 6 hours apart (or step hours, I am not quite sure). I.e. we can evaluate initializations every 6, 3, or 1h, but this is prohibitively expensive to do for a full year and we haven't been doing that in the past. I guess this stems from the orthogonal approach in mec/ffv2 verification (based on valid_time) and current evalml (based on initialization time). So my question is, if it would at all be possible to reorganize mec / ffv2 evaluation to organize feedback files by initialization, which would align the different pipelines and be much more flexible for future use.

jonasbhend · 2026-06-01T08:31:01Z

Any chance we can integrate the additional mec config in the existing templates instead of having to support an additional template? Or will this break the existing templates when run without the mec / ffv2 flags?

Integrating the mec/ffv2 part into an existing config should be possible. It is a design choice I guess. I have a slight preference for a larger number of small configs, but I could try integration. Which existing one would then be best suited?

We are supporting three forecaster configs (I don't know why we even have these 3 and they all seem a bit outdated). So I would have suggested to integrate it with one (or all) of these. But honestly, now that I look at the config mess, I am not even sure this is the way to go...

jonasbhend · 2026-06-01T08:33:02Z

@@ -0,0 +1,84 @@
+
+# variables to verify.
+varnoContinuous     'T2M,TD2M,RH2M,U10M,V10M,PS,FF,DD,GUST_6h,RR_6h' #,RR_6h,N,N_L,N_M,N_H,RAD_GL_1h


Variables are hardcoded, does this work also with a subset? Should/can this be made configurable?

It works also with a subset. Is there a use case for subsets? We may always want to verify all variables we have.
This list depends on the variables in MEC and thus the observation (and model) source. If we decide on the observation source (see https://meteoswiss.atlassian.net/wiki/spaces/MR/pages/1746993334/SDL-42+Observation+sources+for+MEC+in+evalml ) the variables should stay constant

Well, as you say this depends on the observations AND model data. So far we don't have wind gusts for example, so this shouldn't work. If there is internal homogenization (subsetting) of the requested parameters to the intersection of params available from both forecast and ground truth, this won't break with an 'extensive' set, but I still think it would be clearer if specified accordingly.

FFV2 just ignores a variable if it is not there. But I agree it would be clearer to specify only those variables we actually have.
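One way to make the variable list explicit, as suggested in this thread, is to intersect the requested variables with what both the model and the observations provide before rendering the namelist entry. The sketch below is an illustration only: the function names and the way availability is passed in are hypothetical, and only the varnoContinuous key and the variable names come from the namelist shown above.

```python
def select_variables(requested, model_vars, obs_vars):
    """Keep requested variables available in both sources, preserving order."""
    available = set(model_vars) & set(obs_vars)
    return [name for name in requested if name in available]


def render_varno_continuous(variables):
    """Render the namelist line as a comma-separated, quoted string."""
    return "varnoContinuous     '" + ",".join(variables) + "'"
```

With this, a model without wind gusts would yield a namelist that does not mention GUST_6h at all, instead of relying on FFV2 to ignore it silently.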

jonasbhend · 2026-06-01T08:34:00Z

+# variables to verify.
+varnoContinuous     'T2M,TD2M,RH2M,U10M,V10M,PS,FF,DD,GUST_6h,RR_6h' #,RR_6h,N,N_L,N_M,N_H,RAD_GL_1h
+pecthresholds       list('FF'=list('lower'=c(2),'upper'=c(7))) # hit rates (percent correct forecast) for the forecast to hit the observation within the given limits are calculated.
+catthresholds       list('T2M'=c(282,292),'FF'=c(2.5,5,10)) # no space allowed between variables


Any chance we can reuse the thresholds defined in the main config?

Agree, this should go into the config because that will be frequently changed. I will do it.

So we could basically specify the most extensive list of parameters available from observations and then be done with it? Is this the case now?
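If the thresholds move into the main config, the namelist string could be generated from a plain mapping. The following sketch assumes a hypothetical config layout (a dict from variable name to a list of numbers) and hypothetical function names; the target format, including the absence of spaces between variables, is taken from the catthresholds line shown above.

```python
def _format_number(value):
    """Write whole numbers without a decimal point (282, not 282.0)."""
    return str(int(value)) if float(value).is_integer() else str(value)


def render_r_vector(values):
    """Render a list of numbers as an R vector literal, e.g. c(2.5,5,10)."""
    return "c(" + ",".join(_format_number(v) for v in values) + ")"


def render_catthresholds(thresholds):
    """Render {'T2M': [282, 292]} as list('T2M'=c(282,292)) without spaces."""
    items = ",".join(
        f"'{name}'={render_r_vector(values)}" for name, values in thresholds.items()
    )
    return f"list({items})"
```

For example, render_catthresholds({"T2M": [282, 292], "FF": [2.5, 5, 10]}) reproduces the value used in the namelist above.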

jonasbhend · 2026-06-01T08:36:32Z

There is a global postprocessor defined that is applied to all outputs just above. So the specific postprocessors for accumulation of precipitation introduced here are redundant.

jonasbhend · 2026-06-01T09:32:10Z

+                        echo "Copying $src_rel -> $dest"
+                        cp "$src_rel" "$dest"
+                    else
+                        prev_lead3=$(printf "%03d" "$((lead - 6))")


Is this coming from TOT_PREC_6H or why are we hardcoding six hourly intervals?

Yes, this is because we use 6h precip at the moment. It should be made more flexible later.

The problem being that evalml can evaluate n-hourly precip. It is not explicit in evalml so far because it depends on the step of the forecasts. My suggestion would be to add a comment in the code to highlight that this may need changing.

Good idea. I will do it.
I will also create a Jira task so that it gets done more quickly.
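As a starting point for the flexibility discussed here, the previous-lead computation from the shell snippet can be expressed with the accumulation interval as a parameter instead of a literal 6. This is only a sketch: the function name is hypothetical, and returning None when there is no earlier lead is an assumption about how the first lead would be handled.

```python
from typing import Optional


def previous_lead_label(lead: int, accum_hours: int = 6) -> Optional[str]:
    """Return the zero-padded label of the lead one accumulation interval earlier.

    Mirrors printf "%03d" "$((lead - 6))" with a configurable interval.
    Returns None when no earlier lead exists (assumed behaviour).
    """
    if accum_hours <= 0:
        raise ValueError("accum_hours must be positive")
    previous = lead - accum_hours
    if previous < 0:
        return None
    return f"{previous:03d}"
```

Calling previous_lead_label(12, accum_hours=3) would then select lead 009 rather than 006, which is what n-hourly precipitation would need.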

jonasbhend · 2026-06-01T09:34:29Z

-    )
+    targets = []
+    if mec or ffv2:
+        if mec:


I suggest to check here that mec and ffv2 blocks are present in config and fail fast if not.

That sounds good to me. I will do it.
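A fail-fast check along these lines could look like the sketch below. The function name is hypothetical, the config is assumed to be a plain dict with top-level "mec" and "ffv2" keys, and requiring the "mec" block whenever --ffv2 is set is an assumption based on the dry-run output, where --ffv2 alone schedules MEC jobs.

```python
def require_config_blocks(config: dict, mec: bool, ffv2: bool) -> None:
    """Raise early if a requested workflow has no matching config block."""
    required = []
    if mec or ffv2:
        # Assumption: FFV2 consumes MEC feedback files, so it needs "mec" too.
        required.append("mec")
    if ffv2:
        required.append("ffv2")
    missing = [name for name in required if name not in config]
    if missing:
        raise ValueError(
            "missing config block(s) for the requested options: "
            + ", ".join(missing)
        )
```

Calling this before the targets list is built would stop the run with a clear message instead of failing later inside a rule.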

andreaspauling · 2026-06-01T12:33:08Z

Thanks @jonasbhend for the review! There is no limitation from mec/ffv2 regarding the initialization spacing. You may be referring to the hardcoded 6h, which are steps. This is because we used TOT_PREC_6h so far. I absolutely agree that this needs more flexibility. This can also be done without reorganizing the whole mec/ffv2 evaluation because it is well confined to the setup of the mec cases. However, I would do that in a new PR because this one is already too heavy...

jonasbhend · 2026-06-01T12:48:30Z

@andreaspauling I don't think this is the case. Due to _reftimes_mec this only works for initializations at least every step (here 6) hours.

The following is the cropped output from running evalml (dry mode) with initializations every 6h, i.e. the following in forecasters-ich1_mec_ffv2.yaml:

dates:
  start: 2025-07-01T00:00
  end: 2025-07-30T00:00
  frequency: 6h

the list of tasks is as expected:

> evalml experiment config/forecasters-ich1_mec_ffv2.yaml -n --ffv2
host: balfrin-ln002
Building DAG of jobs...
Job stats:
job                               count
------------------------------  -------
ffv2_all                              1
generate_ffv2_namelist                1
generate_mec_namelist               113
inference_create_venv                 1
inference_execute                   117
inference_extract_requirements        1
inference_get_checkpoint              1
inference_make_squashfs_image         1
inference_prepare_forecaster        117
link_mec_input                      113
prepare_mec_input                   113
reorganize_ffv2_files                 1
run_ffv2                              1
run_mec                             113
sarus_pull_ffv2                       1
sarus_pull_mec                        1
total                               696

If I run with initializations every 30 hours

dates:
  start: 2025-07-01T00:00
  end: 2025-07-30T00:00
  frequency: 30h

I get

> evalml experiment config/forecasters-ich1_mec_ffv2.yaml -n --ffv2
host: balfrin-ln002
Building DAG of jobs...
Job stats:
job                       count
----------------------  -------
ffv2_all                      1
generate_ffv2_namelist        1
reorganize_ffv2_files         1
run_ffv2                      1
sarus_pull_ffv2               1
total                         5

whereas the task list is as expected when running without the mec/ffv2 flags:

> evalml experiment config/forecasters-ich1_mec_ffv2.yaml -n
host: balfrin-ln002
Building DAG of jobs...
Job stats:
job                                 count
--------------------------------  -------
experiment_all                          1
inference_create_venv                   1
inference_execute                      24
inference_extract_requirements          1
inference_get_checkpoint                1
inference_make_squashfs_image           1
inference_prepare_forecaster           24
report_experiment_dashboard             1
verification_metrics                   24
verification_metrics_aggregation        1
verification_metrics_plot               1
write_summary                           1
total                                  81
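The behaviour reported here can be reproduced with a small valid-time calculation. The sketch below is not the implementation of _reftimes_mec; it assumes that a MEC case is only built for a valid time when a forecast exists for every lead time, and it assumes lead times of 0, 6, ..., 24 hours. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def init_times(start, end, frequency_h):
    """List initialization times from start to end (end point included)."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(hours=frequency_h)
    return times


def complete_valid_times(inits, steps_h):
    """Valid times for which every lead time has a matching initialization."""
    available = set(inits)
    candidates = {t + timedelta(hours=s) for t in available for s in steps_h}
    return sorted(
        valid
        for valid in candidates
        if all(valid - timedelta(hours=s) in available for s in steps_h)
    )


start, end = datetime(2025, 7, 1), datetime(2025, 7, 30)
steps = range(0, 25, 6)  # assumed lead times in hours

six_hourly = complete_valid_times(init_times(start, end, 6), steps)
thirty_hourly = complete_valid_times(init_times(start, end, 30), steps)
```

Under these assumptions the 6-hourly setup has 117 initializations and 113 complete valid times, while the 30-hourly setup has 24 initializations and no complete valid time at all. That is consistent with the job counts quoted above, but the step set is a guess and the real selection logic may differ.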

jonasbhend · 2026-06-01T12:54:24Z

+      inference_resources:
+          slurm_partition: normal-shared
+      checkpoint: https://service.meteoswiss.ch/mlstore#/experiments/602/runs/c30490b6ba064e4db03b430f3a2595ad
+      label: stage_E_icon_1km_cutoff_edges_subgrid_horography


Suggested change

-      label: stage_E_icon_1km_cutoff_edges_subgrid_horography
+      label: stage_E_icon_1km_cutoff_edges_subgrid_orography

dnerini and others added 30 commits October 7, 2025 14:01

Initial draft (pseudo code)

c1375ab

add namelist as resource

9f608f2

add verif_obs.smk to Snakefile

e82bd94

Add rules for observation data and namelist generation (using fake data)

c3ab651

add newline to namelist template

7512d96

somewhat working version of run_mec (with fake data)

13301a5

correct typo and add optional script for generating namelist, in case…

e722e5f

… we want to factor it out of the rule

fix: add localrule to inference_interpolator rule (#57)

3d9e3c1

Fix for interpolator rule

918913f

Consolidate multi packages into unique src/ dir (#58)

179eb4d

Update configs (#63)

e791a30

Adopt 'steps' instead of 'lead_time' (#62)

d197712

Update example config for experiment with interpolators (#70)

9568987

Distinguish between primary runs ('candidates') and secondary runs (#64)

128eb91

* Distinguish between primary runs ('candidates') and secondary runs * Docstrings

Mrb 550 inconcsistent forecast initializations in evalml (#72)

e028f59

Update vega-lite spec (#69)

5406777

Decouple inference preparation and execution (#68)

8d01490

* draft changes * rename workspace resources dir * working for config/forecasters.yaml * improve logging * works for interpolators.yaml * re-add get_leadtime function * refactor run directives into script

input data and namelist for MEC

04c4cf1

Merge remote-tracking branch 'origin/main' into MRB-534-Implement-rul…

b1959dc

…e-to-generate-namelist

Cleanup

23c9599

Refactor MEC namelist generation

804455a

setup MEC case

f793d85

add use of local MEC executable and cleaning

3839476

Support of mec in a sarus container

5b58b7a

First draft of FFV2 rules

569d713

change some params to fix wildcard issue

e6eb2cc

change name of nl file

ce90890

make note about ver ens member

29ab980

Andreas Pauling and others added 17 commits April 7, 2026 11:54

Merge branch 'main' into MRB-534-Implement-rule-to-generate-namelist

ef5fb82

fixes after merge with main, support for ICON

097c58f

Merge branch 'main' into MRB-534-Implement-rule-to-generate-namelist

5532469

wildcard fixes

331e67b

support precipitation differencing

bdf12f6

Merge branch 'main' into MRB-534-Implement-rule-to-generate-namelist

a572bbe

Merge remote-tracking branch 'refs/remotes/origin/MRB-534-Implement-r…

634e2d7

…ule-to-generate-namelist' into MRB-536-for-review

Attempt to merge from MRB-534-Implement-rule-to-generate-namelist again

2d66b40

cleanup

d3cc2a5

support ver-files as observation source

f3d45c7

add --mec --ffv2 options, paths in config, support of date lists

21d70b4

logging, cleaning

829932c

Run MEC outside the forecast run directory

8ab3393

cleanup

d5afdf2

Remove trailing whitespace

5396f20

ffv2 config update

332bd1e

Merge branch 'main' into MRB-534-Implement-rule-to-generate-namelist

0d65d78

andreaspauling requested review from dnerini and frazane May 21, 2026 08:35

frazane reviewed May 21, 2026

andreaspauling added 2 commits May 21, 2026 16:15

updates PR review

4595921

Merge branch 'main' into MRB-534-Implement-rule-to-generate-namelist

1f60ccf

andreaspauling requested a review from frazane May 21, 2026 14:37

andreaspauling added 2 commits May 27, 2026 14:28

Merge branch 'main' into MRB-534-Implement-rule-to-generate-namelist

93d9354

formatting

a3b1aba

jonasbhend reviewed Jun 1, 2026
