Skip to content

Integration of MEC workflow#110

Open
andreaspauling wants to merge 88 commits into
mainfrom
MRB-534-Implement-rule-to-generate-namelist
Open

Integration of MEC workflow#110
andreaspauling wants to merge 88 commits into
mainfrom
MRB-534-Implement-rule-to-generate-namelist

Conversation

@andreaspauling
Copy link
Copy Markdown

@andreaspauling andreaspauling commented Feb 12, 2026

Add the MEC workflow. The new parts are in green in the DAG: snakemake_dag.pdf

For each valid date a MEC case is set up and run. This includes:

  • creating the directory structure
  • adding the observations
  • organizing the model input including past runs depending on the config
  • rendering the MEC namelist
  • executing MEC for all dates with complete data for all leadtimes (excludes the first ones of the period)
  • storing the final feedback file in a separate place.

All MEC cases can be removed once the final feedback file is produced (removal not yet implemented).

  • Topics already raised by Francesco:
    • put folder mec/ in data/mec in order not to mix up init and valid time (MEC is valid time oriented)
    • check globbing options in MEC namelist with DWD (not documented, only FCR_TIME is supported afaik, * etc not). The aim is to avoid copying data.

dnerini and others added 30 commits October 7, 2025 14:01
* Distinguish between primary runs ('candidates') and secondary runs

* Docstrings
* Adopt forecast intervals including the end point

* Fix parsing

* Experiments work

* Update config/forecasters.yaml

* Align init times to availabiliy of COE

* run pre-commit

* Change README to COSMO-E availability

---------

Co-authored-by: Jonas Bhend <jonasbhend@users.noreply.github.com>
Co-authored-by: Jonas Bhend <jonas.bhend@meteoswiss.ch>
* draft changes

* rename workspace resources dir

* working for config/forecasters.yaml

* improve logging

* works for interpolators.yaml

* re-add get_leadtime function

* refactor run directives into script
* add region averages

* add regions to config

* Add regions to verification module, scripts, and rules

* add stratification to forecaster config and fix typo

* fix dict indexing

* fix append error

* read lon/lat from obs dataset

* Add inner verification domain

* Add missing dependency

* add plots by region

* Add regions to dashboard

* Fix dashboard

* Add region name and initializations to plot title (and remove header div)

* Add support for multiple regions

* Fix legend
@andreaspauling
Copy link
Copy Markdown
Author

  • FFV2 in this PR as well
  • evalml options --mec --ffv2 added (default: no mec/ffv2)
  • support of lists in config
  • mec running outside forecast run directories
  • support ver-files as source for observations
  • Paths moved to config
  • minor fixes / cleanup

@andreaspauling andreaspauling requested review from dnerini and frazane May 21, 2026 08:35
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes to the accumulation logic for total precipitation needed here? If not, I would remove these.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. MEC needs precip accumulated from the beginning of the run

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a global postprocessor defined that is applied to all outputs just above. So the specific postprocessors for accumulation of precipitation introduced here are redundant.

config=Path(OUT_ROOT / "data/runs/{run_id}/{init_time}/config.yaml"),
resources=directory(OUT_ROOT / "data/runs/{run_id}/{init_time}/resources"),
grib_out_dir=directory(OUT_ROOT / "data/runs/{run_id}/{init_time}/grib"),
okfile=touch(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change made?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Claudes help: The okfile is necessary. inference_execute needs to depend on whichever of the two prepare rules ran, but it can't reference them directly by output path because both produce the same three outputs (config.yaml, resources/, grib/). The _inference_routing_fn function selects the correct prepare rule by model type — but to do so, it must reference a path that is unique per rule. The okfile provides that.

That okfile is used in _inference_routing_fn . The routing function returns the okfile path of whichever prepare rule ran (forecaster or interpolator), and inference_execute declares it as its input — this is how Snakemake knows to wait for the correct prepare rule to finish before launching inference.

Sounds plausible to me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that touch("/some/file") already automatically generates the file when the rule succeeds.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use Snakemake's touch() on line 199 in inference.smk and then remove those three lines from the script in each function - would that adress your point? I could do that and test it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use touch(.../ok-file) in inference.smk instead of touching it in inference_prepare.py. I found no solution that worked. May we leave it with the current working solution or have a look at it together?

from datetime import timedelta


def _parse_steps(steps: str) -> list[int]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a duplicate of

def get_leadtimes(wc):
?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have different input and output. It may be possible to merge but that would need some time and result in a one more complicated function.

Comment thread workflow/rules/verif_obs.smk Outdated
Comment thread workflow/scripts/generate_ffv2_namelist.py Outdated
Comment thread workflow/rules/verif_obs.smk Outdated
Comment thread workflow/rules/verif_obs.smk Outdated
"""


# link_mec_input: create the input_mod dir with symlinks to all fc files from all source inits
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule is not creating symlinks, but copies. Didn't we want to avoid this?

Copy link
Copy Markdown
Author

@andreaspauling andreaspauling May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented a version with only symlinks. However, this did not work because all fields needed to calculate precipitation must be one file. This is a consequence of the basic way MEC works - it reads the grib files, does all the processing and then reads the next file. The current version now just copies the data that is really needed, reducing the amount of data considerably - in the first version all inference output was copied.

If we want to save disk space we simply could remove the mec directory. This is what could be done once this workflow is consolidated. Then no disk space is used unnecessarily at the end of the workflow. The feedback files are stored separately.

If the grib writing will be in one file - that would solve this as well.

I added a docstring explaining what this rule does.

@andreaspauling andreaspauling requested a review from frazane May 21, 2026 14:37
Copy link
Copy Markdown
Contributor

@jonasbhend jonasbhend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andreaspauling. Thanks for putting this together, it has grown quite big indeed so I only managed to have a broad look. The thing that worries me the most, is that we seem to limit the mec ffv2 pipeline (inadvertently) to evaluations of consecutive initializations spaced at most 6 hours apart (or step hours, I am not quite sure). I.e. we can evaluate initializations every 6, 3, or 1h, but this is prohibitively expensive to do for a full year and we haven't been doing that in the past. I guess this stems from the orthogonal approach in mec/ffv2 verification (based on valid_time) and current evalml (based on initialization time). So my question is, if it would at all be possible to reorganize mec / ffv2 evaluation to organize feedback files by initialization, which would align the different pipelines and be much more flexible for future use.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can integrate the additional mec config in the existing templates instead of having to support an additional template? Or will this break the existing templates when run without the mec / ffv2 flags?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integrating the mec/ffv2 part in to an existing config should be possible. It is a design choice I guess. I have a slight preference for a larger number but small configs but I could try integration. Which existing one would then be best suited?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are supporting three forecaster configs (I don't know why we even have these 3 and they seem all a bit outdated). So I would have suggested to integrate it with one (or all) of these. But honestly, now that I look at the config mess, I am not even sure this is the way to go...

@@ -0,0 +1,84 @@

# variables to verify.
varnoContinuous 'T2M,TD2M,RH2M,U10M,V10M,PS,FF,DD,GUST_6h,RR_6h' #,RR_6h,N,N_L,N_M,N_H,RAD_GL_1h
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variables are hardcoded, does this work also with a subset? Should/can this be made configurable?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works also with a subset. Is there a use case for subsets? We may always want to verify all variables we have.
This list depends on the variables in MEC and thus the observation (and model) source. If we decide on the observation source (see https://meteoswiss.atlassian.net/wiki/spaces/MR/pages/1746993334/SDL-42+Observation+sources+for+MEC+in+evalml ) the variables should stay constant

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, as you say this depends on the observations AND model data. So far we don't have wind gusts for example, so this shouldn't work. If there is internal homogenization (subsettting) of the requested parameters to the intersect of params available from both forecast and ground truth, this won't break with an 'extensive' set, but I still think it would be clearer if specified accordingly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FFV2 just ignores a variable if it is not there. But I agree it would be clearer to specify only those variables we actually have.

# variables to verify.
varnoContinuous 'T2M,TD2M,RH2M,U10M,V10M,PS,FF,DD,GUST_6h,RR_6h' #,RR_6h,N,N_L,N_M,N_H,RAD_GL_1h
pecthresholds list('FF'=list('lower'=c(2),'upper'=c(7))) # hit rates (percent correct forecast) for the forecast to hit the observation within the given limits are calculated.
catthresholds list('T2M'=c(282,292),'FF'=c(2.5,5,10)) # no space allowed between variables
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can reuse the thresholds defined in the main config?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this should go into the config because that will be frequently changed. I will do it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we could basically specify the most extensive list of parameters availbale from observations and then be done with it? Is this the case now?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a global postprocessor defined that is applied to all outputs just above. So the specific postprocessors for accumulation of precipitation introduced here are redundant.

Comment thread src/evalml/cli.py
Comment thread src/evalml/cli.py
Comment thread src/evalml/cli.py
echo "Copying $src_rel -> $dest"
cp "$src_rel" "$dest"
else
prev_lead3=$(printf "%03d" "$((lead - 6))")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this coming from TOT_PREC_6H or why are we hardcoding six hourly intervals?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is because we use 6h precip at the moment. Should be made more flexible later

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem being that evalml can evaluate n-hourly precip. It is not explicit in evalml so far because it depends on the step of the forecasts. My suggestion would be to add a comment in the code to highlight that this may need changing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I will do it.
I also create a Jira task, so that it gets done more quickly

Comment thread src/evalml/cli.py
)
targets = []
if mec or ffv2:
if mec:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to check here that mec and ffv2 blocks are present in config and fail fast if not.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me. I will do it.

Comment thread workflow/scripts/generate_mec_namelist.py
@andreaspauling
Copy link
Copy Markdown
Author

Hi @andreaspauling. Thanks for putting this together, it has grown quite big indeed so I only managed to have a broad look. The thing that worries me the most, is that we seem to limit the mec ffv2 pipeline (inadvertently) to evaluations of consecutive initializations spaced at most 6 hours apart (or step hours, I am not quite sure). I.e. we can evaluate initializations every 6, 3, or 1h, but this is prohibitively expensive to do for a full year and we haven't been doing that in the past. I guess this stems from the orthogonal approach in mec/ffv2 verification (based on valid_time) and current evalml (based on initialization time). So my question is, if it would at all be possible to reorganize mec / ffv2 evaluation to organize feedback files by initialization, which would align the different pipelines and be much more flexible for future use.

Thanks @jonasbhend for the review! There is no limitation from mec/ffv2 regarding the initialization spacing. You may be referring to the hardcoded 6h, which are steps. This is because we used TOT_PREC_6h so far. Absolutely agree that this needs more flexibility. This can also be done without reorganizing the whole mec /ffv2 evaluation because it is well confined in the set up of the mec cases. However, I would do that in a new PR because this one is already too heavy...

@jonasbhend
Copy link
Copy Markdown
Contributor

Hi @andreaspauling. Thanks for putting this together, it has grown quite big indeed so I only managed to have a broad look. The thing that worries me the most, is that we seem to limit the mec ffv2 pipeline (inadvertently) to evaluations of consecutive initializations spaced at most 6 hours apart (or step hours, I am not quite sure). I.e. we can evaluate initializations every 6, 3, or 1h, but this is prohibitively expensive to do for a full year and we haven't been doing that in the past. I guess this stems from the orthogonal approach in mec/ffv2 verification (based on valid_time) and current evalml (based on initialization time). So my question is, if it would at all be possible to reorganize mec / ffv2 evaluation to organize feedback files by initialization, which would align the different pipelines and be much more flexible for future use.

Thanks @jonasbhend for the review! There is no limitation from mec/ffv2 regarding the initialization spacing. You may be referring to the hardcoded 6h, which are steps. This is because we used TOT_PREC_6h so far. Absolutely agree that this needs more flexibility. This can also be done without reorganizing the whole mec /ffv2 evaluation because it is well confined in the set up of the mec cases. However, I would do that in a new PR because this one is already too heavy...

@andreaspauling I don't think this is the case. Due to _reftimes_mec this only works for initializations at least every step (here 6) hours.

The following is the cropped output from running evalml (dry mode) with initializations every 6h, i.e. the follwoing in forecasters-ich1_mec_ffv2.yaml:

dates:
  start: 2025-07-01T00:00
  end: 2025-07-30T00:00
  frequency: 6h

the list of tasks is as expected:

> evalml experiment config/forecasters-ich1_mec_ffv2.yaml -n --ffv2
host: balfrin-ln002
Building DAG of jobs...
Job stats:
job                               count
------------------------------  -------
ffv2_all                              1
generate_ffv2_namelist                1
generate_mec_namelist               113
inference_create_venv                 1
inference_execute                   117
inference_extract_requirements        1
inference_get_checkpoint              1
inference_make_squashfs_image         1
inference_prepare_forecaster        117
link_mec_input                      113
prepare_mec_input                   113
reorganize_ffv2_files                 1
run_ffv2                              1
run_mec                             113
sarus_pull_ffv2                       1
sarus_pull_mec                        1
total                               696

if I run with initializations every few days

dates:
  start: 2025-07-01T00:00
  end: 2025-07-30T00:00
  frequency: 30h

I get

> evalml experiment config/forecasters-ich1_mec_ffv2.yaml -n --ffv2
host: balfrin-ln002
Building DAG of jobs...
Job stats:
job                       count
----------------------  -------
ffv2_all                      1
generate_ffv2_namelist        1
reorganize_ffv2_files         1
run_ffv2                      1
sarus_pull_ffv2               1
total                         5

whereas the task list is the expected when running without the mec/ffv2 flag:

> evalml experiment config/forecasters-ich1_mec_ffv2.yaml -n
host: balfrin-ln002
Building DAG of jobs...
Job stats:
job                                 count
--------------------------------  -------
experiment_all                          1
inference_create_venv                   1
inference_execute                      24
inference_extract_requirements          1
inference_get_checkpoint                1
inference_make_squashfs_image           1
inference_prepare_forecaster           24
report_experiment_dashboard             1
verification_metrics                   24
verification_metrics_aggregation        1
verification_metrics_plot               1
write_summary                           1
total                                  81

inference_resources:
slurm_partition: normal-shared
checkpoint: https://service.meteoswiss.ch/mlstore#/experiments/602/runs/c30490b6ba064e4db03b430f3a2595ad
label: stage_E_icon_1km_cutoff_edges_subgrid_horography
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
label: stage_E_icon_1km_cutoff_edges_subgrid_horography
label: stage_E_icon_1km_cutoff_edges_subgrid_orography

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants