MRB-648 Scorecards for evalml by adestefani8 · Pull Request #145 · MeteoSwiss/evalml

adestefani8 · 2026-05-01T09:48:14Z

What this PR adds

This PR adds a new report_scorecard rule that renders a PNG comparing one run against one baseline.

The scorecard has:

one row per (variable × metric)
one column block per region
one column per lead time inside each region block

Each cell encodes the row's metric as a relative difference between the two runs for that region and lead time: (model − baseline) / |baseline| × 100.
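
As a rough illustration of the formula above, here is a minimal sketch of the per-cell computation. The function name and the handling of a zero or non-finite baseline (returning NaN) are assumptions for illustration, not taken from the PR's script.

```python
import math


def relative_diff_pct(model: float, baseline: float) -> float:
    """Return (model - baseline) / |baseline| * 100, or NaN when undefined."""
    # Assumption: a zero or non-finite input yields NaN instead of raising,
    # so the cell can later be drawn as "missing".
    if not (math.isfinite(model) and math.isfinite(baseline)) or baseline == 0:
        return math.nan
    return (model - baseline) / abs(baseline) * 100.0


# Example: the model's RMSE is 1.8 and the baseline's is 2.0,
# so the difference is about -10% relative to the baseline.
print(relative_diff_pct(1.8, 2.0))
```

Dividing by |baseline| rather than baseline keeps the sign of the result tied to the sign of (model − baseline), even for scores that can be negative.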

Markers:

blue → model better
red → baseline better
grey → |diff| below the neutral threshold (default 5%)
grey x → missing or non-finite value

Scores:

Supported scores: RMSE, MAE, STDE, R2, ETS, POD, FAR
Score direction: RMSE, MAE, STDE, and FAR are lower-is-better; R2, ETS, and POD are higher-is-better.
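
The marker and score-direction rules above could be combined roughly as follows. This is only a sketch: the function name, the returned labels, and the way a thresholded name such as ETS_gt_0p001 is mapped back to its base score (splitting on the first underscore) are assumptions, and a value exactly at the threshold is treated as non-neutral because the description says "below".

```python
import math

LOWER_IS_BETTER = {"RMSE", "MAE", "STDE", "FAR"}
HIGHER_IS_BETTER = {"R2", "ETS", "POD"}


def classify_cell(diff_pct: float, metric: str, neutral_pct: float = 5.0) -> str:
    """Map a relative difference (in %) to one of the four marker kinds."""
    # Assumption: categorical scores are named "<SCORE>_<threshold>",
    # e.g. "ETS_gt_0p001", so the base score is the part before the first "_".
    base = metric.split("_")[0]
    if base not in LOWER_IS_BETTER | HIGHER_IS_BETTER:
        raise ValueError(f"unsupported score: {metric}")
    if not math.isfinite(diff_pct):
        return "missing"  # drawn as a grey x
    if abs(diff_pct) < neutral_pct:
        return "neutral"  # drawn in grey
    if base in LOWER_IS_BETTER:
        model_better = diff_pct < 0
    else:
        model_better = diff_pct > 0
    return "model_better" if model_better else "baseline_better"  # blue / red


print(classify_cell(-10.0, "RMSE"))
print(classify_cell(10.0, "ETS_gt_0p001"))
```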

Above the neutral threshold, dot area scales linearly with |diff|% and caps at size_cap_pct (default 30%).
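
One possible reading of this scaling rule is sketched below. The description only fixes the linear growth and the cap, so the function name, the max_area value, and the choice to scale from zero (rather than from the neutral threshold) are assumptions.

```python
def dot_area(diff_pct: float, size_cap_pct: float = 30.0, max_area: float = 200.0) -> float:
    """Marker area proportional to |diff|%, saturating at size_cap_pct."""
    # max_area is a hypothetical value in points**2, the unit that
    # matplotlib's scatter() uses for its `s` argument.
    clipped = min(abs(diff_pct), size_cap_pct)
    return max_area * clipped / size_cap_pct


print(dot_area(15.0))   # half of the cap, so half of max_area
print(dot_area(-45.0))  # beyond the cap, so the area saturates at max_area
```

Scaling the area rather than the radius keeps the visual weight of a dot proportional to the size of the difference.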

Configuration

Configurable via params on the rule:

lead_times: "start/stop/step" in hours.
regions: regions to include as column blocks. If no region is specified, all regions are included.
variables: "VAR:M1,M2,..." entries. Omit :M1,M2,... to use all_metrics for that variable.
Metric names can also expand by prefix: requesting ETS includes all matching categorical scores, such as ETS_gt_0p0, ETS_gt_0p001, ETS_gt_0p005.
If no variable is specified, the script falls back to RMSE only for a default set of variables.
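
The parameter formats above could be parsed along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, the stop value of lead_times is treated as inclusive (the description does not say), prefix expansion is assumed to match "<name>_..." entries, and the variable names T_2M and TOT_PREC are placeholders.

```python
def parse_lead_times(spec: str) -> list[int]:
    """Parse "start/stop/step" (hours) into a list of lead times."""
    start, stop, step = (int(part) for part in spec.split("/"))
    if step <= 0:
        raise ValueError("step must be positive")
    # Assumption: stop is inclusive.
    return list(range(start, stop + 1, step))


def parse_variable(entry: str, all_metrics: list[str]) -> tuple[str, list[str]]:
    """Parse "VAR:M1,M2,..."; without ":M1,..." fall back to all_metrics."""
    var, sep, metrics = entry.partition(":")
    if not sep:
        return var.strip(), list(all_metrics)
    requested = [m.strip() for m in metrics.split(",") if m.strip()]
    return var.strip(), requested


def expand_metrics(requested: list[str], available: list[str]) -> list[str]:
    """Expand each requested name to exact or prefix matches, keeping order."""
    expanded: list[str] = []
    for name in requested:
        for metric in available:
            matches = metric == name or metric.startswith(name + "_")
            if matches and metric not in expanded:
                expanded.append(metric)
    return expanded


available = ["RMSE", "ETS_gt_0p0", "ETS_gt_0p001", "POD_gt_0p0"]
print(parse_lead_times("6/24/6"))
print(parse_variable("T_2M:RMSE,MAE", all_metrics=available))
print(expand_metrics(["ETS"], available))
```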

Other defaults (season, init_hour, metric settings, plot styling) live in the script's cfg.

Plot layout

The plotting script makes a few automatic layout decisions:

the longest region label is measured before rendering: col_width grows when necessary to prevent region header overlap, and the top margin/vertical separators adapt to the rendered header height
the longest metric label is measured before rendering: variable labels keep a fixed gap from metric labels, and horizontal group separators start from the measured metric-label area
the legend is centered on the scorecard area
the no-data legend entry only appears when missing values are present

TODOs

Expose main scorecard parameters in the evalml config
Merge "Add ExperimentConfig for experiment workflow outputs" (#160)

dnerini · 2026-05-06T16:00:55Z

looking good :)

jonasbhend

Very nice. I really like the scorecards. Great work.

For future PRs, could you please add a short description of the changes (high-level overview) and - if necessary - also of the goals of the PR? That would be very helpful for the review.

As an additional suggestion, could we include the scorecard in the dashboard (I know we don't always want to produce it, but in case it is available it would be nice to include in a separate tab)?

jonasbhend

I have just left two minor comments. Seems all good to me otherwise.

dnerini · 2026-05-28T11:14:07Z

let's merge this, thank you all for all the precious feedback and comments!

adestefani8 and others added 6 commits May 1, 2026 11:42

Add initial draft for scorecard

ca2ddb2

Add verification scorecard (with hardcoded params)

ff44188

Fix Ruff error

20f908f

Apply pre-commit formatting

dd4252e

Add CLI to scorecard script

ebc9d89

Drop config validation and fix legend

96211bf

dnerini marked this pull request as ready for review May 6, 2026 11:23

dnerini requested review from dnerini and teobuz May 6, 2026 11:24

Merge branch 'main' into MRB-648-Scorecards-for-evalml

4a05a2f

dnerini requested review from frazane and jonasbhend May 6, 2026 15:56

dnerini requested review from Louis-Frey May 6, 2026 16:01

jonasbhend requested changes May 7, 2026

Comment thread workflow/rules/report.smk Outdated

Comment thread workflow/Snakefile Outdated

Comment thread workflow/rules/report.smk Outdated

adestefani8 and others added 2 commits May 11, 2026 13:37

Add support for ETS, POD, FAR and edit default scores

7449be3

Make scorecard layout self-adjusting to label sizes (group spacing, dividers, figure height); fix legend centering

87f4033

jonasbhend reviewed May 12, 2026

Comment thread workflow/scripts/report_scorecard.mo.py Outdated

Comment thread workflow/scripts/report_scorecard.mo.py Outdated

adestefani8 and others added 9 commits May 18, 2026 16:12

Drop CORR support and update fallback when no variable or region is specified

e21b476

Merge branch 'main' into MRB-648-Scorecards-for-evalml

29f4557

Fix name to DomainConfig

437876b

Fix name to DomainConfig

0b6f3d2

Add ExperimentConfig for experiment workflow outputs

ef0b06e

Update the README

03a5d82

Merge branch 'main' into MRB-648-Scorecards-for-evalml

5d2acdd

Merge branch 'feat/refactor-experiment-config' into MRB-648-Scorecards-for-evalml

974ef3b

Parametrize scorecards from config files

e9c6752

dnerini self-assigned this May 26, 2026

dnerini requested review from jonasbhend and removed request for dnerini May 26, 2026 07:18

dnerini added 5 commits May 26, 2026 09:19

Add scorecards to config

580a369

Merge branch 'main' into MRB-648-Scorecards-for-evalml

21b605d

Fix linting

949afa4

Update README

32eb198

Add enabled option

8b7703f

jonasbhend approved these changes May 26, 2026

Comment thread workflow/Snakefile Outdated

Comment thread workflow/rules/report.smk Outdated

Comment thread workflow/scripts/report_scorecard.mo.py Outdated

Comment thread workflow/Snakefile Outdated

dnerini added 4 commits May 27, 2026 11:46

Make stratification configurable

edd5f81

Drop marimo

fd0bc58

Reformat filenames

0c70591

Update README

b5b9e9c

adestefani8 commented May 27, 2026

Comment thread src/evalml/config.py

adestefani8 commented May 27, 2026

Comment thread workflow/scripts/report_scorecard.py Outdated

adestefani8 commented May 28, 2026

Comment thread workflow/scripts/report_scorecard.py Outdated

dnerini added 5 commits May 28, 2026 12:51

Refactor code to improve readability

83f1e46

Improve description

49f28f5

Remove dead code

12b6f77

Add region=all to subtitle when it's not used for stratification

6a62ea1

Formatting labels

bd9f4f1

dnerini merged commit a288334 into main May 28, 2026
4 of 5 checks passed

dnerini deleted the MRB-648-Scorecards-for-evalml branch May 28, 2026 11:21

dnerini merged 32 commits into main from MRB-648-Scorecards-for-evalml

adestefani8 commented May 1, 2026 • edited by dnerini

3 participants
