MRB-648 Scorecards for evalml #145
Merged
Member
jonasbhend
requested changes
May 7, 2026
Contributor
jonasbhend
left a comment
Very nice. I really like the scorecards. Great work.
For future PRs, could you please add a short description of the changes (high-level overview) and - if necessary - also of the goals of the PR? That would be very helpful for the review.
As an additional suggestion, could we include the scorecard in the dashboard (I know we don't always want to produce it, but in case it is available it would be nice to include in a separate tab)?
…ividers, figure height); fix legend centering
jonasbhend
reviewed
May 12, 2026
jonasbhend
approved these changes
May 26, 2026
Contributor
jonasbhend
left a comment
I have just left two minor comments. Seems all good to me otherwise.
adestefani8
commented
May 27, 2026
adestefani8
commented
May 27, 2026
adestefani8
commented
May 28, 2026
Member
Let's merge this, thank you all for all the precious feedback and comments!

What this PR adds
This PR adds a new `report_scorecard` rule that renders a PNG comparing one run against one baseline.

The scorecard has:
- (variable × metric)

Each cell encodes the row's metric as a relative difference between the two runs for that region and lead time: (model − baseline) / |baseline| × 100.

Markers:
- |diff| below the neutral threshold (default 5%)
- x → missing or non-finite value

Scores: RMSE, MAE, STDE, R2, ETS, POD, FAR

RMSE, MAE, STDE, and FAR are lower-is-better; R2, ETS, and POD are higher-is-better.

Above the neutral threshold, dot area scales linearly with |diff|% and caps at `size_cap_pct` (default 30%).

Configuration
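The neutral threshold and `size_cap_pct` control how each cell described under "What this PR adds" is drawn. As a rough illustration only, the encoding could be sketched as below. The helper names (`relative_diff_pct`, `encode_cell`), the returned dictionary, the handling of a zero baseline, and the exact area scaling are assumptions made for this sketch, not the code in this PR.

```python
import math

# Metric orientation as listed in the PR description.
LOWER_IS_BETTER = {"RMSE", "MAE", "STDE", "FAR"}
HIGHER_IS_BETTER = {"R2", "ETS", "POD"}


def relative_diff_pct(model, baseline):
    """(model - baseline) / |baseline| * 100, or None when undefined."""
    values = (model, baseline)
    if any(v is None or not math.isfinite(v) for v in values):
        return None
    if baseline == 0:
        # Assumption: a zero baseline is treated like a missing value.
        return None
    return (model - baseline) / abs(baseline) * 100.0


def encode_cell(metric, model, baseline, neutral_pct=5.0, size_cap_pct=30.0):
    """Hypothetical helper: marker kind, improvement flag, relative dot area."""
    diff = relative_diff_pct(model, baseline)
    if diff is None:
        return {"marker": "x", "improved": None, "area": None}
    if abs(diff) < neutral_pct:
        return {"marker": "neutral", "improved": None, "area": None}

    base = metric.split("_", 1)[0]  # e.g. ETS_gt_0p0 -> ETS
    if base in LOWER_IS_BETTER:
        improved = diff < 0
    elif base in HIGHER_IS_BETTER:
        improved = diff > 0
    else:
        raise ValueError(f"unknown metric orientation: {metric}")

    # Assumption: area is a fraction of the largest dot, linear in |diff|
    # and clipped at size_cap_pct.
    area = min(abs(diff), size_cap_pct) / size_cap_pct
    return {"marker": "dot", "improved": improved, "area": area}
```

For example, `encode_cell("RMSE", 0.9, 1.0)` corresponds to a reduction of about 10%, so under these assumptions it is flagged as improved and drawn at roughly one third of the maximum area.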
Configurable via `params` on the rule:
- `lead_times`: "start/stop/step" in hours.
- `regions`: regions to include as column blocks. If no region is specified, all regions are included.
- `variables`: "VAR:M1,M2,..." entries. Omit `:M1,M2,...` to use `all_metrics` for that variable.

Metric names can also expand by prefix: requesting `ETS` includes all matching categorical scores, such as `ETS_gt_0p0`, `ETS_gt_0p001`, `ETS_gt_0p005`.

If no variable is specified, the script falls back to RMSE only for a default set of variables.
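To make the parameter formats above concrete, here is a minimal sketch of how such strings might be parsed. The function names, the inclusive treatment of `stop`, the `NAME_` prefix rule, and the variable names `T_2M` and `TOT_PREC` are all assumptions for illustration; the script in this PR may handle these differently.

```python
def parse_lead_times(spec):
    """Parse "start/stop/step" (hours) into a list of lead times."""
    start, stop, step = (int(part) for part in spec.split("/"))
    # Assumption: stop is inclusive.
    return list(range(start, stop + 1, step))


def parse_variable_entry(entry, all_metrics):
    """Parse "VAR:M1,M2,..."; fall back to all_metrics when no list is given."""
    var, _, metrics = entry.partition(":")
    names = [m.strip() for m in metrics.split(",") if m.strip()]
    return var.strip(), names or list(all_metrics)


def expand_metric(name, available):
    """Return exact matches plus names starting with "<name>_"."""
    # Assumption: prefix expansion matches on "<name>_".
    return [m for m in available if m == name or m.startswith(name + "_")]


available = ["RMSE", "ETS_gt_0p0", "ETS_gt_0p001", "ETS_gt_0p005", "POD_gt_0p0"]
print(parse_lead_times("6/24/6"))
print(parse_variable_entry("T_2M:RMSE,MAE", all_metrics=["RMSE"]))
print(parse_variable_entry("TOT_PREC", all_metrics=["RMSE", "ETS"]))
print(expand_metric("ETS", available))
```

Matching on `name + "_"` rather than on the bare prefix keeps a request for one score from accidentally picking up an unrelated score whose name merely starts with the same letters.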
Other defaults (`season`, `init_hour`, metric settings, plot styling) live in the script's `cfg`.

Plot layout
The plotting script makes a few automatic layout decisions:
- `col_width` grows when necessary to prevent region header overlap, and the top margin/vertical separators adapt to the rendered header height

TODOs