
Exit if dice score is NaN #54

Merged: PatrickRMiles merged 2 commits into LBANN:main from michaelmckinsey1:check-nan on Apr 22, 2026

michaelmckinsey1 (Collaborator) commented Apr 17, 2026

On main, training falsely reports that it has converged when the validation Dice score is NaN. Training cannot recover from this state without restoring a checkpoint, so the run now exits by default.

val_dice_score=nan should be reproducible on main with a high learning rate, e.g. by setting the LR to 0.1.
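A minimal Python sketch of the kind of check described above, assuming the validation Dice score is available as a plain float. The names check_dice_score, naive_converged, NaNDiceScoreError and the exit_on_nan flag are hypothetical and are not taken from this PR's diff. The sketch also shows one plausible way a NaN can be reported as convergence: every ordered comparison with NaN evaluates to False, so a check phrased as "not below the target" passes.

```python
import math
import sys


class NaNDiceScoreError(RuntimeError):
    """Hypothetical error raised when exiting is disabled and Dice is NaN."""


def naive_converged(val_dice_score: float, target: float = 0.9) -> bool:
    # Hypothetical convergence check: NaN < target is False,
    # so `not (...)` is True and a NaN score looks "converged".
    return not (val_dice_score < target)


def check_dice_score(val_dice_score: float, exit_on_nan: bool = True) -> None:
    """Stop training if the validation Dice score is NaN.

    A NaN score cannot recover without restoring a checkpoint, so by
    default this calls sys.exit(), which raises SystemExit.
    """
    if math.isnan(val_dice_score):
        msg = f"val_dice_score={val_dice_score}: training diverged (try a lower LR)"
        if exit_on_nan:
            # sys.exit with a string prints it to stderr and exits with status 1.
            sys.exit(msg)
        raise NaNDiceScoreError(msg)
```

In a training loop, a call such as check_dice_score(val_dice) would go right after validation and before any convergence test, so a diverged run stops instead of being logged as converged.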

PatrickRMiles merged commit 91789ab into LBANN:main on Apr 22, 2026
1 check passed
