Benchmarking inference layer#33
Conversation
Visualization layer into benchmarking layer
|
@ram-cherukuri can you please review the readme? |
Updated the README to clarify the purpose of the workflow and its usage.
Updated section headers and improved clarity of config descriptions.
Updated README to streamline configuration instructions and extend workflow customization guidelines.
peterdsharpe
left a comment
There was a problem hiding this comment.
Quick second pass after your fixup commits.
Big picture is pretty good. Most of the original 125 line-by-line items are addressed, the architectural concerns from the first review are still resolved via the centralized helpers (inference_seed, metric_exceptions, cuda_bf16_autocast, visual_filenames, natural_sort), and the two recent commits (a2f0861 "guard volume canonical mapping" and 03ed7dc "narrow scaling pickle fallback") match the previous review's asks closely. The _custom/_hf YAML split and the nim_inference extraction are nice reorganizations.
Three things from this round that I'd flag specifically:
-
One regression: commit
f86b34cremoved thewith cuda_bf16_autocast(self._device):wrapper fromTransolverWrapper.predictwhile leaving the import in place (now flagged as unused by ruff). FIGNet, XMGN, and DoMINO still use bf16 on CUDA; Transolver and GeoTransolver now silently run fp32 instead. If this was intentional for correctness, please call it out in the CHANGELOG since earlier benchmark numbers from this PR were generated with bf16 active on Transolver. If it was an accident during the "transolver bug" cleanup, restoring the wrap puts Transolver back in line with the others. -
Tooling suggestion: a handful of items I flagged inline this round (the duplicate test functions, the dead
loaded_epoch/os/re, anin_channelsparameter accepted but never read onFIGConvUNetDrivAerML) would all be caught byruff check --select Fin milliseconds. Wiringruffinto the pre-commit / CI lint stage would surface this class of thing automatically and free up review for the more interesting cases. -
README / YAML drift: a couple of header-vs-body and README-vs-YAML mismatches. The most consequential is the README claim at line 247 that
log_envisfalse"in all example YAMLs", which is still wrong becauseconfig_matrix_volume_hf.yaml:59shipslog_env: true. The volume_hfYAML also has the same header-vs-bodyreports.enabledmismatch I noted on the surface_hfYAML. And the READMEmetrics:example block now claimscontinuity_residual_l2/momentum_residual_l2are in the volume YAMLs; neither YAML actually has them in this round, so the example doesn't reproduce against either file.
For the 3 threads I left unresolved on the PR after the previous fixup:
- DoMINO
global_paramsindexing - thanks for engaging on that one; happy to keep it open while the fix is in flight. - Transolver
_datapipe_resolutiondead knob - the assignment was actually removed inf86b34c, so that thread can be re-resolved. - Config dataset
/lustre/...root - theconfig_surface.yamlremoval didn't quite cover this;config_matrix_surface_hf.yaml:58still hasroot: "/lustre/fsw/portfolios/coreai/users/ktangsali/drivaerml"fordatasets[0]. Worth a portable placeholder there too.
Patterns worth flagging at the design level:
- Documentation drift is the recurring theme this round. README headers, YAML headers, and YAML bodies disagree with each other in 4-5 places across the recent commits. Each instance is small but the cumulative effect is reader confusion; a one-time alignment sweep before merge would close all of them at once.
- Cross-wrapper inconsistencies persist. Even after the centralized helpers, GeoTransolver and Transolver now both lack the autocast wrap; XMGN has a
bool(string)pattern that the same codebase fixed foralign_ground_truth_to_modelvia_parse_bool; DoMINO has a bareexcept Exceptionhiding YAML errors. The structural fixes are good; the per-wrapper code paths could use a final consistency pass.
Same bottom line as the first review: PR is materially closer to merge-ready. The remaining work is concentrated and mostly mechanical (1-2 line fixes), with the Transolver autocast item being the one that needs an explicit decision rather than just a code change.
peterdsharpe
left a comment
There was a problem hiding this comment.
We're getting very very close to merging.
2 open comments remain from my side - one new (a regression that was introduce as part of the last commit), and one older one that is not yet fully resolved. Both straightforward to resolve.
Approving; please address these comments with new commits (or clarifying comments) before merging.
Resolved with branch content preferred; tree unchanged from HEAD.
|
/blossom-ci |
|
/blossom-ci |
|
Blossom CI is failing because of warp.device failure. This needs a fix to the upstream (Upstream physicsnemo needs to pin the lower version of Warp here: https://github.com/NVIDIA/physicsnemo/blob/main/pyproject.toml#L25 to 1.11.1 or higher). Since the GitHub CI is passing (correctly installs the latest warp), I am merging this. Will submit a PR to the upstream repo separately. |
PhysicsNeMo-CFD Pull Request
Description
PhysicsNeMo-CFD evaluation adds a config-driven benchmarking pipeline: registered model wrappers run inference on dataset adapters, built-in metrics (shared with physicsnemo.cfd.postprocessing_tools) evaluate predictions vs ground truth, and results flow into tabular reports (JSON/CSV/HTML) plus optional PNG report visuals. workflows/evaluation_examples/ is the primary Hydra workflow (main.py, conf/config_surface.yaml / config_volume.yaml).
Scope of the library
Checklist
Dependencies