
Benchmarking inference layer #33

Merged
ktangsali merged 70 commits into NVIDIA:main from RishikeshRanade:benchmarking-inference-layer on May 6, 2026

Conversation

@RishikeshRanade (Collaborator) commented Apr 6, 2026

PhysicsNeMo-CFD Pull Request

Description

This PR adds a config-driven benchmarking pipeline to PhysicsNeMo-CFD evaluation: registered model wrappers run inference on dataset adapters; built-in metrics (shared with physicsnemo.cfd.postprocessing_tools) evaluate predictions against ground truth; and results flow into tabular reports (JSON/CSV/HTML) plus optional PNG report visuals. workflows/evaluation_examples/ is the primary Hydra workflow (main.py, conf/config_surface.yaml / config_volume.yaml).
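As a rough illustration of the config-driven design, a Hydra-style config for such a pipeline might look like the sketch below. Field names here are hypothetical, for illustration only; the actual schema is defined in conf/config_surface.yaml / config_volume.yaml.

```yaml
# Hypothetical sketch of a benchmarking config; field names are
# illustrative, not the actual schema from conf/config_surface.yaml.
model:
  name: domino              # registry key of the model wrapper
dataset:
  name: drivaerml           # registry key of the dataset adapter
  root: /path/to/data       # portable placeholder, not a cluster path
metrics:
  - l2_relative_error
reports:
  formats: [json, csv, html]
  visuals: true             # optional PNG report visuals
```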

Scope of the library

  • Model wrappers: registry-based CFDModel implementations.
  • Dataset adapters: a canonical case schema, DrivAerML and Ahmed adapters, extensible via an adapter registry.
  • Automated benchmarking: a run_benchmark driver, per-case metrics, an optional metrics cache, matrix mode (models × datasets), and report plugins.
  • Visualization consistent with the original physicsnemo-cfd functionality.
  • Multi-GPU support for processing multiple case IDs in parallel.
  • Caching to resume benchmarking from where it left off.
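The registry-based wrapper pattern described above can be sketched roughly as follows. All names here (MODEL_REGISTRY, register_model, IdentityModel) are illustrative assumptions, not the actual PhysicsNeMo-CFD API:

```python
# Hypothetical sketch of a decorator-based model-wrapper registry;
# class and function names are illustrative, not the real API.
from abc import ABC, abstractmethod

MODEL_REGISTRY: dict = {}

def register_model(name: str):
    """Register a CFDModel subclass under a config-referenceable key."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator

class CFDModel(ABC):
    @abstractmethod
    def predict(self, case: dict) -> dict:
        """Run inference on one canonical case and return predictions."""

@register_model("identity")
class IdentityModel(CFDModel):
    def predict(self, case: dict) -> dict:
        return case  # trivial example: echo inputs back unchanged

# A driver can then instantiate wrappers purely from config strings:
model = MODEL_REGISTRY["identity"]()
```

The benefit of this shape is that adding a new model or dataset requires no driver changes: the config names a key, and the registry resolves it.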

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.

Dependencies

@RishikeshRanade (Collaborator, Author)

@ram-cherukuri can you please review the readme?

Comment thread physicsnemo/cfd/evaluation/metrics/mesh_bridge.py
Comment thread workflows/benchmarking_workflow/README.md Outdated
Comment thread pyproject.toml
Comment thread physicsnemo/cfd/evaluation/common/interpolation.py
Comment thread physicsnemo/cfd/evaluation/metrics/builtin/physics.py
ram-cherukuri and others added 3 commits April 9, 2026 17:43
Updated the README to clarify the purpose of the workflow and its usage.
Updated section headers and improved clarity of config descriptions.
Comment thread workflows/benchmarking_workflow/README.md Outdated
@peterdsharpe (Collaborator) left a comment

Quick second pass after your fixup commits.


The big picture looks good. Most of the original 125 line-by-line items are addressed; the architectural concerns from the first review remain resolved via the centralized helpers (inference_seed, metric_exceptions, cuda_bf16_autocast, visual_filenames, natural_sort); and the two recent commits (a2f0861 "guard volume canonical mapping" and 03ed7dc "narrow scaling pickle fallback") closely match the previous review's asks. The _custom/_hf YAML split and the nim_inference extraction are nice reorganizations.
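For readers unfamiliar with one of those helpers: a natural_sort utility typically splits digit runs so that case IDs order numerically rather than lexicographically ("case10" after "case2"). A minimal sketch of what such a helper might look like (the actual implementation in the PR may differ):

```python
import re

def natural_sort(items):
    # Sketch of a natural-sort helper: split each string into digit and
    # non-digit runs so numeric segments compare as integers.
    # With plain sorted(), "case10" would come before "case2".
    def key(s):
        return [int(t) if t.isdigit() else t.lower()
                for t in re.split(r"(\d+)", str(s))]
    return sorted(items, key=key)
```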

Three things from this round that I'd flag specifically:

  1. One regression: commit f86b34c removed the with cuda_bf16_autocast(self._device): wrapper from TransolverWrapper.predict while leaving the import in place (now flagged as unused by ruff). FIGNet, XMGN, and DoMINO still use bf16 on CUDA; Transolver and GeoTransolver now silently run fp32 instead. If this was intentional for correctness, please call it out in the CHANGELOG since earlier benchmark numbers from this PR were generated with bf16 active on Transolver. If it was an accident during the "transolver bug" cleanup, restoring the wrap puts Transolver back in line with the others.

  2. Tooling suggestion: a handful of items I flagged inline this round (the duplicate test functions, the dead loaded_epoch / os / re, an in_channels parameter accepted but never read on FIGConvUNetDrivAerML) would all be caught by ruff check --select F in milliseconds. Wiring ruff into the pre-commit / CI lint stage would surface this class of thing automatically and free up review for the more interesting cases.

  3. README / YAML drift: a couple of header-vs-body and README-vs-YAML mismatches. The most consequential is the README claim at line 247 that log_env is false "in all example YAMLs", which is still wrong because config_matrix_volume_hf.yaml:59 ships log_env: true. The volume _hf YAML also has the same header-vs-body reports.enabled mismatch I noted on the surface _hf YAML. And the README metrics: example block now claims continuity_residual_l2 / momentum_residual_l2 are in the volume YAMLs; neither YAML actually has them in this round, so the example doesn't reproduce against either file.
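On item 1 above, for concreteness: a minimal sketch of what a cuda_bf16_autocast helper along these lines might look like, assuming it wraps torch.autocast and degrades to a no-op off-CUDA (the actual helper in the PR may differ):

```python
from contextlib import contextmanager

@contextmanager
def cuda_bf16_autocast(device):
    """Hypothetical sketch: run the wrapped block under bf16 autocast on
    CUDA devices, and as a plain no-op context on CPU."""
    try:
        import torch
        if "cuda" in str(device):
            with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
                yield
            return
    except ImportError:
        pass
    yield  # CPU (or torch unavailable): execute the block unchanged
```

With a helper of this shape, dropping the `with cuda_bf16_autocast(self._device):` wrapper from a predict() method silently changes the compute dtype on CUDA from bf16 to fp32, which is exactly why the removal deserves an explicit decision.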


For the 3 threads I left unresolved on the PR after the previous fixup:

  • DoMINO global_params indexing - thanks for engaging on that one; happy to keep it open while the fix is in flight.
  • Transolver _datapipe_resolution dead knob - the assignment was actually removed in f86b34c, so that thread can be re-resolved.
  • Config dataset /lustre/... root - the config_surface.yaml removal didn't quite cover this; config_matrix_surface_hf.yaml:58 still has root: "/lustre/fsw/portfolios/coreai/users/ktangsali/drivaerml" for datasets[0]. Worth a portable placeholder there too.

Patterns worth flagging at the design level:

  • Documentation drift is the recurring theme this round. README headers, YAML headers, and YAML bodies disagree with each other in 4-5 places across the recent commits. Each instance is small but the cumulative effect is reader confusion; a one-time alignment sweep before merge would close all of them at once.
  • Cross-wrapper inconsistencies persist. Even after the centralized helpers, GeoTransolver and Transolver now both lack the autocast wrap; XMGN has a bool(string) pattern that the same codebase fixed for align_ground_truth_to_model via _parse_bool; DoMINO has a bare except Exception hiding YAML errors. The structural fixes are good; the per-wrapper code paths could use a final consistency pass.

Same bottom line as the first review: PR is materially closer to merge-ready. The remaining work is concentrated and mostly mechanical (1-2 line fixes), with the Transolver autocast item being the one that needs an explicit decision rather than just a code change.

Comment thread test/ci_tests/test_evaluation.py Outdated
Comment thread physicsnemo/cfd/evaluation/metrics/mesh_bridge.py
Comment thread workflows/benchmarking/README.md Outdated
Comment thread physicsnemo/cfd/evaluation/models/wrappers/xmgn/wrapper.py Outdated
Comment thread physicsnemo/cfd/evaluation/models/wrappers/transolver/wrapper.py Outdated
Comment thread physicsnemo/cfd/evaluation/models/wrappers/transolver/wrapper.py Outdated
Comment thread physicsnemo/cfd/evaluation/models/wrappers/domino/inference.py Outdated
Comment thread physicsnemo/cfd/evaluation/models/wrappers/fignet/wrapper.py Outdated
Comment thread physicsnemo/cfd/evaluation/models/wrappers/domino/inference.py Outdated
@peterdsharpe (Collaborator) left a comment

We're getting very very close to merging.

2 open comments remain from my side: one new (a regression introduced in the last commit), and one older item that is not yet fully resolved. Both are straightforward to address.

Approving; please address these comments with new commits (or clarifying comments) before merging.

Comment thread workflows/benchmarking/README.md Outdated
@ktangsali (Collaborator)

/blossom-ci

@ktangsali (Collaborator)

/blossom-ci

@ktangsali (Collaborator)

Blossom CI is failing because of a warp.device failure. This needs a fix upstream: physicsnemo should raise the lower bound on Warp to 1.11.1 or higher here: https://github.com/NVIDIA/physicsnemo/blob/main/pyproject.toml#L25. Since the GitHub CI is passing (it correctly installs the latest Warp), I am merging this; I will submit a PR to the upstream repo separately.

ktangsali merged commit bb4e2c7 into NVIDIA:main on May 6, 2026 (1 of 2 checks passed)
