Batch D3 and SevenNetD3Model for Torch-Sim interface #300

Open
alphalm4 wants to merge 17 commits into MDIL-SNU:main from alphalm4:d3b

@alphalm4 alphalm4 commented Mar 25, 2026

Contributor

Batch D3 is implemented.

How it works

  • It flattens all inputs first (avoiding pointer-to-pointer-to-pointer... indirection), launches every kernel once, then unflattens the results.
  • Mixed precision within a single batch works.
  • TorchSim: SevenNetD3Model can use either serial or batch D3, with a default batch-size threshold of 4.
    (This is a heuristic. I found some evidence that the current batch D3 is also faster than serial D3 when the system size is smaller than 1k, while for very large cells it is slower than serial D3. So I think a better criterion for choosing serial/batch should take the system size into account, but ...)
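
For readers unfamiliar with the flatten / launch once / unflatten pattern, here is a minimal pure-Python sketch. It is not the actual CUDA implementation: `flatten`, `unflatten`, and `process_flat` are hypothetical names, and `process_flat` only stands in for the single kernel launch over all atoms.

```python
from itertools import accumulate


def flatten(systems):
    """Concatenate per-system data and record where each system starts."""
    sizes = [len(s) for s in systems]
    offsets = [0, *accumulate(sizes)]  # offsets[i] is the start of system i
    flat = [x for s in systems for x in s]
    return flat, offsets


def unflatten(flat, offsets):
    """Split a flat result back into one list per system."""
    return [flat[a:b] for a, b in zip(offsets, offsets[1:])]


def process_flat(flat):
    """Hypothetical stand-in for one kernel launch over the whole batch."""
    return [2.0 * x for x in flat]


# Three systems of different sizes (values are arbitrary placeholders).
systems = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
flat, offsets = flatten(systems)
results = unflatten(process_flat(flat), offsets)

print(offsets)  # [0, 2, 3, 6]
print(results)  # [[2.0, 4.0], [6.0], [8.0, 10.0, 12.0]]
```

A single flat buffer plus an offsets array replaces nested per-system pointers, so one launch can cover every system regardless of its size.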

Precision

  • Batch D3's precision policy is FP32+64 (the same as serial D3), but it uses Kahan summation inside each thread. This greatly reduces precision-related rounding error in the pairwise dispersion energy, bringing the results close to the reference FP64 fortran-dftd3 values (|ΔE,F| < 1e-7, |ΔS| < 3e-7 in ase units).

  • Because of this difference in summation, the output energy of the batch D3 kernel differs slightly from that of serial D3.

  • Applying Kahan summation in serial D3 as well nearly removes this serial/batch discrepancy (|ΔE,F,S| < 1e-7 in ase units), but I think that should be addressed in a separate conversation, since it breaks backward compatibility of calculation results.

  • code review @dambi3613

  • Float64 wrapper for double-precision MD computation flow

  • speed benchmark (especially, check nvalchemiops_dftd3) -> done, found custom batch D3 implementation is still valuable

  • test scripts

  • replace SevenNetD3Model with batched D3 version then remove torchsim_d3.py

  • overflow issues (only applied for batched D3, not serial D3)

  • Doc
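
As an illustration of the Kahan (compensated) summation mentioned under Precision, here is a small stdlib-only sketch. It is not the kernel code: float32 arithmetic is emulated by rounding through `struct`, the helper names are hypothetical, and the repeated value 0.1 is just a placeholder for many small pairwise terms.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Plain running sum with every operation rounded to float32."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


def kahan_sum_f32(values):
    """Kahan summation with every operation rounded to float32."""
    total = 0.0
    comp = 0.0  # low-order bits lost by the previous addition
    for v in values:
        y = f32(f32(v) - comp)          # re-inject what was lost last time
        t = f32(total + y)              # big + small: low bits of y are dropped
        comp = f32(f32(t - total) - y)  # recover what was just dropped
        total = t
    return total


# Placeholder data: many small terms accumulated into one large sum.
values = [0.1] * 100_000
reference = math.fsum(f32(v) for v in values)  # correctly rounded float64 sum

naive_err = abs(naive_sum_f32(values) - reference)
kahan_err = abs(kahan_sum_f32(values) - reference)
print(f"naive float32 error: {naive_err:.3e}")
print(f"Kahan float32 error: {kahan_err:.3e}")
```

The compensation term costs a few extra floating-point operations per addition but keeps the accumulated error near one rounding of the final result, which is why the summation order alone can change the output energy slightly.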

@alphalm4

Contributor Author

Something like this might force Torch-Sim to run in float64 while keeping SevenNet itself in float32.

import torch


class Float64Wrapper:
    """Wraps a float32 model so torch-sim runs in float64 precision.

    Casts state tensors to float32 before calling the model, then casts
    outputs back to float64. Reports dtype=float64 to torch-sim so all
    integrator arithmetic is done in double precision.
    """

    def __init__(self, model):
        self._model = model
        self._device = model.device
        self._dtype = torch.float64

    @property
    def device(self):
        return self._device

    @property
    def dtype(self):
        return self._dtype

    @property
    def compute_stress(self):
        return self._model.compute_stress

    @property
    def compute_forces(self):
        return self._model.compute_forces

    @property
    def memory_scales_with(self):
        return getattr(self._model, "memory_scales_with", "n_atoms_x_density")

    def __call__(self, state):
        # Cast state to float32 for the model
        state_f32 = state.to(dtype=torch.float32)
        output = self._model(state_f32)
        # Cast outputs back to float64
        return {
            k: v.to(dtype=torch.float64) if isinstance(v, torch.Tensor) else v
            for k, v in output.items()
        }

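    def __getattr__(self, name: str):
        # Only called when normal lookup fails; delegate to the wrapped model.
        if name == "_model":
            # Avoid infinite recursion if _model is not set yet (e.g. during copy/unpickling).
            raise AttributeError(name)
        return getattr(self._model, name)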

@YutackPark

Member

I don't know the context, but why don't we just cast the 7net model to float64? For speed?

@alphalm4

alphalm4 commented Mar 25, 2026

Contributor Author

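Yes, the reason for not raising the 7net model to float64 is speed. (Do we really need to?)

To add context: this item exists because the TorchSim state precision follows the model precision. SevenNet only accepts float32, so as things stand the MD is forced to run in float32 as well.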
https://github.com/TorchSim/torch-sim/blob/c456ecec0dec1334a13c026dbcf231fa8309c849/torch_sim/runners.py#L297

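I haven't checked this very rigorously, but after implementing the float64 wrapper, the fluctuation of the ensemble invariant (e.g. npt_nose_hoover_invariant) appears to be smaller than in float32 (i.e. the reduction in numerical error is visible). Even apart from that, I think there should be a way to choose the MD precision.
Of course this could be raised as a TorchSim issue, but it looked easy enough to implement, so I added it here.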
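
To show why the integrator's precision can matter on its own, here is a toy stdlib-only sketch. It is not a torch-sim run and does not compute npt_nose_hoover_invariant: it just repeats a position-like update `x += dx` in emulated float32 and in float64, with hypothetical values for `x0`, `dx`, and `n_steps`.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical numbers: a coordinate of order 10 advanced by a small increment.
x0, dx, n_steps = 10.0, 1.2345e-4, 100_000

x64 = x0
x32 = f32(x0)
dx32 = f32(dx)
for _ in range(n_steps):
    x64 = x64 + dx         # float64 update
    x32 = f32(x32 + dx32)  # same update, rounded to float32 every step

exact = x0 + n_steps * dx  # single multiply, so almost no accumulated error
err64 = abs(x64 - exact)
err32 = abs(x32 - exact)
print(f"float64 drift: {err64:.3e}")
print(f"float32 drift: {err32:.3e}")
```

In this toy case the float32 rounding error is systematic rather than random, so it accumulates over the run; keeping the state in float64 while evaluating the model in float32 avoids that part of the error.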

YutackPark previously approved these changes Jun 9, 2026
@alphalm4 alphalm4 marked this pull request as ready for review June 9, 2026 07:19
@alphalm4 alphalm4 requested a review from YutackPark June 9, 2026 07:24

2 participants