Are you sure amp = addCuQcomp(amp, mulCuQcomp(coeffs[q], inAmps[q][n])); could instead be just amp = amp + mulCuQcomp(coeffs[q], inAmps[q][n]); and I'm not even sure that's better, but it'd help the investigation by shrinking the diff |
Yup, I agree that is a better way to do it. I'll update the PR |
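For readers following the suggestion above, here is a minimal host-only sketch of the two accumulation styles being compared. Everything in it is an assumption for illustration: `Cplx`, `addCplx`, `mulCplx`, `sumExplicit`, and `sumOperator` are hypothetical stand-ins, not the project's actual `cu_qcomp` type or the real signatures of `addCuQcomp` and `mulCuQcomp`, which are not shown in this thread.

```cpp
// Hypothetical stand-in for the GPU complex type discussed in the thread.
struct Cplx {
    double re;
    double im;
};

// Explicit helpers, analogous in spirit to addCuQcomp / mulCuQcomp
// (assumed signatures, not taken from the PR).
inline Cplx addCplx(Cplx a, Cplx b) {
    return {a.re + b.re, a.im + b.im};
}
inline Cplx mulCplx(Cplx a, Cplx b) {
    return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

// Operator overload that simply forwards to the explicit helper,
// so `a + b` and `addCplx(a, b)` compute the same thing.
inline Cplx operator+(Cplx a, Cplx b) {
    return addCplx(a, b);
}

// Style 1: explicit call for the addition.
inline Cplx sumExplicit(const Cplx* coeffs, const Cplx* amps, int n) {
    Cplx amp{0.0, 0.0};
    for (int q = 0; q < n; ++q)
        amp = addCplx(amp, mulCplx(coeffs[q], amps[q]));
    return amp;
}

// Style 2: operator form for the addition, which keeps the line
// closer to how it would read with built-in arithmetic.
inline Cplx sumOperator(const Cplx* coeffs, const Cplx* amps, int n) {
    Cplx amp{0.0, 0.0};
    for (int q = 0; q < n; ++q)
        amp = amp + mulCplx(coeffs[q], amps[q]);
    return amp;
}
```

Under these assumptions both loops produce the same result, so the choice between them is about diff size and readability rather than behaviour. Whether the operator form compiles on a given ROCm or CUDA toolchain depends on which overloads are visible there, which this sketch does not model.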
|
Min example output on 8 GPUs on Frontier using ROCm 7.2.0: Test output on 1 GPU on Frontier: Looks good to me! @eessmann two things:
I will walk back ROCm versions to see what our backwards compatibility looks like here. |
|
Okay, all tests pass on ROCm 5.7.1, so I lazily declare ROCm back-compat 'fine'. |
|
@eessmann Final thing before I forget -- could you do a performance regression check on one of the NV systems as well? |
Initial hack to get us working on ROCm 7.
Not very happy about the casts and explicit calls to addCuQcomp and mulCuQcomp.