This fuses gradient unscaling and gradient clipping into the optimizer step, which should speed things up.
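As a rough illustration of why fusing can help, here is a minimal pure-Python sketch. It is not the code from this change: `fused_sgd_step`, its parameters, and the choice of plain SGD are all assumptions made for the example. The idea is that unscaling (divide by the loss scale) and clipping (multiply by a clip coefficient) collapse into one multiplier, which the update applies on the fly instead of rewriting the gradients in separate passes.

```python
import math


def fused_sgd_step(params, grads, lr, loss_scale, max_norm):
    """One SGD step with unscaling and clipping folded into the update.

    Hypothetical helper for illustration only; not any library's API.
    `grads` are assumed to still be multiplied by `loss_scale`.
    """
    # One reduction over the still-scaled gradients to get their norm.
    scaled_norm = math.sqrt(sum(g * g for g in grads))
    true_norm = scaled_norm / loss_scale

    # Clip coefficient: 1.0 when the norm is already within max_norm.
    clip_coef = min(1.0, max_norm / (true_norm + 1e-6))

    # Unscaling and clipping collapse into a single multiplier ...
    combined = clip_coef / loss_scale

    # ... applied inside the update, so the gradients are never rewritten.
    new_params = [p - lr * combined * g for p, g in zip(params, grads)]
    return new_params, true_norm
```

The unfused equivalent would walk the gradients once to unscale them, again to clip them, and a third time in the optimizer update. The sketch still needs one reduction to compute the norm, but the two rewrite passes disappear.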