fix(101): align KL divergence calculation with GRPO paper and fix test #576
Open
cyk1337 wants to merge 1 commit into