post_training_playground

Repository with implementations of several post-training algorithms. The primary goal is to experiment with very simple algorithms at small scale.

About:
- PPO {IN-PROGRESS}
- DPO {TODO}
- GRPO {TODO}

More to follow.