Hacker News
Simple GRPO – RL for 8B models on $10/h GPUs (github.com)
1 point by minosu 495 days ago