by layer8 469 days ago
GRPO = Group Relative Policy Optimization

https://arxiv.org/abs/2402.03300
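The "group relative" part refers to how the advantage is computed. GRPO samples a group of outputs for the same prompt, scores each one, and normalizes each reward against the group's mean and standard deviation. That group baseline replaces the separate value model (critic) that PPO would use. Below is a minimal sketch of only that normalization step (outcome rewards, one scalar per output), not the full training loop. The function name, the epsilon term, and the use of population std are illustrative assumptions, not taken from the paper's code.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each output's reward against its own group.

    The group is all outputs sampled for a single prompt.
    """
    mean = fmean(rewards)
    # Population std is an assumption; the paper only writes std(r).
    std = pstdev(rewards)
    # eps (hypothetical) avoids division by zero when every output
    # in the group receives the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for 4 sampled answers to one question (1 = correct).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct answers end up with a positive advantage and wrong ones with a negative advantage, relative to their siblings in the group. A group where every answer gets the same reward contributes no learning signal.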