Hacker News
by imjonse 490 days ago
Is it established whether GRPO is essential for this to work as it does, or could other RLHF-class methods provide similar results? My initial (possibly mistaken) impression was that GRPO was one of the ways of mitigating the lack of enormous hardware resources.
1 comment

Yep, GRPO is much more memory-efficient than PPO, but other RL-type algorithms can work fine as well!
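
For what it's worth, here is a rough sketch of where the memory saving comes from. PPO usually trains a separate value (critic) model to estimate advantages, whereas GRPO samples a group of completions for the same prompt and scores each one against the group's own mean and standard deviation, so no critic has to sit in memory. The function name, the `eps` term, the use of the population standard deviation, and the reward values are all assumptions made for illustration; this covers only the advantage step, not a full training loop.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group (hypothetical helper).

    rewards: scalar rewards for completions sampled from the SAME prompt.
    eps: small constant (an assumption) to avoid division by zero when
         every completion in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations may differ
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for 4 completions of one prompt (e.g. 1.0 = correct answer).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that beat the group average get a positive advantage and the rest get a negative one, which is the signal a learned value model would otherwise provide.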