by danielhanchen 490 days ago
Yep, GRPO is much more memory efficient than PPO, since it drops the separate value (critic) model that PPO has to keep in memory, but other RL-type algorithms can work fine as well!
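
For anyone wondering where the savings come from, here's a rough sketch of the group-relative advantage idea: each completion's reward is normalized against the other completions sampled for the same prompt, so no learned value network is needed as a baseline. The function name, the example rewards, and the use of population standard deviation are my own assumptions for illustration, not any particular library's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    Hypothetical helper for illustration. The group mean plays the role
    of the baseline that PPO would get from a separate value model.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for 4 completions sampled from one prompt
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get a positive advantage and the rest get a negative one, which is all the policy update needs.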