by danielhanchen 490 days ago
Yep, GRPO is much more memory-efficient than PPO, mainly because it does not need PPO's separate value (critic) model, but other RL-type algorithms can work fine as well!
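
For anyone wondering where the memory saving comes from, here is a rough sketch of the group-relative advantage step. PPO typically trains a value model to produce a baseline, while GRPO samples several completions per prompt and uses the group's own reward statistics as the baseline instead. The function name, the `eps` constant, and the use of the population standard deviation are my assumptions for illustration, not any particular library's implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. No learned value model is involved: the group mean
    acts as the baseline. `eps` (assumed value) avoids division by
    zero when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for 4 completions sampled from one prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and those below get a negative one, so the only extra state is the list of rewards per group rather than a second network of roughly policy size.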