| HN Mirror

	by danielhanchen 490 days ago
	Yep, GRPO is much more memory-efficient than PPO, mainly because it drops the separate value (critic) model that PPO has to keep in memory and train, but other RL-type algorithms can work fine as well!
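
To make the memory point concrete, here is a minimal sketch of the group-relative advantage idea behind GRPO: the baseline comes from the rewards of several completions sampled for the same prompt, so no learned value network is needed. The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions for illustration, not taken from any particular library.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    `rewards` holds one scalar reward per completion sampled from the
    same prompt. The group mean acts as the baseline, which is the role
    a learned critic would play in PPO.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions of one prompt (1.0 = correct answer).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get a positive advantage and the rest get a negative one, and nothing beyond the policy and the reward signal has to be held in memory to compute it.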