| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by lsorber 396 days ago
	For those who want to dive deeper, here’s a 300 LOC implementation of GRPO in pure NumPy: https://github.com/superlinear-ai/microGRPO The implementation learns to play Battleship in about 2000 steps, pretty neat!