Hacker News new | ask | show | jobs
by lsorber 349 days ago
For those who want to dive deeper, here’s a 300 LOC implementation of GRPO in pure NumPy: https://github.com/superlinear-ai/microGRPO

The implementation learns to play Battleship in about 2000 steps, pretty neat!