Hacker News
by lsorber, 349 days ago
For those who want to dive deeper, here’s a 300 LOC implementation of GRPO in pure NumPy:
https://github.com/superlinear-ai/microGRPO
The implementation learns to play Battleship in about 2000 steps, pretty neat!
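For anyone who has not looked at GRPO before, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no separate value network is needed. Here is a minimal NumPy sketch of just that group-relative advantage step. It is not taken from the linked repository; the function name `group_relative_advantages`, the `eps` parameter, and the reward values are all made up for illustration.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    rewards: array of shape (num_prompts, group_size), one row per prompt.
    eps: hypothetical small constant to avoid division by zero when all
         rewards in a group are identical.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four sampled completions each (illustrative rewards only).
rewards = np.array([[1.0, 0.0, 0.0, 1.0],
                    [0.2, 0.4, 0.6, 0.8]])
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one. A full implementation would then weight the policy's log-probabilities by these values, which this sketch leaves out.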