Hacker News
A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE) (github.com)
1 point by starzmustdie 160 days ago