Y
Hacker News
new
|
ask
|
show
|
jobs
A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE)
(
github.com
)
1 points
by
starzmustdie
160 days ago