Hacker News
Reinforcement Learning Policy Optimization: Deriving the Policy Gradient Update (fanpu.io)
1 point by fanpu 1276 days ago