Hacker News
Reinforcement Learning Policy Optimization: Deriving the Policy Gradient Update (fanpu.io)
1 point by fanpu 1276 days ago