Hacker News
by mtrazzi, 2754 days ago
Both the Keras one and the one from Spinning Up can be called "vanilla policy gradients". The one in Keras is closer to REINFORCE, while the one from Spinning Up uses an actor-critic setup with a multilayer perceptron.
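
To make the distinction concrete, here is a rough sketch of the one place the two flavours differ: the weight that multiplies each log-probability gradient. The function names are mine, not taken from either codebase, and the critic's value estimates are passed in as a plain list rather than produced by a network. As far as I know, Spinning Up's actual code uses GAE rather than this simple "return minus value" form, so treat this as an illustration only.

```python
def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_weights(rewards, gamma=0.99):
    """REINFORCE-style: weight each log-prob gradient by the raw return."""
    return discounted_returns(rewards, gamma)


def baseline_weights(rewards, values, gamma=0.99):
    """Actor-critic-style: subtract a critic's value estimate from each return.

    `values` is a hypothetical list of V(s_t) estimates, one per step;
    in practice it would come from a learned value network (e.g. an MLP).
    """
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]
```

In both cases the policy loss is the negative mean of log pi(a_t | s_t) times the weight. Subtracting a baseline keeps the gradient estimate pointing the same way in expectation but usually lowers its variance.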