by mtrazzi 2754 days ago
Both the Keras one and the one from Spinning Up can be called "vanilla policy gradients". The one in Keras is closer to REINFORCE, while the one from Spinning Up uses actor-critic and a multilayer perceptron.
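To make the distinction concrete, here is a rough sketch of the one thing that differs: the weight applied to each log-probability gradient term. It is not taken from either codebase. The function names are made up for illustration, the critic's value estimates are assumed to be given, and the advantage is a plain return-minus-baseline, which is not necessarily the exact estimator either implementation uses.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Discounted sum of future rewards from each timestep onward."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def reinforce_weights(rewards, gamma=0.99):
    """REINFORCE-style: weight each grad log pi(a_t | s_t) by the sampled return."""
    return rewards_to_go(rewards, gamma)


def actor_critic_weights(rewards, values, gamma=0.99):
    """Actor-critic-style: subtract a learned baseline V(s_t) from the return.

    `values` stands in for the critic's predictions (assumed given here).
    """
    returns = rewards_to_go(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    critic_values = [2.5, 2.0, 0.5]  # hypothetical critic outputs
    print(reinforce_weights(episode_rewards, gamma=1.0))
    print(actor_critic_weights(episode_rewards, critic_values, gamma=1.0))
```

In both cases the policy loss is the negative mean of log-probability times weight; the baseline only reduces variance and, in expectation, leaves the gradient unchanged.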