| HN Mirror

	by mtrazzi 2754 days ago
	Both the Keras one and the one from Spinning Up can be called "vanilla policy gradients". The Keras one is closer to REINFORCE, while the Spinning Up one uses an actor-critic setup and a multilayer perceptron.
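
	To make the distinction concrete, here is a rough sketch of how the weight on each log-probability gradient differs between the two flavours. This is not code from either implementation: the function names, the `values` list (standing in for a critic's predictions), and the plain "return minus value" advantage are all assumptions chosen for illustration, and a real actor-critic implementation may use a different advantage estimator.

```python
def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_weights(rewards, gamma=0.99):
    # REINFORCE flavour: each grad log pi(a_t | s_t) is weighted by the
    # Monte Carlo return observed from step t onwards.
    return discounted_returns(rewards, gamma)


def actor_critic_weights(rewards, values, gamma=0.99):
    # Actor-critic flavour (simplified): subtract a value estimate V(s_t)
    # produced by a critic, so the weight is an advantage estimate G_t - V(s_t).
    # `values` is a hypothetical list of critic outputs, one per time step.
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [1.0, 1.0, 1.0]  # made-up critic predictions
    print(reinforce_weights(rewards, gamma=0.5))
    print(actor_critic_weights(rewards, values, gamma=0.5))
```

	In both cases the policy loss is the negative mean of weight times log-probability; only the weight changes. Subtracting the critic's estimate leaves the expected gradient unchanged but usually lowers its variance, which is the main reason to add a critic on top of plain REINFORCE.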