by msackmann 1155 days ago

My guess: the cart-pole is an inverted pendulum, and it takes several left-right swings to bring it from the “hanging” position to the “standing” position. Finding this action sequence by following the gradient of “where is the tip” vs. “where should it be” is very hard, because swinging the pendulum back and forth means repeatedly moving against that gradient.

Instead, stochastic gradient approximations (policy gradient methods such as proximal policy optimization) might be better suited to these kinds of problems. Effectively, they do not compute the exact local gradient; they build a more global approximation by trying out random sequences of actions and checking which of them come closest to the desired outcome.

Hence, stochastic gradient approximations might be considered a hybrid between greedy local optimization (such as following the exact gradient) and global optimization.
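
To make the "try random action sequences and see which come closest" idea a bit more concrete, here is a minimal sketch of a score-function (REINFORCE-style) gradient estimate. It is not PPO and not a cart-pole simulation: the `TARGET` pattern, the `reward` function, and all parameter values are made-up stand-ins, and the policy is just one independent left/right coin flip per step. The point is only that the reward is treated as a black box and never differentiated.

```python
import math
import random

# Hypothetical "desired outcome": alternate left (0) and right (1),
# loosely standing in for the pumping motion of a swing-up.
TARGET = [0, 1, 0, 1]


def reward(actions):
    """Black-box score: number of steps that match TARGET (never differentiated)."""
    return sum(1.0 for a, t in zip(actions, TARGET) if a == t)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def estimate_gradient(theta, n_samples, rng):
    """Score-function estimate of d E[reward] / d theta from sampled sequences."""
    probs = [sigmoid(t) for t in theta]  # P(action == 1) at each step
    samples = []
    for _ in range(n_samples):
        actions = [1 if rng.random() < p else 0 for p in probs]
        samples.append((actions, reward(actions)))
    # Mean reward of the batch, used as a baseline to reduce variance.
    baseline = sum(r for _, r in samples) / n_samples
    grad = [0.0] * len(theta)
    for actions, r in samples:
        for i, (a, p) in enumerate(zip(actions, probs)):
            # d/d theta of log Bernoulli(a; sigmoid(theta)) is (a - p).
            grad[i] += (r - baseline) * (a - p) / n_samples
    return grad


def train(steps=200, batch=64, lr=1.0, seed=0):
    """Plain gradient ascent on the sampled estimate (assumed settings)."""
    rng = random.Random(seed)
    theta = [0.0] * len(TARGET)
    for _ in range(steps):
        grad = estimate_gradient(theta, batch, rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta
```

Calling `train()` and thresholding the returned logits at zero should recover the alternating pattern, even though no derivative of `reward` is ever computed. A real swing-up task would replace `reward` with a simulator rollout and the coin flips with a state-dependent policy.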