HN Mirror

Hacker News

by msackmann 1155 days ago

Interesting paper, thanks for bringing this up! I have been working on trajectory optimization methods that use both analytic gradient computation and black-box stochastic gradient approximation (proximal policy optimization).

I have always wondered about a question the paper touches on: although computing the analytic gradient is intuitively more efficient and mathematically exact, learning a policy with it is much harder than with “brute force trial-and-error” black-box methods.

This paper brings many new perspectives on why.
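One toy way to see how an exact gradient can still be unhelpful: on a discontinuous objective (for example, a reward that only switches on once a trajectory crosses a threshold), the analytic derivative is zero almost everywhere, while a black-box estimator that samples perturbations effectively differentiates a smoothed version of the objective. The sketch below is a hypothetical illustration, not the paper's setup or the commenter's code; `step_reward`, `analytic_grad`, `zeroth_order_grad`, `sigma` and the sample count are all assumptions. It uses the basic score-function (REINFORCE-style) estimator, the likelihood-ratio idea that PPO refines with clipping, rather than PPO itself.

```python
import math
import random
from typing import Callable


def step_reward(x: float) -> float:
    """Hypothetical discontinuous objective: reward 1 once parameter x crosses 0."""
    return 1.0 if x > 0.0 else 0.0


def analytic_grad(x: float) -> float:
    """Exact derivative of step_reward: zero for every x != 0 (undefined at 0)."""
    return 0.0


def zeroth_order_grad(f: Callable[[float], float], x: float,
                      sigma: float = 0.5, n: int = 20000, seed: int = 0) -> float:
    """Score-function estimate of d/dx E_{w ~ N(0, sigma^2)}[f(x + w)].

    Uses the identity grad = E[f(x + w) * w] / sigma^2, which needs only
    black-box evaluations of f, never its derivative.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, sigma)
        total += f(x + w) * w
    return total / (n * sigma ** 2)


def smoothed_true_grad(x: float, sigma: float = 0.5) -> float:
    """Exact derivative of the Gaussian-smoothed step, i.e. the N(0, sigma^2) density at x."""
    return math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


x0 = -0.5
print(analytic_grad(x0))                    # 0.0: no signal pointing toward the reward
print(zeroth_order_grad(step_reward, x0))   # sampled estimate of the smoothed slope
print(smoothed_true_grad(x0))               # exact smoothed slope, for comparison
```

Under these assumptions, the exact gradient never tells the optimizer which way to move, while the sampled estimate approximates the slope of the smoothed objective. The price is variance, which shrinks only as `n` grows; that trade-off between bias-free but uninformative gradients and noisy but informative ones is one way to frame the question above.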