Hacker News new | ask | show | jobs
by msackmann 1155 days ago
Interesting paper, thanks for bringing this up! I have been working on methods for trajectory optimization using both, analytic gradient computations and black box stochastic gradient approximations (proximal policy optimization).

I was always wondering about a question that is touched in the paper: despite the analytic gradient computation being intuitively more efficient and mathematically correct, it is much harder to learn a policy with it than with the “brute force trial-and-error” black box methods.

This paper brings many new perspectives on why.