by robotresearcher
2079 days ago
> I'm also wondering why the authors didn't publish any experiments to show that it works...

This is a blog post. It cites three of the authors' papers, each of which contains empirical results. The abstract of the first ends: "We formally show that this iterated supervised learning procedure optimizes a bound on the RL objective, derive performance bounds of the learned policy, and empirically demonstrate improved goal-reaching performance and robustness over current RL algorithms in several benchmark tasks."

It's interesting that their choice of current algorithms includes PPO but not, e.g., DeepMind's Rainbow agent, which achieved state-of-the-art performance on many measures: https://arxiv.org/abs/1710.02298