| HN Mirror

by robotresearcher 2079 days ago

> I'm also wondering why the authors didn't publish any experiments to show that it works...

This is a blog post. It cites three of the authors' papers that each contain empirical results. The abstract of the first ends:

"We formally show that this iterated supervised learning procedure optimizes a bound on the RL objective, derive performance bounds of the learned policy, and empirically demonstrate improved goal-reaching performance and robustness over current RL algorithms in several benchmark tasks."

klipt 2079 days ago

> empirically demonstrate improved goal-reaching performance and robustness over current RL algorithms

It's interesting that their choice of current algorithms includes PPO but not, e.g., DeepMind's Rainbow agent, which achieved state-of-the-art performance on many measures: https://arxiv.org/abs/1710.02298

tastroder 2078 days ago

They mention Rainbow in the related work section of the third paper listed there (Kumar, A., Peng, X. B., & Levine, S. (2019). Reward-Conditioned Policies. arXiv:1912.13465), as part of this remark: "they are also known to be notoriously challenging to use effectively, due to sensitivity to hyper parameters, high sample complexity, and a range of important and delicate implementation choices that have a large effect on performance [5, 6, 12, 15, 23, 24, 46]."
