by danijar 2673 days ago
Author here. First of all, I'd like to clarify that the data efficiency gain over D4PG is 5000% or 50x.

Regarding computational efficiency, we match D4PG, a top model-free agent that uses experience replay among other techniques (actor-critic, distributional loss, n-step returns, prioritized replay, distributed experience collection).

Your point about exposure bias is interesting, and applies equally to agents that do not learn a model. Personally, I think we need reliable uncertainty estimates in neural networks to make progress on this research question, so the agent can know what it doesn't know.

Hindsight experience replay doesn't apply to tasks where the inputs are images because it requires knowledge of a meaningful goal space with a distance function (e.g. 2D coordinates of goal positions).
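
To make the goal-space requirement concrete, here is a minimal sketch of HER-style relabeling with the "final" strategy, assuming low-dimensional 2D positions. The names `sparse_reward`, `relabel_final`, the `achieved`/`goal` keys, and the tolerance `tol` are hypothetical choices for illustration, not taken from any particular implementation.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    """Return 0 if `achieved` is within `tol` (Euclidean distance) of `goal`, else -1."""
    dist = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
    return 0.0 if dist <= tol else -1.0

def relabel_final(episode, tol=0.05):
    """Pretend the position reached at the end of the episode was the goal all along."""
    new_goal = episode[-1]["achieved"]
    return [
        {**step, "goal": new_goal,
         "reward": sparse_reward(step["achieved"], new_goal, tol)}
        for step in episode
    ]

# A failed episode: the agent never reaches the original goal (1, 1).
episode = [
    {"achieved": (0.0, 0.0), "goal": (1.0, 1.0), "reward": -1.0},
    {"achieved": (0.3, 0.1), "goal": (1.0, 1.0), "reward": -1.0},
    {"achieved": (0.5, 0.2), "goal": (1.0, 1.0), "reward": -1.0},
]
relabeled = relabel_final(episode)
print([step["reward"] for step in relabeled])
```

Both the relabeling (reading off an achieved goal) and the reward (thresholding a distance) depend on a compact goal representation. With raw images as inputs, neither the achieved goal nor a meaningful distance between two goals is directly available.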