by gajomi 2075 days ago
It seems to me that they are basically describing a variational formulation of the "optimization perspective" of reinforcement learning, which is cool, but I am confused... where is the supervised learning? Like what is the input and what is the output?
2 comments

The way I understand it, the two subproblems are supervised in the sense that they are trained on data sampled from a fixed distribution, rather than on data sampled from a distribution that changes as you update your model, as is usually the case in RL. This makes the training more stable.
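To make the distinction concrete, here is a toy sketch of my own (not anything taken from the paper). It assumes a made-up one-parameter "model" `theta` and a hypothetical helper `batch_means`. In the fixed case every batch is drawn from N(0, 1) no matter what `theta` is; in the on-policy case each batch is drawn from N(theta, 1), so the training data drifts as the model is updated.

```python
import numpy as np

def batch_means(on_policy, num_updates=5, batch_size=10_000, step=0.5, seed=0):
    """Return the mean of each training batch across a run of model updates.

    on_policy=False: batches come from a fixed N(0, 1) (supervised-style).
    on_policy=True:  batches come from N(theta, 1), which moves with the model.
    """
    rng = np.random.default_rng(seed)
    theta = 0.0  # hypothetical model parameter
    means = []
    for _ in range(num_updates):
        loc = theta if on_policy else 0.0
        batch = rng.normal(loc=loc, scale=1.0, size=batch_size)
        means.append(float(batch.mean()))
        theta += step  # stand-in for a gradient update
    return means

fixed = batch_means(on_policy=False)
moving = batch_means(on_policy=True)
print("fixed distribution:", np.round(fixed, 2))
print("on-policy:         ", np.round(moving, 2))
```

The batch means stay near zero in the fixed case and follow `theta` in the on-policy case. That moving target is, as far as I can tell, the source of instability being avoided.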
Thanks for clarifying that point.
It seems more like the authors are abusing machine learning terms such as "supervised learning".
Abusing how?