| HN Mirror



	by gajomi 2075 days ago
	It seems to me that they are basically describing a variational formulation of the "optimization perspective" of reinforcement learning, which is cool, but I am confused... where is the supervised learning? Like what is the input and what is the output?

2 comments

bnegreve 2075 days ago

The way I understand it, the two subproblems are supervised in the sense that they are trained on data sampled from a fixed distribution, rather than from a distribution that changes as you update your model, as is usually the case in RL. This makes training more stable.
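
A minimal sketch of that distinction, assuming a toy setup that is not taken from the paper under discussion: the function names, the Gaussian sampling, and the `step` update are all hypothetical stand-ins. The first sampler always draws from the same fixed dataset, while the second draws from a distribution centred on a parameter that moves after every batch, so the training data shifts as the model changes.

```python
import random
from statistics import mean


def fixed_distribution_batches(dataset, n_batches, batch_size, rng):
    # Supervised-style: every batch is drawn from the same fixed dataset,
    # no matter how the model has changed in between.
    return [rng.sample(dataset, batch_size) for _ in range(n_batches)]


def model_dependent_batches(param, n_batches, batch_size, rng, step=0.5):
    # RL-style: each batch is drawn from a distribution centred on the
    # current parameter, so the data shifts every time the model is updated.
    batches = []
    for _ in range(n_batches):
        batches.append([rng.gauss(param, 1.0) for _ in range(batch_size)])
        param += step  # hypothetical stand-in for a real model update
    return batches


rng = random.Random(0)
dataset = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

fixed = fixed_distribution_batches(dataset, 5, 1_000, rng)
moving = model_dependent_batches(0.0, 5, 1_000, rng)

# Batch means stay close together in the fixed case and drift in the other.
print([round(mean(b), 1) for b in fixed])
print([round(mean(b), 1) for b in moving])
```

In the second case the learner is chasing a moving target, which is one plausible reading of why training on a fixed distribution would be more stable.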

jonnycomputer 2075 days ago

Thanks for clarifying that point.

Cmmn_Dscndnt 2075 days ago

It seems more as if the authors are abusing terms from Machine Learning like "Supervised Learning".

jonnycomputer 2075 days ago

abusing how?
