
	by bnegreve 2075 days ago
	The way I understand it, the two subproblems are supervised in the sense that they are trained on data sampled from a fixed distribution, rather than on data sampled from a distribution that changes as you update your model, as is usually the case in RL. This makes training more stable.
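To make the distinction concrete, here is a toy sketch (not taken from the work under discussion; the function names and the Gaussian setup are assumptions made purely for illustration). One sampler ignores the model parameter, as in supervised training; the other is centred on the current parameter, as a stand-in for on-policy RL data.

```python
import random


def supervised_batch(rng, n):
    # Fixed distribution: samples do not depend on the model parameter.
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def on_policy_batch(rng, theta, n):
    # RL-like: the sampling distribution is centred on the current parameter,
    # so it moves every time theta is updated.
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def mean(xs):
    return sum(xs) / len(xs)


def data_means(sampler_uses_theta, steps=5, n=2000, seed=0):
    """Return the mean of the training batch seen at each update step."""
    rng = random.Random(seed)
    theta = 0.0
    means = []
    for _ in range(steps):
        if sampler_uses_theta:
            batch = on_policy_batch(rng, theta, n)
        else:
            batch = supervised_batch(rng, n)
        means.append(mean(batch))
        theta += 1.0  # hypothetical stand-in for a model update
    return means


fixed = data_means(sampler_uses_theta=False)
shifting = data_means(sampler_uses_theta=True)
```

In the fixed case the batch means stay near zero at every step, while in the on-policy case they drift along with `theta`, so the learner is chasing a moving target.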

1 comment

Thanks for clarifying that point.