by bnegreve 2075 days ago
The way I understand it, the two subproblems are supervised in the sense that they are trained on data sampled from a fixed distribution, rather than from a distribution that changes as you update your model, as is usually the case in RL. This makes training more stable.
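To make the distinction concrete, here is a toy sketch, not taken from the paper under discussion. The names `sample_fixed` and `sample_on_policy`, the Gaussian distributions, and the `theta += 1.0` "update" are all assumptions made up for illustration. It only shows that in the supervised-style case the data looks the same no matter how the parameter moves, while in the RL-style case the data shifts with every update.

```python
import random


def sample_fixed(rng, n):
    # Supervised-style setting (assumed toy): data comes from a fixed
    # distribution, here N(0, 1), regardless of the current parameter.
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sample_on_policy(rng, theta, n):
    # RL-style setting (assumed toy): the data distribution depends on the
    # current parameter, here N(theta, 1), so it moves when theta is updated.
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(0)
fixed_means, on_policy_means = [], []
theta = 0.0
for step in range(5):
    fixed_means.append(mean(sample_fixed(rng, 10_000)))
    on_policy_means.append(mean(sample_on_policy(rng, theta, 10_000)))
    theta += 1.0  # hypothetical stand-in for a model update

print("fixed distribution, batch means:    ", fixed_means)
print("on-policy distribution, batch means:", on_policy_means)
```

The batch means in the first list should stay near 0 across all steps, while those in the second should track `theta`. That moving target is one informal way to see why training on self-generated data tends to be less stable.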

Thanks for clarifying that point.