by cubefox
201 days ago
Usual terminology for the three main learning paradigms:
- Supervised learning (e.g. matching labels to pictures)
- Unsupervised learning / self-supervised learning (pretraining)
- Reinforcement learning

Now the confusing thing is that Dwarkesh Patel instead calls pretraining "supervised learning" and you call reinforcement learning a form of unsupervised learning.
In modern RL, we also train deep nets on some (often nontrivial) loss function, and RL generates its own training data. Hence, it blurs the line with SSL. I'd say, however, that it's more complex and more computationally expensive: you need many (or long) rollouts to find a signal to learn from. This whole process is automated, so from this perspective it blurs the line with UL too :-) Its dependence on the reward is what makes the difference, though.
Overall, going from more structured to less structured, I'd order the learning approaches: SL, SSL (pretraining), RL, UL.
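
To make the distinction a bit more concrete, here is a rough sketch of where the training signal comes from in each case. This is a toy illustration only: the function names (`supervised_batch`, `self_supervised_batch`, `rl_batch`) and the guessing-game policy are made up for this comment and are not from any library.

```python
import random

def supervised_batch(inputs, human_labels):
    # SL: every target is supplied from outside (e.g. by an annotator).
    return list(zip(inputs, human_labels))

def self_supervised_batch(tokens):
    # SSL (pretraining): targets are carved out of the data itself,
    # here "predict the next token from the prefix".
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def rl_batch(policy, reward_fn, n_rollouts, rng):
    # RL: the model generates its own data (rollouts); the only
    # external signal is a scalar reward per rollout.
    rollouts = [policy(rng) for _ in range(n_rollouts)]
    return [(rollout, reward_fn(rollout)) for rollout in rollouts]

# Hypothetical toy task: guess a hidden number between 0 and 9.
def toy_policy(rng):
    return rng.randrange(10)

def toy_reward(action):
    return 1.0 if action == 7 else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    batch = rl_batch(toy_policy, toy_reward, 100, rng)
    hits = sum(reward for _, reward in batch)
    print(f"{hits:.0f} of {len(batch)} rollouts carried a nonzero reward")
```

In the toy task most rollouts come back with zero reward, which is the "many rollouts to find a signal" point: the data is free to generate, but the useful part of it is sparse.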