| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by cold_harbor 29 days ago
	what's wild is they accidentally solved it — pretraining IS unsupervised learning at scale, RLHF IS reinforcement learning. they just didnt know the recipe yet

1 comments

jmalicki 29 days ago

pretraining isn't unsupervised, it is self-supervised - meaning it is moderately more scale limited.

link

sigbottle 29 days ago

What would unsupervised mean, would unsupervised be something like alphago playing against itself trillions of times?

Whereas self-supervised, allows learning without explicit annotation of data ; but it doesn't matter if the models already trained on the entire Internet, and it's not like a game where it can come up with effectively new training data for itself?

link

jmalicki 28 days ago

Unsupervised is basically clustering. Alphago is RL - winning or losing a game is a form of supervision.

Unsupervised is something where there is no intrinsic reward signal. In pre training, predicting the next token and seeing that it matches is a reward signal, hence it is self supervised.

link

cold_harbor 29 days ago

fair point — OpenAI's original plan literally said "solve unsupervised learning". the self-supervised distinction wasnt really standard til after BERT/GPT popularized it

link

jmalicki 28 days ago

I think it's an extremely important distinction because self supervised learning has real inherent reward signals. Something like clustering does not.

link