| HN Mirror

	by dongobread 719 days ago
	The knowledge distillation is very interesting but generating trillions of outputs from a large teacher model seems insanely expensive. Is this really more cost efficient than just using that compute instead for training your model with more data/more epochs?

2 comments

DebtDeflation 719 days ago

I'm also curious. It seems like 6 months ago everyone was afraid of "model collapse", but now synthetic training data generation and teacher models are all the rage. Have we solved the problem of model collapse?


astrange 719 days ago

Model collapse was basically a coping idea made up by artists who were hoping AI image generators would all magically destroy themselves at some point; I don't think it was ever considered likely to happen.

It does seem to be true that clean data works better than low quality data.


groby_b 719 days ago

You're confusing it with data poisoning.

Model collapse itself is (was?) a fairly serious research topic: https://arxiv.org/abs/2305.17493

By now we've reached "probably not inevitable": https://arxiv.org/abs/2404.01413 argues there's a finite upper bound on the error. But I'd also point out that the paper assumes training data cardinality increases with the number of training generations and is strictly accumulative.

To a first order, that means you'd better have a pre-2022 dataset to get started, and have archived it well.

But it's probably fair to say the current SOTA is still more or less "it's neither impossible nor inevitable".
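
To make the replace-versus-accumulate distinction concrete, here is a toy sketch in Python. It is not the setup from either paper: the "model" is just a Gaussian fitted to the pool, and `simulate`, the pool size and the generation count are all made up for illustration. Each generation either trains only on its parent's synthetic samples or keeps everything seen so far, including the original real samples.

```python
import random
import statistics


def simulate(generations, n, accumulate, seed=0):
    """Repeatedly fit a Gaussian to the pool and sample a new generation from the fit.

    Toy stand-in for recursive training; all names and sizes are assumptions.
    """
    rng = random.Random(seed)
    # The "real" data: n samples from a standard normal.
    pool = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(generations):
        # "Train" a model on the current pool (maximum-likelihood Gaussian fit).
        mu = statistics.fmean(pool)
        sigma = statistics.pstdev(pool)
        # Generate synthetic data from the fitted model.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n)]
        # Either keep all earlier data or discard it in favour of the new samples.
        pool = pool + synthetic if accumulate else synthetic
    # Spread of the data the final generation would be trained on.
    return statistics.pstdev(pool)


replaced = simulate(generations=200, n=10, accumulate=False)
accumulated = simulate(generations=200, n=10, accumulate=True)
print(f"replace:    final std = {replaced:.3g}")
print(f"accumulate: final std = {accumulated:.3g}")
```

In the replace setting the fitted spread tends to shrink generation after generation, while the accumulating pool stays anchored by the original samples. That is only a cartoon of the argument, not evidence about real LLM training.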


astrange 719 days ago

Oh, no, they definitely believe both are going to happen and ChatGPT is just going to stop working because it'll see itself on the internet. It goes with the common belief that LLMs learn from what you type into them.

> To a first order, that means you better have a pre-2022 dataset to get started, and have archived it well.

I think that will always be available, or at least, a dataset with the distribution you want will be available.


groby_b 718 days ago

Don't know why you have such disdain for artists, but either way, the original point was that model collapse wasn't "a coping idea made up by artists", but a valid, research-backed scientific model.

> I think that [clean pre-2022 data set] will always be available

Good luck obtaining one.


Workaccount2 719 days ago

Pay attention, because you will only get one chance to watch humans learn in real time that they are nothing special.


skybrian 719 days ago

Historically, similar things happened with heliocentrism and evolution, but I guess we weren't there to see it.


agi_is_coming 719 days ago

The distillation is done on-policy like RLHF -- the student model is generating the sequences and teacher is providing feedback in terms of logits.
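
A rough sketch of what that loop could look like, in plain Python with toy stand-ins for both models. Everything here is hypothetical: `toy_logits`, the vocabulary size and the scales are invented for illustration, and the choice of KL(teacher || student) as the per-token loss is an assumption, since the comment above only says the teacher provides feedback as logits. No gradient step is shown, only how the loss would be computed on student-generated tokens.

```python
import math
import random

VOCAB = 5  # hypothetical tiny vocabulary


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(prefix, scale):
    """Stand-in for a language model: logits depend only on the last token."""
    last = prefix[-1] if prefix else 0
    return [scale * math.cos(last + tok) for tok in range(VOCAB)]


def student_logits(prefix):
    return toy_logits(prefix, 1.0)


def teacher_logits(prefix):
    return toy_logits(prefix, 2.0)


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def on_policy_distill_loss(length, rng):
    """Average per-token KL(teacher || student) along a student-sampled sequence."""
    prefix = []
    total = 0.0
    for _ in range(length):
        p_student = softmax(student_logits(prefix))
        p_teacher = softmax(teacher_logits(prefix))
        # The teacher scores the context the student actually produced.
        total += kl(p_teacher, p_student)
        # On-policy: the next token is sampled from the student, not the teacher.
        next_tok = rng.choices(range(VOCAB), weights=p_student)[0]
        prefix.append(next_tok)
    return total / length


loss = on_policy_distill_loss(length=20, rng=random.Random(0))
print(f"average per-token distillation loss: {loss:.4f}")
```

The point of the sketch is the data flow: the teacher never generates text, it only evaluates prefixes the student sampled, so the cost is one teacher forward pass per student sequence rather than teacher-side generation.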
