| HN Mirror



	by roborovskis 116 days ago
	What would you define as 'distillation' versus 'learning'? How do you know that what an LLM is doing is 'distillation' vs a process closer to a human reading a book? From my perspective, pretraining is pretty clearly not 'distilling', as the goal is not to replicate the pretraining data but to generalize. But what these companies are doing is clearly 'distilling' in that they want their models to exactly emulate Claude's behavior.


armcat 116 days ago

That's a soft distinction (distilling vs learning). If I read a chapter in a textbook I am distilling the knowledge from that chapter into my own latent space, and one would hope I learn something. Flipping it the other way, you could say that the model from Lab Y is ALSO learning from the model from Lab X, not just "distilling". Hence my original comment: how deep does this go?


EnPissant 116 days ago

And yet nearly every machine learning engineer would disagree with you, which is a giveaway that your argument is rooted in ideology.


armcat 116 days ago

> And yet nearly every machine learning engineer would disagree with you, which is a giveaway that your argument is rooted in ideology.

That's a bold statement! Of course I know the difference: in one case you are learning from correct/wrong answers, and in the other from a probability distribution. But in both cases you are using some X to move the weights. We can get down to the nitty-gritty of KL divergence vs cross-entropy, but the whole topic is about "theft", which is perhaps in the eye of the beholder.
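
For anyone who wants the KL divergence vs cross-entropy comparison spelled out, here is a rough sketch in plain Python. The logits are made-up toy numbers, and every function name (softmax, cross_entropy, kl_divergence, logit_gradient) is my own for illustration, not any lab's actual training code. It assumes a softmax output layer, in which case the gradient with respect to the student's logits is "prediction minus target" for both losses; only the target changes.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    # H(target, predicted); terms with zero target mass contribute nothing.
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def kl_divergence(target, predicted):
    # KL(target || predicted); terms with zero target mass contribute nothing.
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0)

def logit_gradient(target, predicted):
    # With a softmax output, the gradient of either loss with respect to
    # the student's logits is simply (predicted - target).
    return [p - t for t, p in zip(target, predicted)]

# Toy numbers, purely hypothetical.
student = softmax([2.0, 1.0, 0.1])
teacher = softmax([3.0, 2.5, -1.0])
hard_label = [1.0, 0.0, 0.0]  # a single "correct answer"

# Ordinary supervised learning: cross-entropy against a one-hot label.
hard_loss = cross_entropy(hard_label, student)
hard_grad = logit_gradient(hard_label, student)

# Distillation: KL divergence against the teacher's full distribution.
soft_loss = kl_divergence(teacher, student)
soft_grad = logit_gradient(teacher, student)

print("hard-label loss:", hard_loss, "gradient:", hard_grad)
print("soft-label loss:", soft_loss, "gradient:", soft_grad)
```

Note that KL(teacher || student) equals the cross-entropy against the teacher minus the teacher's own entropy, and that entropy does not depend on the student, so the two losses push the student's weights in the same direction for a given target. The practical difference is what the target is: a one-hot answer or another model's whole output distribution.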
