	by kazinator 351 days ago
	Because model size is a trivial parameter, not a new paradigm. What you're saying is like claiming you can't extrapolate that long division works on 100-digit numbers because you only worked through it with 7-digit numbers and a few small polynomials.

3 comments

zwaps 351 days ago

Scale changes the performance of LLMs.

Sometimes, we go so far as to say there is "emergence" of qualitative differences. But really, this is not necessary (and not proven to actually occur).

What is true is that the performance of LLMs at OOD tasks changes with scale.

So no, it's not the same as solving a math problem.

lossolo 351 days ago

> What is true is that the performance of LLMs at OOD tasks changes with scale.

If scaling alone guaranteed strong OOD generalization, we’d expect the largest models to consistently top OOD benchmarks, but this isn’t the case. In practice, scaling primarily increases a model’s capacity to represent and exploit statistical relationships present in the training distribution. This reliably boosts in-distribution performance but yields limited gains on tasks that are distributionally distant from the training data, especially if the underlying dataset is unchanged. That’s why trillion-parameter models trained on the same corpus may excel at tasks similar to those seen in training, but won’t necessarily show proportional improvements on genuinely novel OOD tasks.

kazinator 351 days ago

If you scale the LLM, you have to scale the tasks.

Of course performance improves on the same tasks.

The researchers behind the submitted work chose a certain model size and problems of a certain size, controlling everything. There is no reason to believe that their results won't generalize to larger or smaller models.

Of course, not with the input problems held constant! That is a strawman.

barrkel 351 days ago

Alas, not true. It would be easier to predict progress if so.

exe34 351 days ago

This is 100% how it doesn't work with LLMs.
