by kazinator
305 days ago
Because model size is a trivial parameter, not a new paradigm. What you're saying is like claiming you can't extrapolate that long division works on 100-digit numbers because you only worked through it on 7-digit numbers and a few small polynomials.
Sometimes, we go so far as to say there is "emergence" of qualitative differences. But really, this is not necessary (and not proven to actually occur).
What is true is that the performance of LLMs on out-of-distribution (OOD) tasks changes with scale.
So no, it's not the same as solving a math problem.