by kazinator
310 days ago
> conclusions gathered from toy models and implying this generalises to production LLMs is useless

You are just trotting out the tired argument that model size magically fixes the issues, rather than merely improving the mirage, and so nothing can be known about models with M parameters by studying models with N < M parameters. Given enough parameters, a miraculous threshold is reached whereby LLMs switch from interpolating to extrapolating. Sure!