by quantadev
597 days ago
> wildly overestimating the “emergent capabilities”

How could I be "overestimating" the emergent capabilities when I never quantified them beyond calling them "emergent" and impressive?

> “small” models show that your last sentence isn't true either.

I never said that even a perfect architecture would make small models "intelligent". However, to the extent that even smaller LLMs can exhibit surprising capabilities, that's more evidence IN FAVOR OF everything I've said, not evidence against.

EDIT: But in the last sentence of my prior reply, by "small" I meant genuinely small, meaning non-LLM, and you seem to have interpreted it as "a smaller LLM".
|
“All” that was needed to get there was “just” feeding it more data. The fact that we were actually able to train billion-parameter models on multiple trillions of tokens is the key property of transformers. There's no magic beyond that (it's already cool enough though): it's not so much that they are more intelligent, it's simply that with them we can brute-force in a scalable fashion.