by littlestymaar
595 days ago
You are wildly overestimating the “emergent capabilities” of current models, and underestimating the performance of alternative architectures (namely SSMs) at the same size. Also, the performance of modern “small” models shows that your last sentence isn't really true either.
|
How could I be "overestimating" the emergent capabilities when I never even quantified those capabilities other than to call them "emergent" and impressive?
> “small” models show that your last sentence isn't true either.
I never said that even a perfect architecture would make small models "intelligent". However, to the extent that even smaller LLMs can exhibit surprising capabilities, that's more evidence IN FAVOR OF everything I've said, not evidence against.
EDIT: But in that last sentence (of my prior reply), by "small" I meant genuinely small, meaning non-LLM, and you seem to have interpreted it as "a smaller LLM".