by quantadev
594 days ago
If you read my words a bit more carefully, you'll see there's a simple thought experiment (or real experiment) you can do to test my claim: take our "current large size" LLMs (my words from the last post), exactly as they are today, simply remove the Self-Attention wiring, and see whether that destroys the emergent intelligence. I claim it would. At the same time, this doesn't mean you can just bolt Self-Attention onto a small model and expect intelligence to emerge again.
Also, the performance of modern “small” models shows that your last sentence isn't really true either.