|
|
|
|
|
by p-e-w
50 days ago
|
|
> due to fundamental limitations People keep throwing this phrase around in relation to LLMs, when not a single “fundamental limitation” has been rigorously demonstrated to exist, and many tasks that were claimed to be impossible for LLMs two years ago supposedly due to “fundamental limitations” (e.g. character counting or phonetics) are non-issues for them today even without tools. |
|
The models now whaste a vast amount of useless neurons memorising the character count the entire English language so that people can ask how many r's are in strawberry and check a tickbox in a benchmark.
The architecture cannot efficiently or consistently represent counting letters in words. We should never have forced trained them to do it.
This goes for other more important "skills" that are unsuited to tranformer models.
Most models can now do decent arithmetics. But if you knew how it has encoded that ability in its neurons then you would never ever ever ever trust any arithmetic it ever outputs, even in seems to "know" it (unless it called a calculator MCP to achieve it).
There are fundamental limitations, but we're currently brute forcing ourselves through problems we could trivially solve with a different tool.