by egnehots
536 days ago
If you understand how LLMs work, you should disregard tests such as:
- How many 'r's are in "strawberry"?
- Finding the fourth word of the response

These tests are at odds with the tokenizer and the next-word prediction model. They do not accurately represent an LLM's capabilities. It's akin to asking a blind person to identify colors.
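A rough illustration of the tokenizer point, as a minimal sketch under loudly stated assumptions: `TOY_VOCAB`, `TOY_IDS` and `greedy_tokenize` are hypothetical, and greedy longest-match over a made-up vocabulary stands in for a real tokenizer (real models typically use learned BPE-style vocabularies, and how they actually split "strawberry" varies by model). The point is only that the model receives a few opaque token IDs, not ten letters.

```python
# Hypothetical toy vocabulary, NOT any real model's vocabulary.
TOY_VOCAB = {"str", "aw", "berry", "s", "t", "r", "a", "w", "b", "e", "y"}
# Hypothetical token IDs: index of each entry in sorted order.
TOY_IDS = {tok: idx for idx, tok in enumerate(sorted(TOY_VOCAB))}


def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split `word` by repeatedly taking the longest vocabulary entry
    that matches at the current position (greedy longest-match)."""
    tokens = []
    i = 0
    max_len = max(len(v) for v in vocab)
    while i < len(word):
        # Try the longest possible piece first, then shorter ones.
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return tokens


tokens = greedy_tokenize("strawberry", TOY_VOCAB)
ids = [TOY_IDS[t] for t in tokens]
print(tokens)  # ['str', 'aw', 'berry']
print(ids)     # [7, 1, 3]
```

In this toy setup the three 'r's are spread across two opaque IDs (7 and 3), so nothing in the model's input directly encodes "three 'r's".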
> Here's "strawberry" spelled out one character per line:
> s
> t
> r
> a
> w
> b
> e
> r
> r
> y
Most LLMs can handle that perfectly, which means they can abstract over tokens into individual characters. Yet most of them lack the ability to chain that into the multi-step inference needed to count the individual 'r's.
From this perspective, I think it's the opposite. Something like the strawberry test is a good indicator of how well an LLM can connect steps that are individually easy but not readily interconnected.
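The decomposition described above can be written out as two trivially easy steps. This is a minimal sketch of the task itself, not of how an LLM works internally; `spell_out` and `count_letter` are hypothetical helper names chosen for illustration.

```python
def spell_out(word: str) -> list[str]:
    """Step 1: split a word into individual characters
    (the part the reply says most LLMs already do well)."""
    return list(word)


def count_letter(chars: list[str], target: str) -> int:
    """Step 2: count case-insensitive matches of `target`
    over the spelled-out characters."""
    return sum(1 for c in chars if c.lower() == target.lower())


chars = spell_out("strawberry")
print("\n".join(chars))          # one character per line
print(count_letter(chars, "r"))  # 3
```

Each step is easy on its own; the test probes whether the model connects the output of step 1 to the input of step 2 without being told to.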