by pants2
85 days ago
Doesn't this just look like another case of "count the r's in strawberry", i.e. not understanding how tokenization works? This is well known and not that interesting to me: ask the model to use Python to solve any of these questions and it will get it right every time.
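For illustration, this is roughly the kind of code a model might write when asked to use Python for the strawberry question. It is a minimal sketch: `count_letter` is a hypothetical helper name, not anything from a particular model or tool. The point is that the code works on characters, so tokenization never comes into it.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    # str.count works on the actual characters, not on model tokens.
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```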
An LLM is a router and completely stateless aside from the context you feed into it. Attention is just routing the probability distribution of the next token, and I'm not sure that's going to accumulate much in a single pass.
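To make the "routing" picture concrete, here is a toy sketch of a single scaled dot-product attention step in plain Python. The vectors are made up for illustration and `attend` is a hypothetical name; a real model has many heads and layers plus learned projections. It only shows that the output is a weighted mix of the value vectors, computed purely from the inputs with no state kept between calls.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """One attention step: weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is a weighted average of the value vectors.
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed


# Made-up toy context: three positions with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, mixed = attend(query, keys, values)
print(weights)
print(mixed)
```

Calling `attend` twice with the same inputs gives the same result, which is the stateless part: everything the step knows comes in through `query`, `keys`, and `values`.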