by extasia
932 days ago
>It is counter-intuitive that LLMs can exhibit such resilience despite severe disruption to input tokenization caused by scrambled text.

I'm not sure that I agree. An LLM maximising the likelihood of its output could surely permute its input in such a way that it unscrambles the text? Need to read a little deeper and will report back. Edit: interesting result, but the paper doesn't present a good reason that this would be "counter-intuitive" imo.

If you look at the example given in the paper, the word "won" is a single token. When it is scrambled as "wno", it is tokenised as "w" and "no", both of which are unrelated to the original token "won". Somehow the LLM is able to relate these two completely different tokens, "w" and "no", back to the original token "won". I think the paper is claiming this is surprising because the scrambled tokens and the original token shouldn't have any correlation with each other in the training data.
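To make the tokenisation point concrete, here is a minimal sketch of how scrambling can change the token sequence. The vocabulary and the greedy longest-match rule are assumptions made up for illustration; real LLM tokenizers (BPE and friends) use learned merge rules and much larger vocabularies, so treat this as a toy rather than a reproduction of the paper's tokenizer.

```python
# Toy vocabulary: hypothetical, not taken from any real tokenizer.
VOCAB = {"won", "no", "w", "n", "o"}


def tokenize(word, vocab=VOCAB):
    """Split `word` greedily into the longest pieces found in `vocab`.

    This is a simplification for illustration only; it is not how
    BPE-style tokenizers actually choose their splits.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matches: emit the single character as-is.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("won"))  # the intact word matches one vocabulary entry
print(tokenize("wno"))  # the scrambled word falls apart into smaller pieces
```

With this toy vocabulary the intact word stays a single token, while the scrambled one splits into "w" and "no", so the model never sees the token "won" at all and has to recover it from pieces that share no token with it.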