by miven
1011 days ago
Do you know whether LLMs grasp that a word written as one whole-word token is equivalent to the same word spelled out as a series of single-character tokens?
I'm curious whether changing how some input words are split into tokens could help with letter-by-letter reasoning, as in Wordle. Or would an LLM get confused if we altered how the input text is tokenized, since it has probably never encountered other token "spellings" of the same word?
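To make the question concrete, here is a minimal sketch using a hypothetical toy vocabulary (not any real LLM tokenizer): ids 0 to 25 are single letters and id 26 is the whole word "crane". Both token sequences decode to the same text, but the model receives different ids, which is exactly the equivalence being asked about.

```python
# Hypothetical toy vocabulary: ids 0-25 are the letters a-z, id 26 is the whole word "crane".
VOCAB = [chr(c) for c in range(ord("a"), ord("z") + 1)] + ["crane"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def decode(ids):
    """Concatenate the strings for each token id."""
    return "".join(VOCAB[i] for i in ids)


# The usual "spelling": one whole-word token.
whole_word = [TOKEN_ID["crane"]]
# An alternative "spelling": one token per character.
spelled_out = [TOKEN_ID[ch] for ch in "crane"]

if __name__ == "__main__":
    print(whole_word, decode(whole_word))
    print(spelled_out, decode(spelled_out))
```

The text is identical either way; whether a model treats the two id sequences as the same word depends on whether it saw such alternate spellings during training.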
https://www.geeksforgeeks.org/lzw-lempel-ziv-welch-compressi...
For 'code table' substitute 'token table'.
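A minimal sketch of the LZW analogy, calling the code table a "token table". Assumption: to keep it short, the table starts from only the characters in a given alphabet instead of all 256 byte values. It shows how multi-character entries get added next to single characters. This only illustrates the analogy and is not how any particular LLM builds its vocabulary.

```python
from __future__ import annotations


def lzw_compress(text: str, alphabet: list[str]) -> tuple[list[int], dict[str, int]]:
    """Greedy LZW: emit the code of the longest known prefix, then add prefix+next char."""
    table = {ch: i for i, ch in enumerate(alphabet)}  # single-character "tokens" first
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate  # keep extending the match
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # new multi-character "token"
            current = ch
    if current:
        codes.append(table[current])
    return codes, table


def lzw_decompress(codes: list[int], alphabet: list[str]) -> str:
    """Rebuild the same table on the fly while decoding."""
    if not codes:
        return ""
    inverse = {i: ch for i, ch in enumerate(alphabet)}
    prev = inverse[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in inverse:
            entry = inverse[code]
        elif code == len(inverse):
            entry = prev + prev[0]  # code refers to the entry being defined right now
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        inverse[len(inverse)] = prev + entry[0]
        prev = entry
    return "".join(out)


if __name__ == "__main__":
    text = "TOBEORNOTTOBEORTOBEORNOT"
    alphabet = sorted(set(text))
    codes, table = lzw_compress(text, alphabet)
    print(codes)
    print([tok for tok in table if len(tok) > 1])  # learned multi-character entries
    print(lzw_decompress(codes, alphabet))
```

Because the encoder always takes the longest match, a given string gets one canonical encoding for a given table, even though other sequences of codes would decode to the same text.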