by jacquesm
1011 days ago
LLMs are trained on 'tokens' derived from 'words' and 'text'. Even though some tokens are just a single letter, the bulk are a rough approximation of syllables, as though you were building a dictionary to be used for data compression. It might be more effective to try to play 'tokendle' before trying to play 'wordle'.
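
To make the "compression dictionary" comparison concrete, here is a minimal sketch of byte-pair-encoding style merge learning, one common way such token vocabularies are built. The function name `learn_bpe_merges` and the toy word list are made up for illustration; real tokenizers work on bytes and far larger corpora.

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    # Each word starts out as a tuple of single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite the corpus with the chosen pair fused into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges, corpus

# Hypothetical toy corpus: frequent fragments become single tokens.
merges, corpus = learn_bpe_merges(["lower", "lowest", "low", "low"], 2)
print(merges)
print(dict(corpus))
```

After two merges the frequent fragment "low" has become a single symbol, while the rarer endings are still spelled out letter by letter, which is the same trade-off a compression dictionary makes.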
Or would an LLM get confused if we changed how the input text is tokenized, since it has probably never encountered other token-"spellings" of the same word?
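
To illustrate what other token-"spellings" could look like, here is a rough sketch that lists every way a word can be split into pieces from a vocabulary. `TOY_VOCAB` and `segmentations` are invented for this example and do not correspond to any real model's tokenizer; a deterministic tokenizer would emit only one of these splits, which is the one the model mostly sees in training.

```python
def segmentations(word, vocab):
    """Return every way to split `word` into pieces that are all in `vocab`."""
    if not word:
        return [[]]  # one way to split the empty string: no pieces
    results = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in vocab:
            # Keep this prefix and recurse on the remainder.
            for rest in segmentations(word[end:], vocab):
                results.append([piece] + rest)
    return results

# Hypothetical vocabulary: single letters plus a few longer fragments.
TOY_VOCAB = {"w", "o", "r", "d", "l", "e", "wor", "word", "le", "dle"}

for split in segmentations("wordle", TOY_VOCAB):
    print(split)
```

Every printed split decodes to the same string, so the question is whether a model handles the less familiar ones as well as the usual one.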