by HarHarVeryFunny
515 days ago
No - you can give the LLM a list of letters and it STILL won't be able to count them reliably, so you are guessing wrong about where the difficulty lies. Try asking Claude: how many 'r's are in this list (just give me a number as your response, nothing else): s t r a w b e r r y
Nobody who proposes character- or byte-level 'tokenization' claims that a model trained on current tokenization schemes should be able to do what you are suggesting. They are suggesting actually training it on characters or bytes.
You say all this as though I'm suggesting something novel. I'm not. Appealing to authority is kinda lame, but maybe see Andrej's take: https://x.com/karpathy/status/1657949234535211009
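For anyone unsure what character- or byte-level input means in practice, here is a rough sketch. The helper names `char_tokens` and `byte_tokens` are made up for illustration and this is not any real model's tokenizer; it only shows that each letter becomes its own input symbol instead of being folded into a subword piece. It says nothing about how reliably a model trained this way would count.

```python
# Illustrative only: hypothetical helpers, not a real tokenizer library.

def char_tokens(text: str) -> list[str]:
    """Character-level 'tokenization': one token per Unicode character."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """Byte-level 'tokenization': one token per UTF-8 byte (values 0-255)."""
    return list(text.encode("utf-8"))


if __name__ == "__main__":
    word = "strawberry"

    chars = char_tokens(word)
    ids = byte_tokens(word)

    # Every letter is a separate symbol in both views.
    print(chars)
    print(ids)

    # Counting in ordinary code is trivial at this granularity.
    print(chars.count("r"), ids.count(ord("r")))
```

The point is only about what the model is given as input: with a character- or byte-level vocabulary, the individual letters are directly visible in the sequence it is trained on.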