> This isn’t because the model can’t count. It’s because it never sees the letters at all.
> The chunks aren’t characters and they aren’t words. They’re something more specific, and the specificity matters more than most people realize.
> Those numbers are real, but they hide what a token actually is.
> GPT-4’s vocabulary isn’t Claude’s. Claude’s isn’t Llama’s.
> The model never sees text. It sees a sequence of integer indices into its own private alphabet.
> So tokens aren’t “roughly like words” or “kind of like characters”. They’re the atoms of perception for one specific model, and they’re the only language that model speaks.
> The same sentence is nine tokens to GPT-4 and seven tokens to Llama 3. Not because Llama is smarter or the sentence changed, but because the two models have different vocabularies.
> That’s it. No clever scoring, no neural network.

Could people who use LLMs to write articles at least prompt them for a better style? I'm really tired of the default Claude style (a lot of Chinese models reuse the same style, too).
What did you think about the more visual elements?
Simon