Hacker News new | ask | show | jobs
by danielvaughn 99 days ago
I'm not an expert in LLMs, but I don't think character length matters. Text is deterministically tokenized into byte sequences before being fed as context to the LLM, so in theory `mySuperLongVariableName` uses the same number of tokens as `a`. Happy to be corrected here.
1 comments

Running it through https://platform.openai.com/tokenizer "mySuperLongVariableName" takes 5 tokens. "a", takes 1. mediumvarname is 3 though. "though" is 1.