
by hospitalJail 1107 days ago

> I get using a proper tokenizer and just calling `strings.Split`, and it seems to be remarkably stable for a given model and language (multiply the length of the result of splitting on spaces by 1.55 for OpenAI and 1.7 for Claude, which leaves a tiny safety margin).

One time I suggested this, got downvoted to hell.

To be fair to the downvoters, I quoted OpenAI's 7 tokens per word (from their tutorial page).

Seems incredibly unrealistic in hindsight, but at the time, things were fresh. Also, I think most people wanted something more robust than a linear calculation.