by weinzierl 1092 days ago
The tokenizers I've encountered all had vocabularies of around 50,000 tokens, which fits nicely into (and makes good use of) a 16-bit number. Is this just a coincidence, or is there an advantage to a token ID being representable in 16 bits?
2 comments

I suspect that using 16 bits instead of 32 means the token IDs can be packed more tightly, so vector (SIMD) instructions can operate on more of them in parallel.

But I personally think it's a coincidence, and it just so happens that 50k tokens are enough for the level of complexity the models have right now.
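
To make the packing point concrete, here is a minimal sketch of the storage difference, assuming token IDs are stored as fixed-width unsigned integers. The IDs below are made up for illustration and are not taken from any real tokenizer.

```python
import struct

# Hypothetical token IDs, all below 2**16 (as with a ~50k vocabulary).
token_ids = [15496, 11, 995, 0, 50000]

# "<" selects standard sizes: H is a 2-byte unsigned int, I is a 4-byte one.
packed_16 = struct.pack(f"<{len(token_ids)}H", *token_ids)
packed_32 = struct.pack(f"<{len(token_ids)}I", *token_ids)

# The 16-bit encoding takes half the bytes for the same sequence.
print(len(packed_16), len(packed_32))
```

Halving the width mostly matters for storing or streaming large tokenized datasets. Whether it matters inside the model itself is a separate question that this sketch does not address.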

Probably a coincidence. The tokenizer used by GPT-4 and GPT-3.5 has about 100k tokens, which no longer fits in 16 bits.
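
A quick way to check the arithmetic, as a rough sketch: the vocabulary sizes are the round figures quoted in this thread, not exact counts, and `bits_needed` is a helper name invented here.

```python
def bits_needed(vocab_size: int) -> int:
    """Smallest number of bits that can represent every ID in 0..vocab_size-1."""
    return max(1, (vocab_size - 1).bit_length())

# Round vocabulary sizes from the thread (approximate, not exact counts).
for vocab_size in (50_000, 100_000):
    bits = bits_needed(vocab_size)
    verdict = "fits in uint16" if bits <= 16 else "needs more than 16 bits"
    print(vocab_size, bits, verdict)
```

A 16-bit ID tops out at 65,536 distinct values, so a vocabulary of roughly 100k needs at least 17 bits and in practice would be stored in a 32-bit integer.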