by weinzierl
1092 days ago
The tokenization algorithms I've encountered all had around 50,000 tokens, which fits nicely into (and makes good use of) a 16-bit number, since an unsigned 16-bit integer can hold 65,536 distinct values. Is this just a coincidence, or is there an advantage to token IDs being representable in 16 bits?
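To make the arithmetic concrete, here is a minimal sketch in Python. The 50,000 figure is the round number from the comment, not any specific tokenizer's vocabulary size, and the helper names (`bits_needed`, `fits_in_uint16`) and the sample token stream are made up for illustration.

```python
from array import array

# Round figure from the comment above; real vocabularies vary.
VOCAB_SIZE = 50_000


def bits_needed(vocab_size: int) -> int:
    """Smallest number of bits that can represent IDs 0 .. vocab_size - 1."""
    return max(1, (vocab_size - 1).bit_length())


def fits_in_uint16(vocab_size: int) -> bool:
    """True if every ID in 0 .. vocab_size - 1 fits in an unsigned 16-bit integer."""
    return vocab_size <= 2 ** 16


# Hypothetical token stream, including the largest possible ID.
token_ids = [0, 1, 42, VOCAB_SIZE - 1]

# "H" is an unsigned short (2 bytes on common platforms),
# "I" is an unsigned int (usually 4 bytes, but platform-dependent).
as_u16 = array("H", token_ids)
as_u32 = array("I", token_ids)

print(bits_needed(VOCAB_SIZE))      # bits required for the largest ID
print(fits_in_uint16(VOCAB_SIZE))   # whether a 16-bit ID is enough
print(as_u16.itemsize, as_u32.itemsize)  # bytes per stored token ID
```

If there is a practical advantage, it is probably on the storage side: a pre-tokenized corpus kept as 16-bit IDs takes roughly half the space of the same corpus kept as 32-bit IDs. That says nothing about whether vocabulary sizes were chosen with this in mind.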
But I personally think it's a coincidence, and it just so happens that 50k tokens are enough for the level of complexity the models have right now.