by simonw
764 days ago
Oh interesting, does that mean languages other than English won't be paying such a large penalty in token counts? With previous tokenizers, non-English sentences needed notably more tokens to represent: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
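
For a rough feel of where that penalty comes from, here is a minimal stdlib-only sketch. It assumes a byte-level BPE tokenizer, where every UTF-8 byte is at worst one token, so the byte count is only an upper bound and not a real token count. The helper name `utf8_bytes_per_char` and the sample strings are made up for illustration; an actual comparison would need the tokenizer itself.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in a non-empty string."""
    return len(text.encode("utf-8")) / len(text)


# Illustrative sample strings (assumptions, not taken from the linked post).
samples = {
    "English": "hello",
    "Russian": "Привет",
    "Japanese": "こんにちは",
}

for language, text in samples.items():
    # More bytes per character means more raw units for a byte-level
    # tokenizer to merge, unless its vocabulary covers that script well.
    print(f"{language}: {utf8_bytes_per_char(text):.1f} bytes per character")
```

How much of that gap survives in practice depends on how many merges the vocabulary has learned for each script, which is presumably what a newer tokenizer would improve.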