by viscanti
1161 days ago
Because there's so much more English text for them to train on than there is for most other languages, they're able to make some optimizations for English that they can't make elsewhere. Should they not be able to implement optimizations in cases where they have the data volume to do so?
|
The tokenizer used to train GPT-3 was old, inefficient, and targeted at tokenizing English. That's pretty much all there is to it. It's possible to train a tokenizer that is more efficient and more inclusive of other languages.
GPT-4's tokenizer is already far more efficient, though still weighted toward English.
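
To make the "weighted to English" point concrete, here is a rough toy sketch of byte-level BPE trained on a corpus that is almost entirely English. It is not the GPT-3 or GPT-4 tokenizer. The function names, the sample sentences (Hindi is just an arbitrary non-English example), and the merge budget are all made up for illustration.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` byte-pair merges from whitespace-split words."""
    words = [[bytes([b]) for b in w.encode("utf-8")] for w in corpus.split()]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        if pairs[best] < 2:
            break  # nothing left that repeats
        merges.append(best)
        words = [apply_merge(w, best) for w in words]
    return merges


def count_tokens(text, merges):
    """Encode `text` by replaying the merges in learned order; return the token count."""
    total = 0
    for w in text.split():
        symbols = [bytes([b]) for b in w.encode("utf-8")]
        for pair in merges:
            symbols = apply_merge(symbols, pair)
        total += len(symbols)
    return total


# Hypothetical sample sentences with roughly the same meaning.
english = "the tokenizer splits text into tokens"
hindi = "टोकनाइज़र पाठ को टोकन में बाँटता है"

# Corpus skewed heavily toward English: 100 copies versus 1.
corpus = " ".join([english] * 100 + [hindi])
merges = train_bpe(corpus, 40)

print("English tokens:", count_tokens(english, merges))
print("Hindi tokens:  ", count_tokens(hindi, merges))
```

Because the English pairs are far more frequent, they win the merge budget first, so each English word ends up as a single token while the Hindi sentence stays mostly split into byte fragments. Rebalancing the corpus (or raising the merge budget) shifts that, which is the sense in which a tokenizer can be trained to be more inclusive of other languages.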