|
|
|
|
|
by dragonwriter
1169 days ago
|
|
> Remember, OpenAI’s tokenizer was created in an era when 125MB was considered large for a language model. GPT-2 and GPT-3 have different vocabularies and maximum token #s, which (even if the tokenizer architecture is the same) implies a different tokenizer model. GPT-3.5 might share the GPT-3 tokenizer, but even then I’d expect GPT-4 to have its own. But even if they are using the tokenizer from GPT-3, its not from “an era when 125MB was considered large for a language model”. |
|
You had me questioning myself for a minute.
(The vocab size is still 50257. Even rounded up to a multiple of 128 for better sharding across the vocab embedding, only the first 50257 are used.)
Believe it or not, 125M was large at the start of the GPT-2 era. No one knew LLMs could do anything interesting, let alone that they'd change the world.