Hacker News
by sillysaurusx 1161 days ago
Actually, GPT-3's tokenizer is the same as GPT-2's. https://datascience.stackexchange.com/a/109483

You had me questioning myself for a minute.

(The vocab size is still 50257. Even when the vocab embedding is padded up to a multiple of 128 so it shards better, only the first 50257 rows are used.)
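To make the padding arithmetic concrete, here's a minimal Python sketch. It just rounds 50257 up to the next multiple of 128 and shows why the extra rows sit unused. The names `padded_vocab_size` and `mask_padding_logits`, and the 8-way shard count, are hypothetical and only for illustration; they aren't taken from any particular codebase.

```python
# Minimal sketch: pad the vocab embedding to a multiple of 128 so it shards
# evenly, while real token ids (0..50256) only ever index the first 50257 rows.
# padded_vocab_size, mask_padding_logits and NUM_SHARDS are hypothetical names.

GPT2_VOCAB_SIZE = 50257  # BPE vocab shared by GPT-2 and GPT-3
PAD_MULTIPLE = 128
NUM_SHARDS = 8           # assumption: e.g. 8 devices splitting the embedding


def padded_vocab_size(vocab_size: int, multiple: int = PAD_MULTIPLE) -> int:
    """Round vocab_size up to the next multiple of `multiple`."""
    return ((vocab_size + multiple - 1) // multiple) * multiple


def mask_padding_logits(logits: list[float], vocab_size: int = GPT2_VOCAB_SIZE) -> list[float]:
    """Drop the logits for padding rows so those ids can never be sampled."""
    return logits[:vocab_size]


padded = padded_vocab_size(GPT2_VOCAB_SIZE)
unused_rows = padded - GPT2_VOCAB_SIZE

print(padded)                          # 50304
print(unused_rows)                     # 47 rows no token id ever looks up
print(GPT2_VOCAB_SIZE % NUM_SHARDS)    # 1: 50257 rows don't split evenly 8 ways
print(padded % NUM_SHARDS)             # 0: 50304 rows split into 6288 per shard
```

So the padding costs 47 extra embedding rows. Slicing (or masking) the logits back down to 50257 keeps the model from ever predicting one of those padding ids.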

Believe it or not, 125M parameters was considered large at the start of the GPT-2 era. No one knew LLMs could do anything interesting, let alone that they'd change the world.