by sillysaurusx 1161 days ago

Actually, GPT-3's tokenizer is the same as GPT-2's. https://datascience.stackexchange.com/a/109483

You had me questioning myself for a minute.

(The vocab size is still 50257. Even when the vocab embedding is padded up to a multiple of 128 for better sharding, only the first 50257 entries are used.)
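To make the rounding concrete, here's a rough sketch of the arithmetic. The helper name `pad_vocab_size` is made up for illustration and isn't taken from any particular codebase; it just rounds up to the next multiple of 128.

```python
def pad_vocab_size(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`.

    Hypothetical helper for illustration only.
    """
    return ((vocab_size + multiple - 1) // multiple) * multiple


GPT2_VOCAB_SIZE = 50257  # real tokens in the GPT-2 / GPT-3 vocabulary

padded = pad_vocab_size(GPT2_VOCAB_SIZE)  # rows in the padded embedding table
unused = padded - GPT2_VOCAB_SIZE         # padding rows the tokenizer never emits

print(padded, unused)  # 50304 47
```

So the padded table would have 50304 rows, and the last 47 are never indexed by the tokenizer.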

Believe it or not, a 125M-parameter model was large at the start of the GPT-2 era. No one knew LLMs could do anything interesting, let alone that they'd change the world.