by sillysaurusx
1161 days ago
Actually, GPT-3's tokenizer is the same as GPT-2's. https://datascience.stackexchange.com/a/109483 You had me questioning myself for a minute. (The vocab size is still 50257. Even when the vocab embedding is rounded up to a multiple of 128 for better sharding, only the first 50257 entries are used.) Believe it or not, 125M parameters was large at the start of the GPT-2 era. No one knew LLMs could do anything interesting, let alone that they'd change the world.
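For anyone who wants to check the rounding, here's a quick sketch of the arithmetic. The helper name `pad_vocab_size` is made up for illustration and isn't from any particular codebase; the only numbers taken from the comment above are 50257 and 128.

```python
def pad_vocab_size(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the nearest multiple of `multiple` (hypothetical helper)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple


GPT2_VOCAB_SIZE = 50257  # real tokens in the GPT-2 / GPT-3 BPE vocabulary

padded = pad_vocab_size(GPT2_VOCAB_SIZE)  # rows in the padded embedding table
unused_rows = padded - GPT2_VOCAB_SIZE    # padding rows that no token id maps to

print(padded, unused_rows)  # 50304 47
```

So the padded table would have 50304 rows, and the last 47 never get looked up, since the tokenizer only emits ids 0 through 50256.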