by dragonwriter
1166 days ago
> GPT-4 has 32k tokens IIRC. Including most significant alphabets would take less than a thousand.

GPT-4 has much more than a 32k token vocabulary (GPT-3 seems to have had up to 175k, GPT-2 in the neighborhood of 50k, based on the max value reported for their tokenizers). What it has is a 32k token context window (that is, the maximum size of prompt + response), not a 32k vocab. But tokens are generally semantically significant parts of words (often whole words), not just letters or the equivalent. So, while you might cover most alphabets in less than a thousand tokens, you need a lot more than an alphabet to handle a language.
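To make the vocab vs. context window distinction concrete, here is a toy sketch. Everything in it is made up for illustration: the two vocabularies, the greedy longest-match `tokenize` helper, and the `CONTEXT_WINDOW` of 8 are hypothetical, and real tokenizers (BPE and friends) work differently in the details. It only shows that vocabulary size counts distinct tokens, while the context window caps how many tokens a sequence may contain.

```python
# Toy illustration only: the vocabularies and the window size are invented.

def tokenize(text, vocab):
    """Greedy longest-match tokenization over a set of token strings."""
    tokens = []
    i = 0
    max_len = max(len(t) for t in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


# Vocabulary A: just an "alphabet" (lowercase letters plus space).
letters = set("abcdefghijklmnopqrstuvwxyz ")

# Vocabulary B: the same alphabet plus a few word-sized pieces.
word_pieces = letters | {"token", "izer", " context", " window"}

text = "tokenizer context windows"
letter_tokens = tokenize(text, letters)
piece_tokens = tokenize(text, word_pieces)

CONTEXT_WINDOW = 8  # hypothetical cap on sequence length, measured in tokens

for name, vocab, toks in [("letters", letters, letter_tokens),
                          ("word pieces", word_pieces, piece_tokens)]:
    fits = len(toks) <= CONTEXT_WINDOW
    print(f"{name}: vocab size={len(vocab)}, tokens used={len(toks)}, fits={fits}")
```

The alphabet-only vocabulary is smaller but spends one token per character, so the same text blows past the window; the slightly larger vocabulary covers it in a handful of tokens.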