by Der_Einzige
1500 days ago
Part of the problem here is that GPT-3 has such a small vocabulary. It's only about 50K tokens, and many of those are garbage, punctuation, or full words (rather than subwords). I'd be curious to see whether scaling up the vocabulary would improve these results in a model like GPT-3...

A rare word like "blithe" is tokenized into two BPE tokens, "bl" and "ithe", whereas common words like "the" get their own token.
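
For anyone who wants to see mechanically why that split happens, here is a toy sketch of BPE merging. To be clear about the assumptions: the merge table (MERGES) is hand-picked so the example reproduces the split described above; it is not GPT-3's actual merge table, and bpe_tokenize is a hypothetical helper, not any library's API. Real tokenizers also work on bytes and handle leading spaces, which this ignores.

```python
# Toy byte-pair-encoding (BPE) tokenizer.
# ASSUMPTION: MERGES is a made-up, hand-picked merge table for illustration.
# It is NOT the table GPT-3 actually uses.

def bpe_tokenize(word, merges):
    """Split `word` into characters, then repeatedly apply the
    highest-priority merge (earliest in `merges`) until none applies."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs with no learned merge rank as infinity.
        candidates = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break  # no learned merge applies to any adjacent pair
        # Fuse the winning pair into a single symbol.
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

# Earlier entries stand in for merges learned from more frequent pairs.
MERGES = [
    ("h", "e"),
    ("t", "he"),    # together these let the common word "the" become one token
    ("b", "l"),     # a frequent word-initial cluster
    ("i", "the"),   # a frequent ending, but nothing merges "bl" + "ithe"
]

print(bpe_tokenize("the", MERGES))     # ['the']
print(bpe_tokenize("blithe", MERGES))  # ['bl', 'ithe']
```

The point of the sketch is that a word only becomes a single token if the training corpus contained it often enough for the full chain of merges to be learned. A larger vocabulary means more merges, so more rare words would presumably survive as whole tokens.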