by wjessup
1195 days ago
The limitation comes from the size of the word position embedding matrix. This isn't a config issue or an API limitation. It's the size of a matrix that is part of the model and is fixed before training. You can't change it.

What does that mean? For every token in your input or in the generated output, the model needs some notion of what that token's position means. So there is a word position embedding (WPE) matrix that holds one vector per position. The matrix has "only" 1024 entries for GPT2 or 4096 for GPT3. The length of each vector varies too, from 768 for GPT2 small up to 12,288 for GPT3. So the WPE for GPT2 small is (1024x768) and for GPT3 (4096x12288).

At inference, the vector for each position is added to the token embedding of the token at that position, for every token in the original prompt plus every generated token.
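To make the lookup concrete, here is a rough NumPy sketch of the position embedding step using the GPT2 small shapes quoted above. It is not GPT2's actual code: the weights are random, the names `wte`, `wpe` and `embed` are just labels picked for this illustration, and the vocabulary is shrunk to a made-up 1000 tokens to keep memory low. It only shows why position 1024 has nothing to look up.

```python
import numpy as np

N_POSITIONS = 1024  # rows in the position embedding matrix (GPT2)
D_MODEL = 768       # vector length per position (GPT2 small)
VOCAB_SIZE = 1000   # hypothetical, shrunk for this sketch

rng = np.random.default_rng(0)
# Random stand-ins for the trained matrices (assumption: real weights are learned).
wte = rng.standard_normal((VOCAB_SIZE, D_MODEL)).astype(np.float32)   # token embeddings
wpe = rng.standard_normal((N_POSITIONS, D_MODEL)).astype(np.float32)  # position embeddings


def embed(token_ids):
    """Add each token's embedding to the embedding of its position."""
    positions = np.arange(len(token_ids))
    # Raises IndexError once a position >= N_POSITIONS is requested.
    return wte[token_ids] + wpe[positions]


# A sequence that exactly fills the context works.
ok = embed(rng.integers(0, VOCAB_SIZE, size=N_POSITIONS))

# One token more and there is no row left in wpe for the last position.
try:
    embed(rng.integers(0, VOCAB_SIZE, size=N_POSITIONS + 1))
    overflowed = False
except IndexError:
    overflowed = True
```

The failure has nothing to do with the tokens themselves. The extra position simply has no row in the matrix, which is why the limit is baked into the model rather than set by a config flag.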
|
As is often the case with these large models, you can change it with some finetuning on longer-context samples from the same dataset, for what is really a small amount of compute compared to the million hours spent training the thing.