by visarga
1494 days ago
> Wow. Transformers are very limited in the size of the attention window: they can take a few thousand tokens at most. But your data might not fit into that window, and you also don't want to have to fine-tune the model. This paper offers a solution.