by sojuz151
758 days ago
>My intuition is that as contexts get longer we start hitting the limits of how much comprehension can be embedded in a single point of vector space, and will need better architectures for selecting the relevant portions of the context.

We are dealing with multi-headed attention, so we have multiple points per token. You can always increase the number of heads or the size of the key vector.
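
To make "multiple points per token" concrete, here is a rough pure-Python sketch, not taken from any particular model. Each head projects every token into its own key and query vectors, so a token ends up with `num_heads` separate points of size `d_key`. The function name `per_head_attention`, the parameters `num_heads` and `d_key`, and the random projection weights are all assumptions made up for illustration; a real model learns these weights.

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Random projection matrix (rows x cols), scaled down by sqrt(rows).
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]


def project(vec, matrix):
    # Multiply a length-`rows` vector by a rows x cols matrix.
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def per_head_attention(tokens, num_heads, d_key, seed=0):
    """Return weights[h][i][j]: how much token i attends to token j in head h."""
    rng = random.Random(seed)
    d_model = len(tokens[0])
    weights = []
    for _ in range(num_heads):
        # Each head has its own (here random, hypothetical) projections,
        # so every token gets a separate key vector per head.
        w_q = make_matrix(d_model, d_key, rng)
        w_k = make_matrix(d_model, d_key, rng)
        queries = [project(t, w_q) for t in tokens]
        keys = [project(t, w_k) for t in tokens]
        head = []
        for q in queries:
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_key)
                      for k in keys]
            head.append(softmax(scores))
        weights.append(head)
    return weights


rng = random.Random(1)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
weights = per_head_attention(tokens, num_heads=4, d_key=2)
print(len(weights), len(weights[0]), len(weights[0][0]))  # 4 5 5
```

Raising `num_heads` gives each token more independent key vectors, and raising `d_key` makes each one larger; those are the two knobs mentioned above.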