| HN Mirror



	by sojuz151 805 days ago
	>My intuition is that as contexts get longer we start hitting the limits of how much comprehension can be embedded in a single point of vector space, and will need better architectures for selecting the relevant portions of the context.

	We are dealing with multi-headed attention, so we have multiple points per token. You can always increase the number of heads or the size of the key vector.
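A rough illustration of "multiple points per token": the NumPy sketch below projects each token embedding into one query and one key vector per head. It follows the standard scaled dot-product formulation, and each head produces its own attention pattern. It uses random weights and no masking. The function name `per_head_attention` and all sizes are hypothetical. The key size `d_k` is set independently of `d_model` here to show that heads and key size are separate knobs.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def per_head_attention(x, w_q, w_k, num_heads):
    """Return per-head queries, keys and attention weights for embeddings x.

    x:        (n, d_model) token embeddings
    w_q, w_k: (d_model, num_heads * d_k) projection matrices (hypothetical)
    """
    n, _ = x.shape
    d_k = w_q.shape[1] // num_heads
    # Each token is projected into num_heads separate query/key vectors.
    q = (x @ w_q).reshape(n, num_heads, d_k).transpose(1, 0, 2)  # (h, n, d_k)
    k = (x @ w_k).reshape(n, num_heads, d_k).transpose(1, 0, 2)  # (h, n, d_k)
    # Scaled dot-product scores: one (n, n) attention pattern per head.
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_k))
    return q, k, weights


rng = np.random.default_rng(0)
n, d_model, num_heads, d_k = 5, 16, 4, 8  # d_k chosen independently of d_model
x = rng.standard_normal((n, d_model))
w_q = rng.standard_normal((d_model, num_heads * d_k))
w_k = rng.standard_normal((d_model, num_heads * d_k))

q, k, weights = per_head_attention(x, w_q, w_k, num_heads)
print(q.shape, weights.shape)  # (4, 5, 8) (4, 5, 5)
```

Increasing `num_heads` adds more query/key points per token. Increasing `d_k` makes each point higher-dimensional. In this sketch, neither change requires changing `d_model`.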

1 comment

causal 805 days ago

The token embedding is what ultimately gets nudged around by the heads though, right? The key vector just relates to the context size, not the token embedding size, afaik.
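To make the shapes in this exchange concrete, here is a minimal NumPy sketch of one attention sublayer in the standard Transformer formulation. It assumes random weights, no masking or layer norm, and a value size equal to `d_k`. The function name `attention_residual_update` is hypothetical. The heads' outputs are concatenated, mapped back to `d_model` by an output projection, and added to each token's embedding; this is the sense in which the heads "nudge" it. Within a head, each key vector has a fixed size `d_k`. A longer context adds more keys along the `n` axis rather than making each key larger.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_residual_update(x, w_q, w_k, w_v, w_o, num_heads):
    """One multi-head attention sublayer with a residual connection.

    Returns the updated embeddings (same shape as x) and the per-head keys.
    Assumes d_v == d_k and omits masking and layer norm.
    """
    n, _ = x.shape
    d_k = w_q.shape[1] // num_heads

    def split(w):
        # (n, d_model) @ (d_model, h * d_k) -> (h, n, d_k)
        return (x @ w).reshape(n, num_heads, d_k).transpose(1, 0, 2)

    q, k, v = split(w_q), split(w_k), split(w_v)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_k))  # (h, n, n)
    heads_out = weights @ v  # (h, n, d_k)
    concat = heads_out.transpose(1, 0, 2).reshape(n, num_heads * d_k)
    # W_O maps the concatenated heads back to d_model; the result is added
    # to the original embedding, so each token's vector is updated in place.
    return x + concat @ w_o, k


rng = np.random.default_rng(0)
d_model, num_heads, d_k = 16, 4, 8
w_q, w_k, w_v = (rng.standard_normal((d_model, num_heads * d_k)) for _ in range(3))
w_o = rng.standard_normal((num_heads * d_k, d_model))

for n in (4, 32):  # short vs. longer context
    x = rng.standard_normal((n, d_model))
    out, k = attention_residual_update(x, w_q, w_k, w_v, w_o, num_heads)
    print(n, out.shape, k.shape)
```

For both context lengths, the output keeps shape `(n, d_model)`. The key tensor grows only along `n`, and each key stays `d_k = 8` wide.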
