by mike_hearn
453 days ago
Not fully. The point of transformer attention is to process tokens against each other, computing how each one relates to the others at multiple levels of abstraction. That's why LLMs can read so fast: they process all the input tokens in parallel. LLMs emit tokens sequentially at the level of the outer loop, but clearly the activations hold a non-sequential map of the entire planned output; otherwise they wouldn't be able to produce coherent sentences or speak German (which often puts verbs at the end).
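
To make the parallel-read versus sequential-write distinction concrete, here's a rough toy sketch in plain Python. It is not how any real model is implemented: `self_attention` uses identity projections and no causal mask (real decoders use learned query/key/value projections and a mask), and `generate` and its `next_token` callback are hypothetical names standing in for a full forward pass plus sampling.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Toy single-head attention over a list of equal-length token vectors.

    Assumption: queries, keys and values are the raw vectors themselves.
    """
    d = len(vectors[0])
    out = []
    # Each query row depends only on the inputs, not on other output rows,
    # so these iterations are independent and could all run in parallel.
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return out


def generate(prompt, steps, next_token):
    """Outer decode loop: one token per step, each step sees the whole context."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))
    return tokens
```

The input side has no dependency between rows, so it parallelizes trivially; the output side is a loop where step N+1 can't start until step N has appended its token. Whatever planning happens has to live inside each call to `next_token`.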