| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mike_hearn 453 days ago

Not fully.

The point of transformer attention is cross-wise processing of tokens that computes their relationship to each other at multiple levels of abstraction. That's why LLMs can read so fast: they're processing all the input tokens in parallel.

LLMs emit tokens in a sequential manner at the level of the outer loop, but clearly inside the activations is a non-sequential map of the entire planned output, otherwise they wouldn't be able to make coherent sentences or speak German (which puts verbs at the end).