by VMG 1202 days ago

I have to agree. The article summary says:

> Transformer block: Guesses the next word. It is formed by an attention block and a feedforward block.

But the diagram shows transformer blocks chained in sequence. If each block guessed the next word, the next block in the chain would receive only a single word as its input, which does not make sense.
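
To make the shape argument concrete, here is a rough toy sketch in plain Python. It is not the article's model: the attention uses identity projections instead of learned weights, layer normalization is left out, and `vocab_proj` is a made-up 5-word vocabulary projection. The only point is that a block built from an attention step and a feedforward step returns one vector per input position, so blocks can be chained, and a next-word distribution can only come from a separate projection after the last block.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(seq):
    # Toy causal self-attention. Assumption: identity query/key/value
    # projections, i.e. no learned weights.
    d = len(seq[0])
    out = []
    for i, q in enumerate(seq):
        visible = seq[: i + 1]  # each position only sees itself and earlier ones
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out

def feedforward(vec):
    # Toy position-wise feedforward: just a ReLU, no learned weights.
    return [max(0.0, x) for x in vec]

def transformer_block(seq):
    # Attention step plus residual, then feedforward step plus residual.
    attended = attention(seq)
    seq = [[x + a for x, a in zip(row, att)] for row, att in zip(seq, attended)]
    return [[x + f for x, f in zip(row, feedforward(row))] for row in seq]

# Four token vectors of width 3 (made-up numbers standing in for embeddings).
tokens = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.5, 0.5, 0.0], [1.0, 1.0, 1.0]]

hidden = tokens
for _ in range(3):  # three chained blocks
    hidden = transformer_block(hidden)
    # Every block returns the same shape it received: 4 positions, width 3.
    assert len(hidden) == len(tokens)
    assert all(len(row) == len(tokens[0]) for row in hidden)

# Hypothetical projection from width 3 to a 5-word vocabulary, applied once
# to the last position after the final block.
vocab_proj = [
    [0.1, 0.2, 0.3],
    [0.3, 0.2, 0.1],
    [0.0, 0.5, 0.0],
    [0.2, 0.0, 0.2],
    [0.1, 0.1, 0.1],
]
logits = [sum(w * h for w, h in zip(row, hidden[-1])) for row in vocab_proj]
next_token_probs = softmax(logits)
```

Under these assumptions, no individual block ever emits a word. Each one passes along a full sequence of vectors, which is the only way the chaining in the diagram can work.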