by VMG 1155 days ago
I have to agree. The article summary says:

> Transformer block: Guesses the next word. It is formed by an attention block and a feedforward block.

But the diagram shows transformer blocks chained in sequence. So if each block really guessed the next word, the following block in the chain would receive only a single word as its input? That does not make sense.
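
For what it's worth, here is a minimal sketch of how chained blocks are usually wired in a standard decoder-only transformer, which may not match the article's diagram exactly. Each block takes one vector per position and returns one vector per position, so the next block gets the whole sequence; only a final projection after the last block produces next-word scores. Everything named here (`transformer_block`, `make_block`, the sizes, the random weights) is made up for illustration, and layer norm and multi-head attention are left out to keep it short.

```python
import math
import random


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m) on plain nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_block(d_model, rng):
    # Hypothetical parameter layout: single-head attention plus a one-layer feedforward.
    return {name: rand_matrix(d_model, d_model, rng) for name in ("wq", "wk", "wv", "wff")}


def transformer_block(x, p):
    # x is seq_len x d_model; the return value has the same shape.
    d_model = len(x[0])
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # seq_len x seq_len
    weights = []
    for i, row in enumerate(scores):
        # Causal mask: position i may only attend to positions <= i.
        masked = [s / math.sqrt(d_model) if j <= i else -1e9 for j, s in enumerate(row)]
        weights.append(softmax(masked))
    attended = matmul(weights, v)
    h = [[a + b for a, b in zip(xr, ar)] for xr, ar in zip(x, attended)]  # residual
    ff = [[max(0.0, val) for val in row] for row in matmul(h, p["wff"])]  # ReLU feedforward
    return [[a + b for a, b in zip(hr, fr)] for hr, fr in zip(h, ff)]  # residual


rng = random.Random(0)
seq_len, d_model, vocab_size, n_blocks = 5, 8, 11, 3  # arbitrary toy sizes

x = rand_matrix(seq_len, d_model, rng)  # stand-in for token embeddings
blocks = [make_block(d_model, rng) for _ in range(n_blocks)]

shapes = []
for p in blocks:
    x = transformer_block(x, p)
    shapes.append((len(x), len(x[0])))  # shape handed to the next block

# Only after the last block is the final position projected to vocabulary scores.
w_out = rand_matrix(d_model, vocab_size, rng)
logits = matmul([x[-1]], w_out)[0]
next_token_id = max(range(vocab_size), key=lambda i: logits[i])
```

In this sketch `shapes` holds the same `(seq_len, d_model)` pair for every block, so no block ever sees "a single word"; the word guess comes from `logits`, which is computed once at the end.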