by gdiamos 468 days ago

Bidirectional seq2seq models are usually more accurate than unidirectional models.
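One common way to read "bidirectional" versus "unidirectional" is through the attention mask. Here is a minimal sketch, assuming a plain Transformer where `mask[i][j]` says whether position `i` may attend to position `j`. The `attention_mask` and `show` helpers are hypothetical and only illustrate the two patterns:

```python
def attention_mask(seq_len: int, bidirectional: bool) -> list[list[bool]]:
    """Return mask[i][j] = True when position i may attend to position j."""
    # Unidirectional (causal): only positions at or before i are visible.
    # Bidirectional: every position is visible, including later ones.
    return [[bidirectional or j <= i for j in range(seq_len)] for i in range(seq_len)]


def show(mask: list[list[bool]]) -> str:
    """Render a mask as rows of 'x' (visible) and '.' (hidden)."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


print(show(attention_mask(4, bidirectional=False)))
print()
print(show(attention_mask(4, bidirectional=True)))
```

The causal mask is lower-triangular, so each position sees only its past. The bidirectional mask lets every position see the whole sequence, which is the extra context the accuracy claim refers to.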

However, autoregressive models that generate one token at a time are usually more accurate than parallel models that generate multiple tokens at a time.

In diffusion LLMs, these two effects interact. You can trade one against the other by choosing how many tokens are generated per step and how many future tokens are used to predict the next set of tokens.
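The two knobs can be made concrete with a small scheduling sketch. It assumes a simple block-wise, left-to-right schedule: `tokens_per_step` slots are filled per step, and while filling a block the model attends to all earlier slots, the block itself, and `lookahead` still-masked slots past it. The names `decode_schedule`, `Step`, `tokens_per_step`, and `lookahead` are hypothetical, and this is not the algorithm of any particular diffusion LLM:

```python
from dataclasses import dataclass


@dataclass
class Step:
    generate: range  # slots filled in at this step
    visible: range   # slots the model attends to while filling them


def decode_schedule(seq_len: int, tokens_per_step: int, lookahead: int) -> list[Step]:
    """Hypothetical block-wise schedule for a diffusion-style LM.

    Blocks of `tokens_per_step` slots are filled left to right. While filling a
    block, the model sees every earlier slot, the block itself, and up to
    `lookahead` slots past the block (still masked, not yet generated).
    """
    if tokens_per_step < 1 or lookahead < 0:
        raise ValueError("need tokens_per_step >= 1 and lookahead >= 0")
    steps = []
    for start in range(0, seq_len, tokens_per_step):
        end = min(start + tokens_per_step, seq_len)
        steps.append(Step(generate=range(start, end),
                          visible=range(0, min(end + lookahead, seq_len))))
    return steps


# Compare a few settings on a 16-slot sequence.
for k, w in [(1, 0), (4, 0), (4, 4), (16, 0)]:
    steps = decode_schedule(16, k, w)
    print(f"tokens_per_step={k:2d} lookahead={w}: {len(steps):2d} steps, "
          f"first step sees {len(steps[0].visible)} slots")
```

With `tokens_per_step=1, lookahead=0` this degenerates to one-slot-at-a-time, left-to-right decoding over 16 steps. With `tokens_per_step=16` everything is produced in a single parallel step that sees the whole sequence. Settings in between trade fewer steps (more tokens per step) against how much right-hand context each block gets. The sketch only lays out the schedule and says nothing about accuracy; real systems may also pick which slots to unmask by model confidence rather than strictly left to right.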