Hacker News
by gdiamos 468 days ago
Bidirectional seq2seq models are usually more accurate than unidirectional models.

However, autoregressive models that generate one token at a time are usually more accurate than parallel models that generate multiple tokens at a time.

In diffusion LLMs, these two effects interact. You can trade them off by choosing how many tokens are generated at a time and how many future tokens are used to predict the next set of tokens.
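
To make the two knobs concrete, here is a rough sketch of a block-wise decoding schedule. It is a toy illustration and not any particular model's algorithm: `decoding_schedule`, `block_size`, and `lookahead` are hypothetical names, and it assumes "future tokens" means still-masked positions past the current block that the model is allowed to attend to.

```python
def decoding_schedule(seq_len, block_size, lookahead):
    """Return one (generated, window) pair per decoding step.

    generated: positions filled in at this step (block_size of them, fewer at the end).
    window:    positions the model may attend to at this step, i.e. everything up to
               the end of the current block plus `lookahead` still-masked future slots.
    Both parameters are illustrative assumptions, not part of a real library.
    """
    steps = []
    for start in range(0, seq_len, block_size):
        end = min(start + block_size, seq_len)
        generated = list(range(start, end))
        # Extend the window past the block by `lookahead`, clipped to the sequence.
        window_end = min(end + lookahead, seq_len)
        window = list(range(window_end))
        steps.append((generated, window))
    return steps


if __name__ == "__main__":
    # block_size=1, lookahead=0: one token per step, no future context.
    # block_size=seq_len: everything generated in a single parallel step.
    for generated, window in decoding_schedule(seq_len=6, block_size=2, lookahead=1):
        print("generate", generated, "window", window)
```

With `block_size=1` and `lookahead=0` the schedule degenerates to ordinary left-to-right decoding, and with `block_size=seq_len` it is a single fully parallel step. Settings in between are where the trade-off sits.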