by gdiamos
468 days ago
Bidirectional seq2seq models are usually more accurate than unidirectional models. However, autoregressive models that generate one token at a time are usually more accurate than parallel models that generate multiple tokens at a time. In diffusion LLMs, these two effects interact. You can trade them off by choosing how many tokens are generated at a time and how many future tokens are used to predict the next set of tokens.
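To make the two knobs concrete, here is a rough sketch of an attention mask with a block size (tokens generated together) and a lookahead (future tokens visible). The function name `block_attention_mask` and its parameters are made up for illustration, and this is not how any particular diffusion LLM implements it, just one way to picture the trade-off.

```python
import numpy as np

def block_attention_mask(seq_len, block_size, lookahead=0):
    """Boolean mask where mask[i, j] is True if position i may attend to j.

    Hypothetical illustration: each position sees its whole block
    (bidirectional within the block), every earlier block, and up to
    `lookahead` positions past the end of its own block.
    """
    positions = np.arange(seq_len)
    # Exclusive end index of the block that each position belongs to.
    block_end = (positions // block_size + 1) * block_size
    # Extend visibility by the lookahead, clipped to the sequence length.
    visible_until = np.minimum(block_end + lookahead, seq_len)
    # Row i is True for every column j < visible_until[i].
    return positions[None, :] < visible_until[:, None]

# block_size=1, lookahead=0 gives the usual causal (autoregressive) mask.
causal = block_attention_mask(4, block_size=1)
# block_size=seq_len gives a fully bidirectional mask.
bidirectional = block_attention_mask(4, block_size=4)
# Values in between mix the two: blocks of 2 tokens, 1 future token visible.
mixed = block_attention_mask(4, block_size=2, lookahead=1)
```

With `block_size=1` and `lookahead=0` you get the one-token-at-a-time extreme, and with `block_size` equal to the sequence length you get the fully parallel, fully bidirectional extreme. Everything in between is a point on the trade-off.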