by kadushka
462 days ago
I don’t know which text diffusion models you’re talking about. The latest and greatest is https://arxiv.org/abs/2502.09992 and it’s extremely slow (a couple of orders of magnitude slower than a regular LLM), mainly because it does not support KV caching and requires many full-sequence processing steps per token.
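To make the KV caching point concrete, here is a rough back-of-the-envelope sketch in Python. It only counts how many token positions get pushed through the network, not wall-clock time, and the function names, sequence lengths, and step count are all made-up assumptions for illustration rather than figures from the paper.

```python
def autoregressive_positions(prompt_len: int, gen_len: int) -> int:
    """Positions processed by an autoregressive model with a KV cache.

    The prompt is processed once (prefill). After that, each generated
    token adds at most one new position, because the keys and values of
    earlier positions are reused from the cache.
    """
    return prompt_len + gen_len


def diffusion_positions(prompt_len: int, gen_len: int, steps: int) -> int:
    """Positions processed when every denoising step reruns the whole
    sequence from scratch (no KV cache)."""
    return steps * (prompt_len + gen_len)


if __name__ == "__main__":
    prompt_len, gen_len = 128, 256  # hypothetical sizes
    steps = gen_len  # assumption: one denoising step per generated token

    ar = autoregressive_positions(prompt_len, gen_len)
    diff = diffusion_positions(prompt_len, gen_len, steps)
    print(f"autoregressive: {ar} positions")
    print(f"diffusion:      {diff} positions ({diff / ar:.0f}x more)")
```

Under these assumptions the gap in positions processed equals the number of denoising steps, so fewer steps per sequence would narrow it. Real-world speed also depends on attention cost, batching, and hardware, none of which this sketch models.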
On the other hand, if there were a theoretical reason why text diffusion models could never be faster than autoregressive transformers, that would be notable.