by gurtinator
235 days ago
That's because those demos probably use parallel decoding. In principle, dLLM inference is slower, since every diffusion step needs a bidirectional pass over the whole generation window. For example, if you unmask one token per step in a 128-token window, you need 128 diffusion steps to generate the full window.
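
Rough sketch of that arithmetic, assuming each diffusion step is one full pass over the window and a fixed number of tokens gets unmasked per step. The function names and the value of 8 tokens per step are made up for illustration, not taken from any particular dLLM:

```python
import math

def diffusion_steps(window: int, tokens_per_step: int) -> int:
    """Steps needed to unmask every position in the window."""
    return math.ceil(window / tokens_per_step)

def positions_processed(window: int, tokens_per_step: int) -> int:
    """Each step runs the model over the whole window (assumed: no caching)."""
    return diffusion_steps(window, tokens_per_step) * window

window = 128
# One token unmasked per step: the example from the comment above.
print(diffusion_steps(window, 1), positions_processed(window, 1))  # 128 16384
# Parallel decoding, hypothetically 8 tokens unmasked per step.
print(diffusion_steps(window, 8), positions_processed(window, 8))  # 16 2048
```

So the step count drops in proportion to how many tokens you unmask at once, which is presumably where the speed in those demos comes from.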