by janalsncm
461 days ago
I’m not familiar with that paper, but it would probably be best to compare speeds with an unoptimized transformer decoder. The Vaswani paper came out 8 years ago, so implementations will be pretty highly optimized at this point. On the other hand, if there were a theoretical reason why text diffusion models could never be faster than autoregressive transformers, that would be notable.