HN Mirror


	by janalsncm 461 days ago
	I’m not familiar with that paper, but it would probably be best to compare speeds with an unoptimized transformer decoder. The Vaswani paper came out 8 years ago, so implementations will be pretty highly optimized at this point. On the other hand, if there were a theoretical reason why text diffusion models could never be faster than autoregressive transformers, that would be notable.

1 comment

kadushka 461 days ago

There’s not enough improvement over regular LLMs to motivate optimization effort. Recall that the original transformer was well received because it was fast and scalable compared to RNNs.
