HN Mirror

Hacker News


	by euleriancon 248 days ago
	Diffusion LMs do seem to get more out of the same data. In a world where we are already training transformer-based LLMs on all available text, the ability of diffusion LMs to keep learning from a fixed dataset may let them outperform autoregressive transformers: https://arxiv.org/abs/2511.03276


There’s another paper showing that you can get the same effect by training an autoregressive model on fill-in-the-middle (FIM) data.
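For readers unfamiliar with fill-in-the-middle: a document is cut into prefix, middle and suffix, and the middle is moved to the end, so an ordinary left-to-right model learns to infill. A minimal sketch, assuming prefix-suffix-middle (PSM) ordering, hypothetical sentinel strings `<PRE>`, `<SUF>`, `<MID>`, and a hypothetical `fim_rate` parameter (the fraction of documents that get transformed); real tokenizers and training pipelines define their own special tokens and rates.

```python
import random

# Hypothetical sentinel strings; real tokenizers define their own special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(tokens, rng, fim_rate=0.5):
    """Rearrange a token list into prefix-suffix-middle (PSM) order.

    With probability fim_rate, two cut points are sampled uniformly, splitting
    the sequence into prefix / middle / suffix; otherwise the sequence is
    returned unchanged. The output is still plain left-to-right data, so a
    standard autoregressive model trained with next-token loss on it learns
    to generate the middle conditioned on both the prefix and the suffix.
    """
    if rng.random() >= fim_rate:
        return list(tokens)
    n = len(tokens)
    # Two independent cut points in [0, n]; sorting gives prefix end <= middle end.
    i, j = sorted(rng.randint(0, n) for _ in range(2))
    prefix, middle, suffix = tokens[:i], tokens[i:j], tokens[j:]
    return [PRE, *prefix, SUF, *suffix, MID, *middle]


if __name__ == "__main__":
    rng = random.Random(0)
    doc = "def add ( a , b ) : return a + b".split()
    print(fim_transform(doc, rng, fim_rate=1.0))
```

Only the data changes here; the model and its next-token loss stay exactly as in standard autoregressive training.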

So it’s more about the masked-modeling objective than about diffusion itself.
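For comparison, the masked-modeling objective used by masked (absorbing-state) diffusion LMs can be sketched in the same style: sample a masking ratio t, hide each token independently with probability t, and train the model to recover only the hidden tokens. A minimal sketch of the corruption step, assuming a hypothetical `<MASK>` string and the 1/t loss weighting used by LLaDA-style models; the model and the loss computation are omitted.

```python
import random

MASK = "<MASK>"  # hypothetical mask token string


def mask_corrupt(tokens, rng, t=None):
    """Corrupt a sequence for a masked-modeling objective.

    A masking ratio t is drawn uniformly from (0, 1] unless given, and each
    token is independently replaced by MASK with probability t. The model is
    then trained to predict the original tokens at the masked positions only.
    Returns the corrupted sequence, the masked positions, and a loss weight of
    1/t (assumed here, following LLaDA-style masked diffusion training).
    """
    if t is None:
        t = 1.0 - rng.random()  # lies in (0, 1], so 1/t is always defined
    corrupted, positions = [], []
    for idx, tok in enumerate(tokens):
        if rng.random() < t:
            corrupted.append(MASK)
            positions.append(idx)
        else:
            corrupted.append(tok)
    return corrupted, positions, 1.0 / t


if __name__ == "__main__":
    rng = random.Random(0)
    print(mask_corrupt("the cat sat on the mat".split(), rng))
```

Unlike BERT's fixed 15% masking rate, t here varies per example, from a few masked tokens up to the entire sequence.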

Which paper is that?