by euleriancon
201 days ago
Diffusion LMs do seem to be able to get more out of the same data. In a world where we are already training transformer-based LLMs on all available text, diffusion LMs' ability to keep learning from a fixed set of data may let them outperform transformers: https://arxiv.org/abs/2511.03276
So it’s more about the masked-modeling objective than about diffusion itself.