This strikes me as less a leak and more clever marketing from Mistral.
Clearly we should train a diffusion model to denoise the weights of LLM transformer models. Yo dawg.