by synapsomorphy
394 days ago
Nit: Diffusion isn't in place of transformers, it's in place of autoregression. Prior diffusion LLMs like Mercury [1] still use a transformer, but there's no causal masking, so the entire input is processed all at once and the output generation is obviously different. I very strongly suspect this is also using a transformer.

[1] https://www.inceptionlabs.ai/introducing-mercury
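To make the causal-masking point concrete, here is a rough sketch of the two attention masks. It is only an illustration, not Mercury's actual code: the helper names `attention_mask` and `visible_counts` and the sequence length of 4 are made up for the example.

```python
def attention_mask(seq_len, causal):
    """Return a seq_len x seq_len grid; mask[i][j] is True when
    position i is allowed to attend to position j."""
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_counts(mask):
    """Number of positions each row (query position) can attend to."""
    return [sum(row) for row in mask]


# Autoregressive transformer: lower-triangular mask, token i sees only 0..i.
causal_mask = attention_mask(4, causal=True)

# Diffusion-style transformer (assumed here): no causal mask, so every
# position sees the whole sequence at once.
full_mask = attention_mask(4, causal=False)

for name, mask in (("causal", causal_mask), ("full", full_mask)):
    print(name)
    for row in mask:
        print(" ".join("1" if allowed else "." for allowed in row))
```

The transformer blocks can be identical in both cases; what changes is the mask and how the output is generated.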
Earlier image diffusion models used U-nets: https://en.wikipedia.org/wiki/U-Net