by GaggiX
1253 days ago
Dalle 2 does not use any adversarial loss (so no GAN); it uses a diffusion-based text2image model and two diffusion-based upscalers. VQGAN is an autoencoder, and alone it can't do much. Dalle 1 works thx to the autoregressive model (also no GAN). Stable Diffusion uses an autoencoder because running a diffusion model directly on a 1024/768/512 image is really inefficient, as the model has no bottleneck. The autoencoder does have an adversarial loss, but decoding a 64x64x4 latent up to a 512x512x3 image is a much simpler job than generating the 64x64x4 latent from scratch. That's why you need a diffusion or an autoregressive model as a base.
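To put rough numbers on the bottleneck, here is a minimal sketch that only counts values, using the 512x512x3 image and 64x64x4 latent shapes mentioned above. The helper name `num_values` is made up for illustration, and the value count is only a proxy for how much work the diffusion model has to do, not a measurement of it.

```python
# Shapes quoted in the comment: a 512x512 RGB image and a 64x64x4 latent.
image_shape = (512, 512, 3)
latent_shape = (64, 64, 4)


def num_values(shape):
    """Count the scalar values in a tensor of the given shape (hypothetical helper)."""
    total = 1
    for dim in shape:
        total *= dim
    return total


image_values = num_values(image_shape)
latent_values = num_values(latent_shape)

# Downsampling factor along each spatial side, and the overall reduction in values.
spatial_factor = image_shape[0] // latent_shape[0]
compression = image_values / latent_values

print(f"pixel-space values:  {image_values}")
print(f"latent-space values: {latent_values}")
print(f"spatial factor: {spatial_factor}x per side, {compression:.0f}x fewer values overall")
```

So the diffusion model works on about 48x fewer values than it would in pixel space, and the autoencoder's decoder handles the remaining 8x-per-side expansion.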
> Dalle 1 works thx to the autoregressive model (also no GAN)
It uses an autoregressive model to predict codes for a pretrained VQGAN, doesn't it?
Doesn't Stable Diffusion's autoencoder also use an adversarial loss? Otherwise, wouldn't it suffer from the blurring that MSE losses are well known for?
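For anyone unfamiliar with the MSE blurring point, here is a toy sketch of the usual argument. The setup is an assumption made purely for illustration (a single pixel that is equally likely to be black or white), not anything taken from the Stable Diffusion training code.

```python
# Toy assumption: the "true" pixel is 0.0 (black) or 1.0 (white) with equal probability.
targets = [0.0, 1.0]


def mse(prediction, targets):
    """Mean squared error of one constant prediction against all plausible targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# Brute-force search over candidate predictions in [0, 1] with a step of 0.01.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=lambda p: mse(p, targets))

print(f"MSE-optimal prediction: {best}")
print(f"MSE at the optimum:     {mse(best, targets)}")
print(f"MSE when committing to black: {mse(0.0, targets)}")
```

The prediction that minimizes MSE is the mid-gray average, which matches neither plausible outcome. Spread over a whole image, that averaging is what shows up as blur, and it is the usual motivation for adding an adversarial or perceptual term to the reconstruction loss.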