| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by furyofantares 620 days ago

> but, as far as I know, this is always done at the pixel level

Image models are NOT denoised at the pixel level - diffusion happens in latent space. This was one of the big breakthroughs that made all of this work well.

There's a model for encoding/decoding between pixels and latent space. Latent space is able to encode whatever concepts it needs in whichever of its dimensions it needs, and is generally lower dimensional than pixel space. So we get a noisy latent space, denoise it using the diffusion model, then use the other model (variational autoencoder) to decode into pixel space.