| HN Mirror



	by rassibassi 1421 days ago
	Yes, and another reason for the small model size, and the novelty of the underlying paper [1], is that the diffusion model acts not on the pixel space but on a latent space. This means that this 'latent diffusion model' learns not only the task at hand (image synthesis) but also, in parallel, a powerful lossy compression model via an outer autoencoder structure. The number of weights (model size) can then be reduced drastically, because the inner neural network layers act on a lower-dimensional latent space rather than a high-dimensional pixel space. It's fascinating because it shows that deep learning at its core comes down to compression/decompression (encoding/decoding), closely related to Shannon's information theory (e.g. source coding, channel coding, the data processing inequality). [1] https://arxiv.org/abs/2112.10752
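	To put a rough number on "lower-dimensional", here is a back-of-the-envelope sketch. The image size (512x512 RGB), the downsampling factor f = 8, and the 4 latent channels are illustrative assumptions, not figures from the comment above; other configurations would give other ratios.

```python
def num_values(height: int, width: int, channels: int) -> int:
    """Number of scalar values in a height x width x channels tensor."""
    return height * width * channels


# Assumed, illustrative settings (not taken from the comment above).
image_size = 512        # square RGB image, in pixels
f = 8                   # spatial downsampling factor of the autoencoder
latent_channels = 4     # channels in the latent representation

pixel_dims = num_values(image_size, image_size, 3)
latent_dims = num_values(image_size // f, image_size // f, latent_channels)
reduction = pixel_dims / latent_dims

print(f"pixel space:  {pixel_dims} values")
print(f"latent space: {latent_dims} values")
print(f"reduction:    {reduction:.0f}x")
```

	Under these assumptions the diffusion network sees 48 times fewer values per image than it would in pixel space, which is the kind of saving the comment is pointing at.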

2 comments

jrm4 1419 days ago

Oh, wow. Now that you mention how it's similar to (if not the same as) lossy compression, it all makes a LOT of sense. This is great. I teach IT and I already do a bit on how lossy compression works (e.g. hey, if you see a blue pixel and then another slightly darker one next to it, what's the NEXT one likely to be?), and this is something of an extension of that.
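The "what's the NEXT pixel likely to be?" idea can be sketched as predictive coding with quantized residuals. This is a generic toy, not any particular codec and not what the paper does; the function names, the linear-extrapolation predictor, and the step size of 4 are all assumptions made for illustration.

```python
def predict_next(prev2: int, prev1: int) -> int:
    """Guess the next sample by continuing the trend of the last two."""
    return 2 * prev1 - prev2


def encode(samples: list[int], step: int = 4) -> tuple[list[int], list[int]]:
    """Keep the first two samples; store the rest as quantized prediction errors."""
    head = list(samples[:2])
    recon = list(head)          # what the decoder will have reconstructed so far
    residuals = []
    for x in samples[2:]:
        pred = predict_next(recon[-2], recon[-1])
        q = round((x - pred) / step)    # quantizing is where information is lost
        residuals.append(q)
        recon.append(pred + q * step)
    return head, residuals


def decode(head: list[int], residuals: list[int], step: int = 4) -> list[int]:
    """Rebuild the samples from the head and the quantized residuals."""
    recon = list(head)
    for q in residuals:
        recon.append(predict_next(recon[-2], recon[-1]) + q * step)
    return recon


# A row of blue-channel values getting steadily darker.
blues = [200, 190, 180, 170, 161]
head, residuals = encode(blues)
restored = decode(head, residuals)
print(residuals, restored)
```

When the prediction is good, the residuals are mostly zeros and compress well; the small difference between the restored row and the original is the "lossy" part.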


rassibassi 1421 days ago

Correction: the autoencoder is pre-trained, rather than learned in parallel :)
