by rassibassi
1374 days ago
Yes, and another reason for the small model size, and part of the novelty of the underlying paper [1], is that the diffusion model does not act in pixel space but in a latent space. This 'latent diffusion model' therefore involves not only the task at hand (image synthesis) but also a powerful lossy compression model, in the form of an outer autoencoder that maps images to latents and back. The number of weights (model size) can then be reduced drastically, because the inner neural network layers act on a lower-dimensional latent space rather than a high-dimensional pixel space. It's fascinating because it suggests that deep learning at its core comes down to compression/decompression (encoding/decoding), with a close relation to Shannon's information theory (e.g. source coding, channel coding, the data processing inequality).

[1] https://arxiv.org/abs/2112.10752
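To put a rough number on the "lower dimensional" part, here is a back-of-the-envelope sketch. The downsampling factor of 8 and the 4 latent channels are my assumptions, taken from the commonly cited Stable Diffusion v1 configuration and not from anything stated above; the function names are made up for illustration.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the latent tensor the diffusion model works on.

    downsample and latent_channels are assumed values (see note above).
    """
    return (height // downsample, width // downsample, latent_channels)


def compression_ratio(height, width, channels=3, downsample=8, latent_channels=4):
    """How many pixel-space values there are per latent-space value."""
    pixel_values = height * width * channels
    h, w, c = latent_shape(height, width, downsample, latent_channels)
    return pixel_values / (h * w * c)


# A 512x512 RGB image has 786432 values; the assumed latent has 16384.
print(latent_shape(512, 512))       # (64, 64, 4)
print(compression_ratio(512, 512))  # 48.0
```

Under those assumptions the inner network sees about 48 times fewer values per image than it would in pixel space, which is where much of the saving comes from.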