by psb217
1357 days ago
In the reverse diffusion process, the reason we can't jump directly from a noisy image at step t to a clean image at step 0 is that each possible noisy image at step t may be reached from potentially many real images during the forward diffusion process. Thus, our model, which inverts the diffusion process by minimizing the least-squares prediction error of a clean image given a noisy image at step t, will learn to predict the mean over potentially many real images, which is not itself a real image. To generate an image, we start with a noise sample and take a step towards the _mean_ of the distribution of real images which would produce that noise sample when running the forward diffusion process. This step moves us towards the _mean_ of some distribution of real images and not towards a particular real image. But as we take a bunch of small steps and gradually move back through the diffusion process, the effective distribution of real images over which this inverse diffusion prediction averages has lower and lower entropy, until it's effectively a specific real image, at which point we're done.
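To make the "predicts the mean" point concrete, here is a rough toy sketch, not anything from a real diffusion codebase. It assumes a "dataset" of just two real images, represented as the scalars -1 and +1, and the usual forward process x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps. The names `DATA`, `posterior_over_data`, `mse_optimal_prediction`, and `posterior_entropy` are made up for illustration. For such a tiny dataset the least-squares-optimal denoiser can be written down exactly as the posterior mean, so no training is needed.

```python
import numpy as np

# Toy "dataset" (an assumption for illustration): two real images, -1 and +1.
DATA = np.array([-1.0, 1.0])


def posterior_over_data(x_t, alpha_bar):
    """p(x_0 | x_t) over DATA, assuming a uniform prior over DATA and
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    log_w = -(x_t - np.sqrt(alpha_bar) * DATA) ** 2 / (2.0 * (1.0 - alpha_bar))
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w / w.sum()


def mse_optimal_prediction(x_t, alpha_bar):
    """What an ideal least-squares denoiser outputs: the mean E[x_0 | x_t]."""
    return float(posterior_over_data(x_t, alpha_bar) @ DATA)


def posterior_entropy(x_t, alpha_bar):
    """Entropy (in nats) of the distribution the prediction averages over."""
    p = posterior_over_data(x_t, alpha_bar)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    eps = 0.3  # one fixed noise draw, so the run is deterministic
    # alpha_bar near 0 is "very noisy" (large t); near 1 is "almost clean".
    for alpha_bar in (0.01, 0.5, 0.99):
        x_t = np.sqrt(alpha_bar) * 1.0 + np.sqrt(1.0 - alpha_bar) * eps
        print(
            f"alpha_bar={alpha_bar:<5} "
            f"prediction={mse_optimal_prediction(x_t, alpha_bar):.4f} "
            f"entropy={posterior_entropy(x_t, alpha_bar):.4f}"
        )
```

At the noisiest level the prediction sits close to 0, which is the average of the two "images" and not either of them, and the entropy is close to ln 2. As alpha_bar moves towards 1 the prediction approaches +1 and the entropy falls towards 0, which is the "lower and lower entropy" behaviour described above.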
...but the question is, why can't we take one big step and be at the end right away?
Obviously a series of small steps gets you there, but the question was why you need to take small steps.
I feel like this is just an 'intuitive explanation' that doesn't actually do anything other than rephrase the question: "You take a series of small steps to reduce the noise in each step and end up with a picture with no noise".
The real reason is that big steps give worse results (1); the model was specifically designed to be a series of small steps because when you take big steps, you end up with overfitting, where the model just generates a few outputs from any input.
(1) - https://arxiv.org/pdf/1503.03585.pdf