by yorwba 379 days ago

> I used it myself in a talk almost 3 years ago! And it isn't a lie exactly, the linked paper is totally sound.

The paper was published in December last year and addresses your concerns head-on. For example, from the introduction:

"if the network can learn this ideal score function exactly, then they will implement a perfect reversal of the forward process. This, in turn, will only be able to turn Gaussian noise into memorized training examples. Thus, any originality in the outputs of diffusion models must lie in their failure to achieve the very objective they are trained on: learning the ideal score function. But how can they fail in intelligent ways that lead to many sensible new examples far from the training set?"
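To make the quoted claim concrete, here is a minimal sketch (mine, not from the paper) of what learning the ideal score function exactly would imply. It assumes a variance-exploding forward process x = x0 + sigma * noise and a toy training set of five 2-D points. The function names, the noise schedule, and the plain Euler sampler are all illustrative choices. Under those assumptions the noised data distribution is a mixture of Gaussians centred on the training points, so the ideal score has a closed form, and following it backwards lands on training points.

```python
import numpy as np

def ideal_denoiser(x, train, sigma):
    # Posterior mean E[x0 | x] when x0 is drawn uniformly from `train`
    # and x = x0 + sigma * noise (assumed variance-exploding process).
    d2 = np.sum((train - x) ** 2, axis=1)
    logits = -d2 / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())  # numerically stable softmax
    w /= w.sum()
    return w @ train

def ideal_score(x, train, sigma):
    # Score of the Gaussian-smoothed empirical distribution.
    return (ideal_denoiser(x, train, sigma) - x) / sigma ** 2

def sample(train, rng, sigma_max=10.0, sigma_min=1e-3, steps=200):
    # Hypothetical schedule: geometric from sigma_max to sigma_min, then 0.
    sigmas = np.append(np.geomspace(sigma_max, sigma_min, steps), 0.0)
    x = sigma_max * rng.standard_normal(train.shape[1])
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        # Euler step of the probability-flow ODE dx/dsigma = -sigma * score.
        x = x + (s_next - s) * (-s * ideal_score(x, train, s))
    return x

rng = np.random.default_rng(0)
train = rng.standard_normal((5, 2))  # toy "training set"
samples = np.stack([sample(train, rng) for _ in range(20)])

# Distance from each generated sample to its nearest training point.
dists = np.linalg.norm(samples[:, None, :] - train[None, :, :], axis=2)
nearest = dists.min(axis=1)
print("largest distance to a training point:", nearest.max())
```

Every sample should end up essentially on top of one of the five training points, which is the "memorized training examples" outcome the introduction describes.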

Their answers to these questions are very good and also cover things like correcting the output of previous steps. But the proof is in the pudding: the outputs of their alternative procedure match the models they're explaining very well.

I encourage you to read it; maybe you'll even find a new way to decompose images into surface material properties and lighting as a result.

1 comment

samsartor 379 days ago

I did read it, all the way through! It's really good. The part you are quoting is setting up the ELS, which does not memorize entire images due to the inductive biases of a CNN (translation symmetry, limited receptive field). But the equivalence to a patch mosaic is still due to the assumption that the loss is perfectly minimized under those restrictions.
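A rough 1-D sketch of how those two restrictions change the picture, as I understand the idea (this is not the paper's code, and the names, wrap-around padding, and patch radius are my own assumptions). Each output pixel only sees a small window of the noisy input (limited receptive field) and is compared against training patches taken from every position (translation symmetry). The loss-minimizing denoiser under those restrictions is then a softmax-weighted average of training patch centres.

```python
import numpy as np

def circular_patches(signal, radius):
    # Row i is the window of width 2*radius+1 centred on pixel i,
    # with wrap-around at the edges (an assumption made for simplicity).
    n = signal.shape[0]
    offsets = np.arange(-radius, radius + 1)
    idx = (np.arange(n)[:, None] + offsets[None, :]) % n
    return signal[idx]

def local_equivariant_denoiser(x, train, sigma, radius):
    # Pool patches from every position of every training signal,
    # so the denoiser has no notion of absolute position.
    bank = np.concatenate([circular_patches(s, radius) for s in train])
    centres = bank[:, radius]
    out = np.empty_like(x)
    for i, patch in enumerate(circular_patches(x, radius)):
        # Each pixel gets its own posterior over training patches.
        logits = -np.sum((bank - patch) ** 2, axis=1) / (2.0 * sigma ** 2)
        w = np.exp(logits - logits.max())
        out[i] = (w / w.sum()) @ centres
    return out

rng = np.random.default_rng(0)
train = rng.standard_normal((3, 16))  # three toy 1-D "images"
x = rng.standard_normal(16)           # a noisy input

denoised = local_equivariant_denoiser(x, train, sigma=0.5, radius=2)
shifted = local_equivariant_denoiser(np.roll(x, 3), train, sigma=0.5, radius=2)

# Shifting the input should shift the output by the same amount.
print(np.allclose(shifted, np.roll(denoised, 3)))
```

As sigma shrinks, each pixel's weights concentrate on its best-matching training patch, and neighbouring pixels can pick patches from different images and positions. That is where the mosaic comes from, and it still relies on the weights being the exact minimizer.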

And I was impressed by the close fit to real CNNs/ResNets and even to UNets. But what that shows is that the real models are heavily overfit. The datasets they are using for evaluation here are _tiny_.

Edit: oh the talk is here btw, if anyone is curious https://youtu.be/c-eIa8QuB24