Hacker News
by wokwokwok 1211 days ago
Yes, but I mean it's also wrong...

That's not how the diffusion process works. You can pick any number of interesting ways to describe it, but if they're technically wrong, it doesn't really matter how poetic they are, right?

Diffusion models do use random noise.

As I understand it, every 'step' is composed of three parts: a) the previous output, b) the latent generated from the prompt and c) random noise.

As you move through the steps, the scheduler changes how much of a, b, and c gets mixed in.
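A toy version of that three-part step might look like the following. It is a minimal sketch assuming a DDPM-style ancestral sampler with the linear beta schedule from the original DDPM paper; `toy_predict_noise` is a hypothetical stand-in for the trained U-Net, and real Stable Diffusion samplers differ in detail. In the real model the prompt enters through the noise predictor (conditioning on the text embedding) rather than being added directly. Also note that some samplers (e.g. DDIM with eta = 0) add no fresh noise per step, so the only randomness left is the starting latent.

```python
import math
import random

def linear_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns betas and cumulative products of (1 - beta)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars

def toy_predict_noise(x, cond, alpha_bar):
    # HYPOTHETICAL stand-in for the U-Net: pretends the clean latent is `cond`
    # and returns half of the implied noise, so the prediction is deliberately
    # imperfect and the sampled noise still affects the result.
    return [0.5 * (xi - math.sqrt(alpha_bar) * ci) / math.sqrt(1.0 - alpha_bar)
            for xi, ci in zip(x, cond)]

def ancestral_sample(cond, seed, num_steps=1000):
    rng = random.Random(seed)
    betas, alpha_bars = linear_schedule(num_steps)
    # Start from pure Gaussian noise in latent space.
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        alpha = 1.0 - beta
        # (b) prompt-conditioned noise prediction
        eps = toy_predict_noise(x, cond, alpha_bar)
        # (a) previous output, corrected by the prediction; weights depend on t
        mean = [(xi - beta / math.sqrt(1.0 - alpha_bar) * ei) / math.sqrt(alpha)
                for xi, ei in zip(x, eps)]
        if t > 0:
            # (c) fresh noise, scaled by sigma_t = sqrt(beta_t)
            sigma = math.sqrt(beta)
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise on the final step
    return x

result = ancestral_sample([0.2, -0.4, 1.0], seed=7)
```

The schedule-dependent coefficients in `mean` and `sigma` play the role of the weights on a, b, and c, and they change at every step.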

...but from the article:

> The subtle error comes in a misunderstanding about the "randomly generated noise."

It's not an error. You're just focusing on what you want to focus on.

Let's be 100% blunt: The author of an AI art image is pressing the random generator button. Every time. The output is random.

It's not a matter of debate; the initial seed to the diffusion model is random noise.

The prompt certainly guides the diffusion process, which basically denoises the random noise added to the image... but saying there's no random component to it is completely and utterly wrong.


The sentence you quote is the beginning of a paragraph that ends with “There is random noise, but the visual layer evolves the final image from the noise based upon the latent ‘meaning’ in the prompt”… which is in complete agreement with what you’re saying? It certainly isn’t saying diffusion models don’t use random noise. It contains the phrase “there is random noise”, which is wholly incompatible with that claim.

Perhaps that first sentence could be more precise, but by the end of the paragraph the author’s meaning is clear: the court has a misunderstanding about the “randomly generated noise” when it believes there is randomly generated noise in both the pixel and the latent - this is not the case, there is no randomness in the latent. That exact handcrafted prompt picks out a precise spot in the model’s giant table of embeddings, that prompt will always pick out that spot in that model, and the random noise is only on the pixel side of things. The author believes the court has this misunderstanding because the court uses the analogy of “a patron makes a suggestion to an artist”, which is a scenario that DOES have random noise involved in producing the latent (the brain is an inherently noisy place; an artist’s brain likely even more so).

> the court has a misunderstanding about the “randomly generated noise” when it believes there is randomly generated noise in both the pixel and the latent - this is not the case, there is no randomness in the latent

No.

This is factually incorrect.

The random noise is applied in the latent.

The VAE doesn’t add random shit when you convert to pixel space.

Having randomness in pixel space would make no sense at all.

You don’t ever pick a deterministic point in the latent space. Even if you fix the seed of the random number generator, you're still picking a random point in the latent space, unless you're prepared to argue that you somehow know what a specific seed is going to do before you use it. You're just saving the point for later.
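To illustrate where the noise lives, here's a minimal sketch assuming a toy 16-value latent (Stable Diffusion v1 uses a 4×64×64 latent for a 512×512 image) and a hypothetical `toy_decode` standing in for the VAE decoder. It skips the denoising loop entirely: the random draw happens in latent space, decoding is a fixed function, and a fixed seed only replays a point you couldn't have known before drawing it.

```python
import random

LATENT_SIZE = 16  # toy size; not the real Stable Diffusion latent shape

def initial_latent(seed):
    # The randomness: Gaussian noise drawn in *latent* space from a seeded PRNG.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_SIZE)]

def toy_decode(latent):
    # HYPOTHETICAL stand-in for the VAE decoder: a fixed map from latent values
    # to 8-bit "pixels". Nothing random happens here.
    return [max(0, min(255, round(127.5 + 60.0 * v))) for v in latent]

saved_seed = 1234  # "saving the point for later"
first = toy_decode(initial_latent(saved_seed))
replay = toy_decode(initial_latent(saved_seed))       # same seed, same point
other = toy_decode(initial_latent(saved_seed + 1))    # different seed, different point
```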

Ultimately, it comes down to this:

- You have a function that generates a bunch of random images.

- You pick one.

Did you create the image? No. You didn't.

Did you apply creativity? Should it be copyrightable? Maybe? You picked the one you liked most out of a set. You certainly applied your sense of aesthetics.

The practical question is:

What stops someone generating every possible image and copyrighting it?

Sorry! I know you typed that prompt out and got a random seed, but it turns out I've actually copyrighted every image for that prompt for seeds 200000000 - 300000000. It's only 100 million images. Easy.
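Rough arithmetic for that troll scenario, as a sketch assuming a 32-bit unsigned seed (front-ends differ) and one prompt at one fixed set of settings; changing the step count, sampler, guidance scale, or resolution multiplies the space further.

```python
SEED_SPACE = 2**32  # ASSUMPTION: 32-bit unsigned seeds; not true of every front-end

def seeds_in_range(first, last):
    """Count of seeds in an inclusive range [first, last]."""
    return last - first + 1

claimed = seeds_in_range(200_000_000, 300_000_000)
share_of_space = claimed / SEED_SPACE
# Ceiling division: how many 100-million-seed blocks cover the whole space.
blocks_to_cover_all = -(-SEED_SPACE // 100_000_000)

print(claimed)                   # 100000001
print(f"{share_of_space:.2%}")   # 2.33%
print(blocks_to_cover_all)       # 43
```

Under that assumption, the "100 million images" claim covers roughly 2.3% of the seed space for that prompt and settings.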

Right? Who cares if it's random or not? (I do, and pedantically, it is.) But practically, the copyright office doesn't care. They care about the practicality of preventing the system from being abused.

We can argue about the semantics of where the noise is applied, but it doesn't actually matter.

How do you support people by letting them copyright their 'human scale' generated content, but avoid abuse from trolls who apply 'industrial scale copyrighting' with the same process?

This is YC, so I get to be pedantic ;-)

For stable diffusion, you can actually just set the seed to a fixed number.

After that you can always get the exact same image for the exact same prompt.[1]

In general, there are some interesting philosophical debates to be had about pseudo-random number generators.

[1] YMMV: in some configurations you may still have other sources of noise.
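One way the footnote's "other sources of noise" can show up, sketched with a hypothetical toy pipeline (`toy_generate` and its update rule are made up for illustration): the starting latent comes from the seeded generator, but if the per-step noise is drawn from a separate, unseeded generator, the same seed no longer reproduces the same output. Nondeterministic GPU kernels are another common culprit, not modelled here.

```python
import random

def toy_generate(prompt_vec, seed, step_rng=None, num_steps=10):
    # Initial latent comes from the seeded generator.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in prompt_vec]
    # Per-step noise reuses the seeded generator unless a separate one is supplied.
    noise_rng = step_rng if step_rng is not None else rng
    for _ in range(num_steps):
        # HYPOTHETICAL update: pull halfway toward the prompt vector, add a little noise.
        x = [0.5 * xi + 0.5 * pi + 0.1 * noise_rng.gauss(0.0, 1.0)
             for xi, pi in zip(x, prompt_vec)]
    return x

prompt_vec = [0.3, -0.7, 1.2]
# Fully seeded: reruns match exactly.
seeded_match = toy_generate(prompt_vec, seed=42) == toy_generate(prompt_vec, seed=42)
# Same seed, but per-step noise from an unseeded generator: reruns diverge.
run_a = toy_generate(prompt_vec, seed=42, step_rng=random.Random())
run_b = toy_generate(prompt_vec, seed=42, step_rng=random.Random())
```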