by temp512345 1230 days ago
It is exactly remembering the pixels. Just not all of them, and it obviously fills in gaps (more hair, as mentioned in another post). You can consider the way it stores those pixels as a lossy compression format. If I copy a music sample but store a compressed version of it (mp3, for example), you will not find the original bits in my database at all. I am still violating copyright.

But it's really not, though. It's remembering something related to the pixels, yeah, but that's like remembering the shape a line can take or the color of the sky.

To extend your musical analogy, it's remembering that many songs are in 4/4 time, and that major chords sound appealing.

Also, were you to compress anything, an mp3 or a picture, in a lossy fashion to that degree (a ratio of ~10^-5), you would no longer have anything resembling the original. The audio would be glitchy noise, and the image would be a scattering of apparently random pixels, only a few pixels wide.
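For anyone who wants to check the order of magnitude, here is a back-of-the-envelope sketch. Every number in it is an assumption I picked for illustration (a checkpoint of about 4 GB, about 2 billion training images, about 200 kB per source image); none of them come from this thread, so swap in your own.

```python
# Back-of-the-envelope only. All three figures are illustrative assumptions.
MODEL_BYTES = 4 * 10**9        # assumed checkpoint size: ~4 GB
TRAINING_IMAGES = 2 * 10**9    # assumed training set size: ~2 billion images
AVG_IMAGE_BYTES = 200 * 10**3  # assumed average source image: ~200 kB


def bytes_per_image(model_bytes: int, n_images: int) -> float:
    """Weight bytes available per training image if capacity were split evenly."""
    return model_bytes / n_images


def compression_ratio(model_bytes: int, n_images: int, avg_image_bytes: int) -> float:
    """Per-image share of the weights divided by the average image size."""
    return bytes_per_image(model_bytes, n_images) / avg_image_bytes


budget = bytes_per_image(MODEL_BYTES, TRAINING_IMAGES)
ratio = compression_ratio(MODEL_BYTES, TRAINING_IMAGES, AVG_IMAGE_BYTES)

print(f"{budget:.1f} bytes of weights per training image")
print(f"implied ratio: {ratio:.0e}")
```

Under those assumptions the even-split budget is a couple of bytes per image. It says nothing about whether capacity is actually split evenly, which is a separate question.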

Here's the thing - I empathize that this is disruptive in a very similar fashion to a tool that does store compressed copies of the work in question. It is capable of doing the same kind of damage. There's a conversation to be had there - but it's just not compression. That's not how the thing works.

In the case of an overfit image, which is the thing Stable Diffusion is being sued over, it is just compression, literally. The image data is stored in the network weights, and the image can be reconstructed. You’re drawing a distinction without a difference.
Is this (1) the lawsuit you're referring to?

'cause those images are not the same. Sports events are just easy to fake, because they're boring - all sports pictures look roughly the same.

Edited to add: There's another lawsuit (a class action - 2), and after a little light reading, I came across section 5: 'Do diffusion models copy?', and my stomach jumped.

What they're doing, to make a point at trial that stable diffusion copies images, is _training images into the model, then using that trained model to prove that stable diffusion is a compression algorithm_.

This is a patent fabrication. If you train a model hard enough, yeah, it will produce the image you trained it on, and become useless for all other images. Congrats, you've just compressed your 7 kB image to a 7 GB diffusion model.

What scares me about this is that the average court in the US is absolutely dumb enough to fall for it.

1 - https://www.theverge.com/2023/2/6/23587393/ai-art-copyright-...

2 - https://arxiv.org/pdf/2212.03860.pdf

This is dismissive in the face of increasing evidence that a bunch of NN models have already been caught reproducing accidentally overfit data. Many examples have popped up with Stable Diffusion, not just one you disagree with. Same goes for ChatGPT, for GitHub Copilot, for Imagen, and a bunch of models.

Calling people dumb is to be willfully ignorant of the fact that neural networks actually can and really do remember images, not just when overfitting, but also when an example sits in a low-density area of the latent space, where it doesn’t have enough neighbors to average with. The machine really is, technically, a machine intentionally and specifically built to reproduce a weighted combination of its inputs, and it really is possible for that weight vector to spike on some specific training examples. This won’t go away by pretending it doesn’t happen. It will go away when people curate training data that is legal to use, and/or when people write software that detects and rejects outputs that are too similar to a training sample, or otherwise guarantees that no individual example can be reconstructed. This is precisely why the project we’re commenting on is interesting: it takes a step in that direction.
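To make the "detect and reject" idea concrete, here is a minimal sketch of such a filter. It is a toy under loud assumptions: images are flattened lists of floats, similarity is plain Euclidean distance, the search is brute force, and the names `accept_output` and `threshold` are hypothetical. A real system would more likely compare in some learned embedding space with an approximate nearest-neighbor index.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of values")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_training_distance(output: list[float], training_set: list[list[float]]) -> float:
    """Distance from the output to its closest training sample (brute force)."""
    return min(l2_distance(output, sample) for sample in training_set)


def accept_output(output: list[float], training_set: list[list[float]], threshold: float) -> bool:
    """Accept only outputs farther than `threshold` from every training sample."""
    return nearest_training_distance(output, training_set) > threshold


# Hypothetical three-"pixel" images, purely for illustration.
training = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
near_copy = [0.98, 1.0, 1.01]   # almost identical to the second training image
novel = [0.5, 0.1, 0.9]         # far from both training images

print(accept_output(near_copy, training, threshold=0.1))
print(accept_output(novel, training, threshold=0.1))
```

The hard part in practice is choosing the distance and the threshold, since pixel-level distance misses crops, recolors, and other near copies.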

I agree with you that they have the capacity to remember an image - but they're not compressing them. That's a fundamentally different thing. The argument being made by that class action lawsuit is that "this thing can reproduce image X so it's a compression algorithm and nothing more", which they are predicating on an exercise that is sneaky and dishonest, and only likely to hold water with someone who has a limited understanding of the tech and isn't paying very close attention.

I think it does go without saying that our legal system has made some pretty dumb decisions regarding tech in the past - we read here all the time about the patent system, which is damn close in spirit to copyright.

Again, yes, they can remember an image, but they are not remembering pixels, and it's not compression. The vectors you're referring to are not a smaller version of the data, nor are they a pixel representation or even a close derivative thereof. Sure, there's a connection between the latent space and the pixels, but I don't see how that's the same thing.

For those following along, (1) is the best paper I could find talking about extracting images from SD. I'm open to more resources, and I'm even open to being convinced I'm wrong, but not by intentionally overtraining a model and calling it 'compression'. That's a lie.

To take a step back here, is it really the incidental occasional regurgitating of an existing image that's got everyone on edge, or is that just an easier target than "this is disruptive so I want to make it go away"? I'm not saying it doesn't suck that this is gonna put a ton of people out of jobs; both my parents were professional photographers in the 80s. I get it. But like, let's talk about that. Not some orthogonal strawman.

And hey, just to get it out there. We might disagree but I'm not calling you dumb. I do appreciate your willingness to engage an opposing view - it's part of what keeps me coming back to HN.

1 - https://arxiv.org/pdf/2301.13188.pdf

Compression (especially a lossy one) means storing a smaller sample of the original data in whatever form you desire and then using some algorithm to reconstruct the original data up to some acceptable approximation. I would argue that in the situation we are discussing the network does just that and it is obvious to everyone involved.
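A toy illustration of that definition, for anyone following along: keep a smaller sample of the data, then reconstruct an approximation with an algorithm. This is only a sketch of lossy compression in general (subsampling plus linear interpolation, with hypothetical function names); it is not a claim about how a diffusion model stores anything.

```python
def compress(samples: list[float], step: int) -> list[float]:
    """Keep every `step`-th sample; everything in between is discarded."""
    return samples[::step]


def decompress(kept: list[float], step: int, length: int) -> list[float]:
    """Rebuild `length` samples by linear interpolation between kept ones."""
    out = []
    for i in range(length):
        lo = i // step                      # kept sample at or before position i
        hi = min(lo + 1, len(kept) - 1)     # next kept sample, clamped at the end
        frac = (i % step) / step            # how far i sits between lo and hi
        out.append(kept[lo] * (1 - frac) + kept[hi] * frac)
    return out


original = [0, 1, 4, 9, 16, 25, 36, 49, 64]
kept = compress(original, step=4)
restored = decompress(kept, step=4, length=len(original))
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(f"stored {len(kept)} of {len(original)} values, max error {max_error}")
```

The stored data is smaller and the reconstruction is only approximate, which is the "acceptable approximation" part of the definition.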