by dahart 1229 days ago
In the case of an overfit image, which is the thing Stable Diffusion is being sued over, it is just compression, literally. The image data is stored in the network weights, and the image can be reconstructed. You’re drawing a distinction without a difference.

is this (1) the lawsuit you're referring to?

'cause those images are not the same. Sports events are just easy to fake, because they're boring - all sports pictures look roughly the same.

Edited to add: There's another lawsuit (a class action - 2), and after a little light reading, I came across section 5: 'Do diffusion models copy?', and my stomach jumped.

What they're doing, to make a point at trial that stable diffusion copies images, is _training images into the model, then using that trained model to prove that stable diffusion is a compression algorithm_.

This is a patent fabrication. If you train a model hard enough, yeah, it will produce the image you trained it on. And become useless for all other images. Congrats, you've just compressed your 7kb image to a 7gb diffusion model.
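To make the "train hard enough and it spits the image back" point concrete, here is a toy sketch. It is not Stable Diffusion or anything close to it: the "model" is a hypothetical linear map with 32 weights, the "image" is 4 made-up pixel values, and the fixed input `z` is an assumption for illustration. It only shows that gradient descent on a single example drives the model to reproduce that example while using more parameters than the data it memorized.

```python
# Toy illustration only: overfit a tiny linear "model" to one 4-pixel "image".
# All names and values here are hypothetical, not taken from any real system.

def predict(weights, z):
    """Model output: one weighted sum of the input z per output pixel."""
    return [sum(w * x for w, x in zip(row, z)) for row in weights]

def overfit(image, z, steps=50, lr=0.1):
    """Plain gradient descent on squared error against a single target image."""
    weights = [[0.0] * len(z) for _ in image]
    for _ in range(steps):
        out = predict(weights, z)
        for i, (o, target) in enumerate(zip(out, image)):
            err = o - target
            for j, x in enumerate(z):
                weights[i][j] -= lr * err * x
    return weights

image = [0.2, 0.8, 0.5, 0.1]   # the one "training image"
z = [1.0] * 8                  # fixed input fed to the model
weights = overfit(image, z)
reconstruction = predict(weights, z)
n_params = sum(len(row) for row in weights)

print(reconstruction)          # very close to `image`
print(n_params, len(image))    # parameter count vs. pixel count
```

The model ends up with 32 weights to reproduce 4 numbers, which is the same shape of trade as the 7kb-to-7gb jab, just at toy scale.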

What scares me about this is that the average court in the US is absolutely dumb enough to fall for it.

1 - https://www.theverge.com/2023/2/6/23587393/ai-art-copyright-...

2 - https://arxiv.org/pdf/2212.03860.pdf

This is dismissive in the face of increasing evidence that a bunch of NN models have already been caught reproducing accidentally overfit data. Many examples have popped up with Stable Diffusion, not just one you disagree with. Same goes for ChatGPT, for GitHub Copilot, for Imagen, and a bunch of models.

Calling people dumb is to be willfully ignorant of the fact that neural networks actually can and really do remember images, not just when overfitting, but also when examples are in a low-density area of the latent space, when it doesn’t have enough neighbors to average with. The machine really is technically a machine intentionally and specifically built to reproduce a weighted combination of its inputs, and it really is possible for that weight vector to spike on some specific training examples. This won’t go away by pretending it doesn’t happen, it will go away when people curate training data that is legal to use, and/or when people write software that detects and rejects outputs that are too similar to a training sample, or otherwise guarantee no individual examples can be reconstructed. This is precisely why the project we’re commenting on is interesting, because it takes a step in that direction.
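A minimal sketch of the "detect and reject outputs that are too similar to a training sample" idea, assuming images are already flattened to equal-length lists of floats. The cosine-similarity metric, the 0.95 threshold, and the function names are my own assumptions for illustration; a real filter would more likely compare learned embeddings and would need an index to search a large training set.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_too_similar(output, training_set, threshold=0.95):
    """True if the output is at least `threshold`-similar to any training sample."""
    return any(cosine_similarity(output, sample) >= threshold
               for sample in training_set)

# Hypothetical 4-pixel "images"
training_set = [[0.1, 0.9, 0.3, 0.7], [0.8, 0.2, 0.6, 0.4]]
near_copy = [0.11, 0.89, 0.31, 0.69]   # almost identical to the first sample
novel = [0.9, 0.9, 0.1, 0.1]           # not close to either sample

print(is_too_similar(near_copy, training_set))
print(is_too_similar(novel, training_set))
```

The brute-force scan is linear in the size of the training set, so this only shows the shape of the check, not something you could run against billions of images.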

I agree with you that they have the capacity to remember an image - but they're not compressing them. That's a fundamentally different thing. The argument being made by that class action lawsuit is that "this thing can reproduce image X so it's a compression algorithm and nothing more", which they are predicating on an exercise that is sneaky and dishonest, and only likely to hold water with someone who has a limited understanding of the tech and isn't paying very close attention.

I think it does go without saying that our legal system has made some pretty dumb decisions regarding tech in the past - we read here all the time about the patent system, which is damn close in spirit to copyright.

Again, yes, they can remember an image, but they are not remembering pixels, and it's not compression. The vectors you're referring to are not a smaller version of the data, nor are they a pixel representation or even a close derivative thereof. Sure, there's a connection between the latent space and the pixels, but I don't see how that's the same thing.

For those following along, (1) is the best paper I could find talking about extracting images from SD. I'm open to more resources, and I'm even open to being convinced I'm wrong, but not by intentionally overtraining a model and calling it 'compression'. That's a lie.

To take a step back here, is it really the incidental occasional regurgitating of an existing image that's got everyone on edge, or is that just an easier target than "this is disruptive so I want to make it go away"? I'm not saying it doesn't suck that this is gonna put a ton of people out of jobs; both my parents were professional photographers in the 80s. I get it. But like, let's talk about that. Not some orthogonal strawman.

And hey, just to get it out there. We might disagree but I'm not calling you dumb. I do appreciate your willingness to engage an opposing view - it's part of what keeps me coming back to HN.

1 - https://arxiv.org/pdf/2301.13188.pdf

Compression (especially a lossy one) means storing a smaller sample of the original data in whatever form you desire and then using some algorithm to reconstruct the original data up to some acceptable approximation. I would argue that in the situation we are discussing the network does just that and it is obvious to everyone involved.
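For anyone who wants that definition in concrete form, here is a rough sketch of a lossy codec in that sense: store a smaller sample of the data, then reconstruct an approximation. The scheme (keep the top 4 bits of each 8-bit pixel, pack two per byte) and the function names are assumptions picked for brevity, not how JPEG or any model works.

```python
def compress(pixels):
    """Keep only the top 4 bits of each 8-bit pixel and pack two pixels per byte."""
    nibbles = [p >> 4 for p in pixels]
    if len(nibbles) % 2:
        nibbles.append(0)  # pad so the nibbles pair up
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

def decompress(data, n):
    """Rebuild n approximate pixels, using the midpoint of each 16-value bucket."""
    out = []
    for b in data:
        out.append(((b >> 4) << 4) + 8)
        out.append(((b & 0x0F) << 4) + 8)
    return out[:n]

pixels = [0, 17, 34, 200, 255, 128, 64, 99]
packed = compress(pixels)
restored = decompress(packed, len(pixels))
max_error = max(abs(a - b) for a, b in zip(pixels, restored))

print(len(pixels), len(packed))  # stored form is half the size
print(restored)                  # close to, but not equal to, the original
print(max_error)                 # bounded by the bucket width
```

The stored bytes are not the original pixels, yet the original is recoverable "up to some acceptable approximation", which is all the definition above asks for.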
This is the crux of my issue with the term "compression" in this context: Is it a smaller version of the data?

Yes, the model is smaller than the total input data. But when it comes to recreating a single image, how many of the weights must be configured 'just-so' to recreate an image well enough to call it the same image? I'll admit ignorance here - but I also don't think that this is a thing anyone knows for sure. We can only just extract Othello piece colors from a simplified, Othello-specialized model designed to recognize two colors.

How much of the information from other images must be present to perform this task?

My instinct, given my understanding of how these things work, is that to replicate an image with any recognizable fidelity, you have to overtrain the model enough that you've affected a set of weights much, much larger than the pixel data. The internal representation of these images is concerned with much more visual information than just 'this pixel is this color' - by looking at layered outputs from the inverse type of system (image recognition, which is the core component of these models), you can see that they're encoding layers of shading, lines that map to brushstrokes or object boundaries, foreground, background, all kinds of stuff. A direct representation of an image with all of these would be necessarily huge - and we know this because we have them. Artists use layers in all kinds of image-creation software, and they're always way bigger than the JPEG itself.

I get that this may sound pedantic, but the term 'compression' doesn't seem, to me, to fit here. Compression, by definition, makes stuff smaller.

Fair enough, maybe compression is too specific a term to apply here, but it does not matter whether it's compression or not when it comes to violating copyright. Compression was a good example to mention because it is already familiar to laypeople and to established law. The main point is that it stores some sample of the original data (and if it stores more, that extra is still derived from the original data, as in your strokes example) and applies some algorithm to reconstruct it to some approximation that we humans might find indistinguishable.