HN Mirror


	by bugglebeetle 672 days ago
	Yes, illustrators are notorious IP rentiers like the Hollywood studios and the RIAA. It’s the tech billionaires that are the victims of their vile, unjust monopoly tactics. These are coherent thoughts that demonstrate why it’s a good idea to argue from analogies.


ronsor 672 days ago

I'm not praising tech billionaires, nor am I attacking RIAA/Hollywood or online artists as entities. Please don't start crafting strawmen. I'm criticizing the "stealing" argument because I don't find it logically sound; it doesn't matter who's saying it.

I am still more than willing to have a civil debate around the argument itself.

Why is it stealing to analyze images? I would be more convinced if AI used a fixed database during generation, or if it was considered a standard, acceptable practice to reproduce training data as "new" generations.


bugglebeetle 672 days ago

You don’t find the stealing argument logically sound because you immediately frame the theft as “analyzing” to suit your own narrative and then demand people engage with it, while proceeding to make further spurious claims like…

> I would be more convinced if AI used a fixed database during generation

Wow, I didn’t know that model weights, an elaborately compressed form of their training data, rewrote themselves every time they were invoked. Or that it’s only theft if I stole data from a fixed database to build my own service.


ronsor 672 days ago

AI training is literally analyzing. That is how it works. Properly trained models (i.e., ones that aren't overparameterized or overfit) do not just "elaborately compress" training data, as this is not possible. For example, you cannot compress 1 billion images into 1 billion parameters and expect to retrieve them later.

If objective facts are "my own narrative", then no rational discussion can occur.
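For anyone who wants to check the arithmetic behind the "1 billion images into 1 billion parameters" point, here is a rough back-of-the-envelope sketch. The 32-bit parameter width and the 512x512 RGB image size are assumptions picked for illustration, not figures from any particular model, and `bytes_per_image` is a made-up helper name.

```python
def bytes_per_image(num_params: int, num_images: int, bits_per_param: int = 32) -> float:
    """Average storage budget per training image if the weights were a pure archive.

    bits_per_param=32 is an assumption (float32 weights); real models vary.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / num_images


# The case from the comment: 1 billion parameters, 1 billion images.
budget = bytes_per_image(1_000_000_000, 1_000_000_000)
print(budget)  # 4.0 bytes per image

# Assumed reference point: one uncompressed 512x512 RGB image, 1 byte per channel.
raw_image_bytes = 512 * 512 * 3
print(raw_image_bytes / budget)  # how many times larger the raw image is than the budget
```

This only bounds the average across the whole training set. It says nothing about whether a small number of individual images could be memorized, which is a separate question.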


bugglebeetle 672 days ago

Oh well, you should tell the folks at DeepMind and Meta about these objective facts then so they don’t waste any more time doing research:

https://arxiv.org/html/2309.10668v2

Maybe apply for a job there too, since you’re obviously so far ahead of everyone in understanding this problem space.


YurgenJurgensen 672 days ago

You absolutely can compress a subset of a billion images into a billion parameters if you throw out all but a thousand. Is it no longer copyright infringement if you also run enough irrelevant data through your algorithm alongside the images you’re stealing?


YurgenJurgensen 672 days ago

Don’t mind me, I’m just going to ‘analyse’ this UHD movie and produce a 480p video file in a different codec whose bits are almost entirely unlike those in the original and throws out almost all the information from the original. I’ll put it on a RAID array with thousands of others, mangling the bits of the ‘analysis’ even further. The right ‘prompt’ may cause the model to produce some imagery very similar to some of its ‘training data’ however.

You can use whatever weasel words you want, but bits go in and fewer derivative bits come out in both cases.


ronsor 672 days ago

This is a strawman.

The purpose of video codecs is to reproduce the original video. If you do that, it's copyright infringement.

AI models should not reproduce the original images. The output will not be something that already exists.

Purpose and intent matter.


YurgenJurgensen 672 days ago

You’re right, purpose and intent matter, and the intent is to profit from the work of others without their permission and without crediting or compensating them in any way.


jwells89 671 days ago

It has to do with what the resulting model is used for. It gets particularly dodgy if it’s for commercial use, because most if not all of the data used for training wasn’t licensed for that, making for a “laundering” effect.

Though I also think there’s an argument to be made that images need to be properly licensed to even be “analyzed” in this way, because it’s ultimately an unauthorized copy even if it involves picking the image apart and obfuscating it. They were published with the intent of being viewed by the public, not of being reproduced in any shape or form.
