HN Mirror


	by rndmwlk 909 days ago
	It’s disingenuous to frame using data to train a model as a “view” of that data. The simple cases are the easy ones: if ChatGPT reproduces a NYT article wholesale, that’s obviously infringement. However, there’s an argument to be made that every part of an LLM’s training dataset is, in part, used in every output of that LLM. I don’t know the solution, but I don’t like the idea that anything I post online that is openly viewable is automatically opted into being part of ML/AI training data, and I imagine that opinion would be amplified if my writing were a product directly threatened by the very same models.

1 comments

bluefirebrand 909 days ago

All I can ever think about with how ML models work is that they sound an awful lot like Data Laundering schemes.

You can get basically-but-not-quite-exactly the copyrighted material that it was trained on.

Saw this a lot with some earlier image models, where you could type in an artist's name and get their work back.

The fact that AI models are having to put up guardrails to prevent that sort of use is a good sign that they weren't trained ethically and they should be paying a ton of licensing fees to the people whose content they used without permission.

logicchains 909 days ago

>You can get basically-but-not-quite-exactly the copyrighted material that it was trained on.

You can do exactly the same with a human author or artist if you prompt them to. And if you decide to publish this material, you're the one liable for breach of copyright, not the person you instructed to create the material.

asadotzler 909 days ago

Not if that person is a trillion-dollar corporation. If they're a business that's regularly stealing content and rewriting it for their customers, that business is gonna go down. Sure, a customer or two may go down with them, but the business that sells counterfeit works to spec is not gonna last long.
