| HN Mirror



	by dahart 1002 days ago
	It’s a good idea to make this easier to report, but… shouldn’t it be on the AI company to train using legally acquired content in the first place? It’d be great if the training data were opt-in and curated. Wouldn’t that be better than a shoot-first, ask-questions-later policy? There’s definitely room to improve copyright and room to allow AI to exist, but do we really want to allow AI to ingest all copyrighted material and call it ‘fair use’? That would be giving them a ridiculous and unprecedented amount of freedom to take any and all content, then turn around and auto-generate enough to obsolete the people who made the training material. It seems like the race is on to supplant Google as the portal for information, and downloading everything in the world and then crying fair use after the fact feels like wishful thinking that more or less admits to copyright violation.


Supply5411 1000 days ago

>shouldn’t it be on the AI company to train using legally acquired content in the first place

I don't think so. It's not illegal to look at or learn from copyrighted materials. If you start producing the materials, it becomes a different question. I think the same applies to AI.

dahart 999 days ago

Your argument doesn’t work because OpenAI has admitted that ChatGPT is producing copyrighted material. They’re trying to carve out an exception for AI, but have already acknowledged that training does copy the materials, literally, and that it does not “learn” from them the same way humans do. The intent with AI may be to remix them, but the whole reason there are multiple lawsuits here (as well as with Stable Diffusion and other NNs) is that these models have repeatedly been shown to sometimes memorize the training data and produce it more or less verbatim. They have violated current copyright law. In that light, we have two primary options: change the law, or enforce the current law. OpenAI is hoping to change the law, but whether they have copied some training data and produced it in the output is not even up for debate; this is already the different question you referred to.