| HN Mirror


	by jterrys 919 days ago
	I think using data that you don't have the copyrights to train AI is theft. That being said, Getty is hardly the paragon of goodwill considering they regularly steal from public domain databases, issue DMCA takedown requests of the stolen content from said databases, and then turn around to sell it to unwitting people for a subscription. They own none of the copyrights for what they are doing but have been allowed to get away with it.

2 comments

visarga 919 days ago

> I think using data that you don't have the copyrights to train AI is theft.

There are public domain works you can use and copyright doesn't protect ideas. It protects expression of ideas, so getting "just the ideas" without the expression is ok.

jterrys 918 days ago

Right. Public domain is stuff that doesn't have exclusive IP rights. You can do with that what you want.

The problem is that "expression of ideas" in the realm of AI is akin to plagiarism by human standards, because it's a literal copying of the source material blended together. I couldn't recite the entire plot of the Odyssey to you verbatim off the top of my head, but AI can, because it has the source material. We just tell it to do funny ha-ha things, so it's okay.

zmgsabst 919 days ago

Have you only read books you own the copyright to?

What’s the legal distinction between you learning and AI learning?

keonix 919 days ago

If I regurgitate something I read in a copyrighted book without a proper license, that would also be theft; no distinction there.

I'm not distributing my brain. At least the same rules (but probably more restrictive ones) should apply to models: training is okay, but using and distributing should be limited by copyright.

broken-kebab 919 days ago

Following this logic, explaining anything publicly based on the understanding I got from reading books would be illegal. I'm not sure this is how it works.

visarga 919 days ago

They want to muddle the distinction between ideas and expression. You can't copyright ideas. Everyone is entitled to copy ideas.

keonix 919 days ago

It would not be illegal, based on fair use (though you have to be careful there too), but if you tried to regurgitate large portions of the book, then it would be. And we do know that models regurgitate training material verbatim (Copilot).

lelanthran 919 days ago

Redistribution, and the scale of it.

Besides which, "learning" isn't a fair use exemption anyway.
