by visarga 1400 days ago

It's not worth discussing Getty so much. AI labs will collect a dataset to train a classifier that predicts whether an image is watermarked. They will crawl and index the Getty images to make sure they are not in the training set. Then they retrain, and in 2 months the problem is solved. They can cut out a sizeable part of the training set without a problem; the model will still be good.

They can also OCR the output to make sure it contains no blacklisted words, and use an index to skip all generated images that look too similar to the training data. Then the copyright defenders' argument is going to be weakened.
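A rough sketch of what those two output checks could look like, assuming the OCR step has already produced a text string and the image has been downscaled to a small grayscale grid. The blocklist, the function names, the average-hash similarity measure and the distance threshold are all illustrative assumptions, not anything a specific lab is known to use.

```python
from typing import Iterable, Sequence

# Hypothetical blocklist; a real one would be curated separately.
BLOCKED_WORDS = {"getty", "gettyimages", "shutterstock"}


def contains_blocked_word(ocr_text: str, blocked: Iterable[str] = BLOCKED_WORDS) -> bool:
    """Check OCR'd text for blocked words, ignoring case, spaces and punctuation."""
    normalized = "".join(ch.lower() for ch in ocr_text if ch.isalnum())
    # Plain substring matching: simple, but it can produce false positives.
    return any(word in normalized for word in blocked)


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Pack a small grayscale grid into an integer: one bit per pixel, set if above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def too_similar(candidate: int, index: Iterable[int], max_distance: int = 2) -> bool:
    """True if the candidate hash is within max_distance bits of any indexed hash."""
    return any(hamming(candidate, known) <= max_distance for known in index)


def accept_output(ocr_text: str, pixels: Sequence[Sequence[int]], training_hashes: Iterable[int]) -> bool:
    """Reject an output that shows a blocked word or closely matches an indexed training image."""
    if contains_blocked_word(ocr_text):
        return False
    return not too_similar(average_hash(pixels), training_hashes)


# Toy usage with 2x2 "images"; real grids would be larger, e.g. 8x8.
training_image = [[0, 255], [255, 0]]
training_index = [average_hash(training_image)]
print(accept_output("sunset over hills", [[255, 0], [0, 255]], training_index))
print(accept_output("Getty Images", [[255, 0], [0, 255]], training_index))
```

A linear scan over the index only works for a toy example; at training-set scale the lookup would need an approximate nearest-neighbour structure, and the hash would more likely be a learned embedding than a pixel average.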

The fact that a prompt and curation are necessary also goes against the "AI works can't be copyrighted" narrative - it's generated by a human-AI team, so human work is part of the process.

The core of the issue, as I see it, is that humans and AIs both learn from published media, but an AI can both "see" and "draw" more than a human, so there is an important distinction there.

1 comment

csmpltn 1400 days ago

I understand that there are (both practical and theoretical) ways to reduce the chances of an AI generating an image that has copyrighted elements in it (such as the "GettyImages" logo).

I'm mostly curious about the legal aspects of having a black-box system that can - under some unknown circumstances - attach openly copyrighted or trademarked elements (such as a company logo) to a piece of work.
