Hacker News
by Retric 1026 days ago
Yes, someone using a model can’t know whether the generated text/image/sound is a nearly identical copy of original material they don’t recognize. If using the output of these systems carries significant legal risk, then such systems become nearly useless.

> if the generated text/image/sound is a nearly identical copy of the original material they don’t recognize

How does the industry today deal with artists who "copy" other works? This isn't a problem with AI at all; AI just provides a tool to generate such works faster.

Someone comes to me and asks for a drawing of Batman or an erotic story about Supergirl. I can do it, but I cannot claim ownership over the characters. And I think I would quickly get a letter from DC or Marvel if I tried to do this at scale.

> I can do it, but I cannot claim ownership over the characters.

Of course not. But you can claim ownership if you don't call those characters by their original names and make sufficient changes to the design (how sufficient is for a court of law to decide, hence the expense).

> DC or Marvel if I try to do this at scale.

The show 'Invincible'[1] has a character who is basically a copy of Superman. And yet you won't find them getting a letter from DC.

[1] https://en.wikipedia.org/wiki/Invincible_(TV_series)

> make sufficient changes to the design

I think that’s one of the issues. The transformations done by these tools are mechanical, even if they may be extensive. The human input is too small. Omni-Man may have similarities with Superman, but he is not Superman in the larger context of the story. LLMs cannot yet be consistent enough to produce marketable output that deserves to be copyrightable.

I’m perfectly fine with LLMs aiding with spell checking and alternative phrasing (images are a grayer area). But the idea of prompts and prompt output being copyrightable is something I oppose.

> The human input is too small.

That's a huge assumption, especially for image generation models.

Why shouldn't prompt output be copyrightable?

Because prompts lack sufficient creative control.

Typing a search string into Google doesn’t give you copyright over its output.

The difference is the artist's assertion that the work is either original or a copy of something else. DALLE 2 can’t tell you whether its output is original or not. These AIs have no idea, and the company or group that created them doesn’t review individual outputs, so they can’t say either.

> DALLE 2 can’t tell you if it’s original or not

Whoever pressed the button to run DALLE will make the assertion, just as whoever runs Photoshop to make an image today would make the same assertion.

Based on what?

A Photoshop user controls what data Photoshop uses; a DALLE user doesn’t. Even a prompt as generic as “Cat” could produce a work that is obviously derivative when compared to the original. This is true for all prompts.

> A Photoshop user controls what data Photoshop uses

The point was that the user of the program makes the declaration, whether it's Photoshop or DALLE. How does a business verify that its staff artists aren't producing copyright-infringing material from memory?

The liability falls on them to verify the copyright status of the output they're asked to make. A business paying a Photoshop user to produce a picture places just as much (or as little) trust in them as in the button presser for DALLE.

This gets complicated: having no reason to know that something is copyrighted is a defense.

So if your employee installs pirated third-party software, you’re facing strict liability. However, if a third party is reproducing their college roommate's drawing from memory, then it’s effectively impossible for you to verify whether something is a derivative work.

DALLE is effectively Getty Images: if you’re buying works from them, you can only assume they’re free of copyright issues.

The generated content is a derivative work of each piece of material the model was trained on. That material can be listed.

So your suggestion is to list hundreds of millions of works and have users manually review them? I don’t think that’s going to work.