Hacker News
by simonw 410 days ago
Should an AI model be able to answer the question "which team won the superbowl in 2023" if there are thousands of articles out there containing that information but not a single one of them has been licensed for use by AI?
2 comments

If you could separate the information from the intellectual property, sure; but if the model is also capable of generating a similar article, that's the point where it starts infringing on the IP of all the authors whose articles were fed into the model.

So in practice, no, it shouldn't. Not because that information itself is bad, but because it probably isn't limited to just that answer.

In summary, I think it is definitely a problem when:

1. The model is trained on a certain type of intellectual property
2. The model is then asked to produce content of the same type
3. The authors of the training data did not consent

And slightly less so, but still questionable when instead:

2. The IP becomes an integral part of the new product

which, arguably, is the case for any and all AI training data; individually you could take any of them out and not much would happen, but remove them all and the entire product is gone.

No.

That's a funny example since broadcasters have to pay a fee to say "The Super Bowl" in the first place. If they don't, they have to use some euphemism like "the big game."

The answer is definitely no. You cannot use something that you don't have a license for unless it belongs to you.

I didn't know that about euphemisms, that's a great little detail - makes this hypothetical question even more interesting!

(For what it's worth, Claude disagrees and claims that news organizations ARE allowed to use the term Super Bowl, but companies that aren't official sponsors can't use it in their ads. But Claude is not a lawyer so <shrug>)