by wheelie_boy 1083 days ago
It seems very difficult to ensure that a model will never output any of the copyrighted content it was trained on. I can only think of three ways, but perhaps there are others:

1. Evaluate every output from the model to ensure that none of the outputs are copyrighted

2. Evaluate every input to a model to ensure that the inputs are either not copyrighted or properly licensed

3. Change the definition of copyright so that ML models can do whatever they want

Nobody is doing #1, because it makes the business models unworkable. Established brands (like Adobe) are doing #2. I get the feeling that a lot of ML startups are hoping #3 will happen, but it seems unlikely.
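For a sense of what #1 might look like in its most naive form, here is a rough sketch that flags an output if it shares a long word sequence with any text in a reference set. Everything here is an assumption for illustration: the function names (`ngrams`, `build_index`, `overlaps_protected`), the choice of word n-grams, and the window size. It only catches near-verbatim text reuse, says nothing about images or paraphrase, and is not a legal test for infringement.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(protected_texts, n=8):
    """Collect every n-gram from a (hypothetical) corpus of protected texts."""
    index = set()
    for text in protected_texts:
        index |= ngrams(text, n)
    return index


def overlaps_protected(output, index, n=8):
    """True if the output shares at least one n-gram with the index."""
    return any(gram in index for gram in ngrams(output, n))


# Toy usage with a short window so the example fits on one line.
protected = ["the quick brown fox jumps over the lazy dog"]
index = build_index(protected, n=4)
flagged = overlaps_protected("he said the quick brown fox jumps again", index, n=4)
clean = overlaps_protected("a fast brown fox leaps over a sleepy dog", index, n=4)
```

Even this toy version hints at the cost: the index has to cover every protected work, and every single output has to be checked against it.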

1 comments

Ensuring a model never outputs copyrighted content is tangential, even irrelevant. You don't look for a way to make humans never output copyrighted content; you address each time they do case by case.

Model training being ruled fair use doesn't mean any of the model's output can be used for whatever purpose, regardless of what it contains.

> you address each time they do case by case.

That's what I listed as #1 - evaluate each individual output of the model to see if it violates copyright.

I think when GP says "address each time case by case", they mean "you sue them when they infringe", instead of "this human has an illegal brain because it remembers Taylor Swift's songs".

PS: your "#1" is really hard to do, and I'd guess it is infeasible. Even Google (esp. YouTube), with its vast data capabilities, often gets it wrong.