by kozikow 831 days ago
What's your feeling on how these types of lawsuits will land? Many people are probably also facing product decisions today while keeping in mind the precedent these cases could set in the future.

I kind of feel that in general with AI regulation, the USA will have to go all-in somehow. The EU will fade into obscurity, but this technology is too valuable and China is breathing down the USA's neck, so I expect the law to continue to be lax around IP of data used to train AI models.

7 comments

For some big players it's an opportunity to buy off the copyright trolls and cement their advantage. Someone like MSFT or Nvidia could do pretty well striking some deals and in doing so throwing up big barriers to entry. It's the smaller open source players that will get hurt eventually, you can be sure of that.

It would be best if a non-profit, the equivalent of the EFF or FSF, could take on the trolls in the name of software freedom and ideally get some legal clarity, as opposed to just getting some deals struck with big companies.

That non-profit could and should have been OpenAI.

Alas...

I don’t know about this one, but in general it seems like some of the legal issues coming at AI deployments are quite legitimate. Some AI datasets are definitely violating existing copyright law. We can certainly debate whether the laws should change, and there are valid points on both sides. But to the question of whether some are breaking current laws, the answer is clearly yes.

Obviously the answer here is for companies producing AI to curate, obtain, and/or pay for fully legal training data. The problems have been that gathering and using copyrighted data is very easy, and AI is extremely data hungry (some experts theorize massive data alone is responsible for AI success, and algorithms are secondary at best), and there’s at least the perception if not the reality of a high stakes winner-takes-all race to produce the best AI.

To me this feels a little like the situation tech companies have put themselves in with automated support and no way to reach humans, in that they couldn’t have scaled like they did and gotten there without dropping hands-on support on the floor, but they’ve created a time bomb that is beginning to backfire in more and more serious ways.

So far open source AI has been neck and neck with proprietary offerings. I'd even wager that companies as big as Eleven Labs have zero moat [1].

The minute copyright lawsuits begin to land, suddenly the copyright becomes the moat. Big companies will license, and everyone else will wither on the vine. Google and Facebook will reign supreme.

I'm terrified by that. I hope we have a few more years to compile open data sets and tools.

[1] That is to say, thus far, product is the only moat, not models and not data. Most companies have razor thin product so far.

Something like free software is needed here. Something that copyright holders can apply and that affects models that are trained on it. GPL v4 perhaps?
It's questioned whether copyright applies at all. If it does not, then the license doesn't matter either.
At least in the US, a federal appeals court (the Seventh Circuit) has decided in the past that shrinkwrap licenses can be used to put restrictions on works that copyright doesn't apply to (https://en.wikipedia.org/wiki/ProCD,_Inc._v._Zeidenberg), so I wouldn't be surprised if we start seeing clickwrap "you agree not to train AI on this page without the author's explicit permission" licenses.
Exactly.

And if copyright applies, wouldn't it imply that someone learning something from a book could also be controlled by that book's license in how they use the knowledge they gained?

No, because AI training is not human learning.
These types of lawsuits probably won't work in the EU because the EU has already exempted training from copyright restrictions. It's already a settled question over there.
Even European authors will be able to sue a system that produces unauthorized reproductions of their work. Just because you can train a model, you still don't necessarily have the right to sell or give away access to it (unauthorized distribution and reproduction of copyrighted work), and will probably be liable if you publish any of its output that is plagiarism of copyrighted work. It will be interesting to see if the AI companies succeed in pushing the liability onto the end users when it hits the courts. And if industries adopt techniques like 'trap streets' to catch violations.
Hardware-wise it won't go anywhere. It's like blaming the knife maker for stabbings.

Software: no idea. It won't be banned for sure, but something like a 5% royalty could happen.

They are not getting sued for their hardware.
If I read 5,000 books and wrote a short poem about something vaguely similar to those books, it's not copyright infringement.