by Ukv 636 days ago
> Why would we embrace this now that a computer can do it and there's a level of deniability?

Generally I don't think people are arguing that copyright law should be more lenient to AI than it is to humans. If your work gets ripped off (a substantially similar copy not covered by fair use) you can sue regardless of tools used in its creation.

Question would be whether machine learning, unlike human learning, should be treated as copyright infringement. There are differences and the law does not inherently need to treat them the same, but it could.

As to why it should: I think there's huge benefit across a large range of industries to web-scale pretraining and foundation models, and I'd like it to remain accessible to open-source groups or smaller companies without huge data moats. Realistically I think the alternative would likely just benefit Getty/Universal with near-identical outcomes for most actual artists.

When the very basis of copyright is for the "progress of sciences and useful arts", it seems backwards to use it in a way that would set back advances in language translation, malware/spam/DDoS filtering, defect detection, voice dictation/transcription, medical image segmentation, etc.


> Question would be whether machine learning, unlike human learning, should be treated as copyright infringement.

No, the question is whether the genAI systems we have around are mass copyright-violation machines or whether they "learn" and build non-violating work.

And honestly, I have seen evidence pointing both ways. But the "copyright protection" institutions are all quick to decide the point, dismissing any evidence on a philosophical basis.

> No, the question is whether the genAI systems we have around are mass copyright-violation machines or whether they "learn" and build non-violating work.

I refer to the training process in question, which may or may not be violating copyright, as "machine learning" since that's the common terminology. The question is whether that process is covered by fair use. Whether or not it actually "learns" is not irrelevant, but I'd say it's more a philosophical framing than a legal one.

> I refer to the training process in question

Yeah, you're going for the red herring.

All of the worthwhile debate is about the real violations. But the public discourse is surely inundated with that exact red herring.

I addressed model output (infringes copyright if substantially similar, as with manually-created works) and the process of training the model (requires collating/processing ephemeral copies, possibly fair use). What do you think the "real violations" are, if not those?

> Generally I don't think people are arguing that copyright law should be more lenient to AI than it is to humans. If your work gets ripped off (a substantially similar copy not covered by fair use) you can sue regardless of tools used in its creation.

With humans, copyright law deals with knowing and intentional infringement more severely than accidental and unintentional infringement.

With an AI, any infringement on the part of the AI end-user is very likely going to be accidental and unintentional rather than knowing and intentional, so the legal system is going to deal with it more leniently, even if actual infringement is proven. The exception would be if you deliberately prompted it to create a modified version of a pre-existing copyrighted work.

With humans, whether infringement is knowing or not, intentional or not, can turn into a massive legal stoush. Whereas, if you say it is AI output, and it appears to actually be AI output, it is going to be much harder for the plaintiff (or prosecution) to convince the court that infringement was knowing and intentional.