Hacker News
by jorvi 359 days ago
Yup. The book torrenting case is pretty nuts.

If I can reproduce the entirety of most books off the top of my head and sell that to people as a service, it's a copyright violation. If AI does it, it's fair use.

Pants-on-head idiotic judge.

6 comments

>If I can reproduce the entirety of most books off the top of my head and sell that to people as a service, it's a copyright violation. If AI does it, it's fair use.

Assuming you're referring to Bartz v. Anthropic, that is explicitly not what the ruling said; in fact, it's almost the inverse. The judge said that output from an AI model which is a straight-up reproduction of copyrighted material would likely be an explicit violation of copyright. This is on page 12/32 of the judgement[1].

But the vast majority of output from an LLM like Claude is not a word-for-word reproduction; it's a transformative use of the original work. In fact, the authors bringing the suit didn't even claim that it had reproduced their work. From page 7, "Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service." That's because Anthropic is already explicitly filtering out results that might contain copyrighted material. (I've run into this myself while trying to translate foreign-language song lyrics to English. Claude will simply refuse to do this.)[2]

[1] https://www.courtlistener.com/docket/69058235/231/bartz-v-an...

[2] https://claude.ai/share/d0586248-8d00-4d50-8e45-f9c5ef09ec81

They should still have to pay damages for possessing the copyrighted material. That's possession, which courts have found is copyright violation. Remember all the 12 year olds who got their parents sued back in the 2000s? They had unauthorized copies.

I don't know what exactly you're referring to here. The model itself is not a copy, you can't find the copyrighted material in the weights. Even if you could, you're allowed under existing case law to make copies of a work for personal use if the copies have a different character and as long as you don't yourself share the new copies. Take the Sony Betamax case, which found that it was legal and a transformative use of copyrighted material to create a copy of a publicly aired broadcast onto a recording medium like VHS and Betamax for the purposes of time-shifting one's consumption.

Now, Anthropic was found to have pirated copyrighted work when they downloaded and trained Claude on the LibGen library. And they will likely pay substantial damages for this. So on those grounds, they're as screwed as the 12 year olds and their parents. The trial to determine damages hasn't happened yet though.

> The model itself is not a copy,

Agreed

> the Sony Betamax case, which found that it was legal and a transformative use of copyrighted material to create a copy of a publicly aired broadcast

Good thing LibGen is not publicly aired in broadcast format.

> So on those grounds, they're as screwed as the 12 year olds and their parents.

Except they have deep enough pockets to actually pay the damages for each count of infringement. That's the blood most of us want to see shed.

You cannot have trained the model without possession of copyrighted works. Which we seem to be in agreement on.

This was immediately my reaction as well, but I'm not a judge so what do I know. In my own mind I mark it as a "spice must flow" moment -- it will seem inevitable in retrospect but my simple (almost surely incorrect) take is that there just wasn't a way this was going to stop AI's progress. AI as a trend has incredible plot armor at this point in time.

Is the hinge that the tools can recall a huge portion (not perfectly, of course) but usually don't? What seems even more straightforward is the substitute-good idea: it seems reasonable to assume people will buy fewer copies of book X when they start generating books heavily inspired by book X.

But this is probably just a case of a layman wandering into a complex topic. Maybe AI has just nestled into the absolute perfect spot in current copyright law, just like other things that seem like they should be illegal now but aren't.

I didn't see the part of the trial where they got the "entirety of most books" out of Llama. What did you see that I didn't?

Sad to say but it would have put US companies at a major disadvantage if they were not allowed to.

I'm not sure that's true. I've never heard of a human being done for copyright for reciting a book passage.

I daresay the difference with AI is that pretty much no human can do that well enough to harm the copyright holder, whereas AI can churn it out.

Yeah, that dipshit judge just opened the floodgates for more problems. The problem is they don't understand how this stuff works and they're in the position of having to make a judgement on it. They're completely unprepared to do so.

Now there's precedent for future cases where theft of code or any other work of art can be considered fair use.