Hacker News new | ask | show | jobs
by superfrank 54 days ago
I don't have the exact ruling in front of me, but IIRC the judge pretty clearly said that training a model was fair use. IIRC, he declared it "quintessentially transformative".

The case by case basis was about acquisition and possession of the copyrighted material. Anthropic pirated a large number of books and illegally stored digital copies of many that they did purchase legally. The training being protected doesn't give them the right to violate copyright in that way.

Google, for example, purchased print versions of their training material and had a small army of employees digitize them and then delete the digital copies when they were done. That hasn't been challenged AFAIK, but would likely have been found to be not a violation. That's I think what was meant by case by case basis.

It's like if someone breaks into my house and I shoot them with my gun, that's very likely self defense, but if I'm not allowed to own a gun, I may still end up in trouble with the law.

1 comments

Whether or not you’re pirating and making illegal copies of something depends greatly on the terms under which you’re allowed to make those copies. You can copy GPL-licensed code all day every day so long as you abide by the license. The same is true of the BSD licenses, MIT, ISC, Apache, et cetera.

If you’re copying or making substantially derivative works of them outside the terms of the license, you’re violating the copyright.

> If you’re copying or making substantially derivative works of them outside the terms of the license, you’re violating the copyright

I don't disagree with that.

What I'm saying is that the judge ruled that training a model using copyrighted books wasn't derivative. It was transformative, so the training wasn't a copyright violation.

He then went on to say that the way Anthropic acquired and handled that material was a copyright violation because Anthropic pirated and copied a large number of books that were not under a license like the ones you mentioned. The downloaded a bunch of books you would find at most bookstores and then actually purchased copies of them much later once they were accused of violation copyrights.

I'm just trying to make that clear because I've heard a lot of people who don't understand that the violation wasn't about the act of training or material they used, it was just how they acquired the training material.

That was one case in front of one judge. It’s weak precedent if it’s precedent at all.

Also, the reasoning behind it being transformative instead of derivative is that the output isn’t supposed to be large, unchanged chunks of the input. There’s no actual guarantee your small model run under OpenClaw won’t recreate whole modules of the input.