| HN Mirror


	by greensoap 326 days ago
	Anthropic literally did exactly this to train its models according to the lawsuit. The lawsuit found that Anthropic didn't even use the pirated books to train its model. So there is that

2 comments

hcs 326 days ago

The lawsuit didn't find anything, Anthropic claimed this as part of the settlement. Companies settle without admission of wrongdoing all the time, to the extent that it can be bargained for.

ijk 326 days ago

The judge's ruling from earlier certainly seemed to me to suggest that the training was fair use.

Obviously, that's not part of the current settlement. I'm no expert on this, so I don't know the extent to which the earlier ruling applies.

hcs 326 days ago

If I'm reading this right, yes, the training was fair use, but I was responding (unclearly) to the claim that the pirated books weren't used to train commercially released LLMs. The judge complained that it wasn't clear what was actually used. From the June order https://fingfx.thomsonreuters.com/gfx/legaldocs/jnvwbgqlzpw/... [pdf]:

> Notably, in its motion, Anthropic argues that pirating initial copies of Authors’ books and millions of other books was justified because all those copies were at least reasonably necessary for training LLMs — and yet Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs.

> We know that Anthropic has more information about what it in fact copied for training LLMs (or not). Anthropic earlier produced a spreadsheet that showed the composition of various data mixes used for training various LLMs — yet it clawed back that spreadsheet in April. A discovery dispute regarding that spreadsheet remains pending.

rise_before_sun 326 days ago

Thanks for this info. I was looking for which pirated books were used for which model.

Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?

To me, the taint still remains. Which is a shame, because it's considered the best coding model so far.

heavyset_go 326 days ago

> Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?

No, in part because it removes agency from the authors/rightsholders. Maybe they don't want to sell Anthropic their books, maybe they want royalties, etc.

jack_pp 326 days ago

Can authors even claim such rights, though? I doubt they even had such agency to begin with.

freejazz 323 days ago

They stated it in court in their papers for summary judgment on the issue of fair use. My gosh! To pretend like you know what you're talking about but miss that detail?

phillipcarter 326 days ago

I'm "team Anthropic" if we're stack ranking the major American labs pumping out SOTA models by ethics or whatever, but there is no universe in which a company like them operating in this competitive environment didn't pirate the books.

Finbel 326 days ago

"Ethics or whatever" seems like a good tagline for people rooting for an AI company when it's being sued by authors.

godelski 326 days ago

Makes sense why Effective Altruism is so popular. Commit crime, make billions, give back when dead, live guilt free?

IshKebab 326 days ago

Except for Google at least.
