| HN Mirror


by achrono 395 days ago

If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

And for the future, here's one heuristic: if there is a profound violation of the law anywhere that (relatively speaking) is ignored or severely downplayed, it is likely that interested parties have arrived at an understanding. Or in other words, a conspiracy.

[1] There are tons of legal arguments on both sides, but for me it is enough to ask: if this is not illegal and is totally fair use (maybe even because, oh no look at what China's doing, etc.), why did they have to resort to & foster piracy in order to obtain this?

2 comments

NitpickLawyer 395 days ago

> If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

European here, but why do you think this is so clear cut? There are other jurisdictions where training on copyrighted data has already been allowed by law/caselaw (Germany and Japan). Why do you need a conspiracy in the US?

AFAICT the US copyright law deals with direct reproductions of a copyrighted piece of content (and also carves out some leeway with direct reproduction, like fair use). I think we can all agree by now that LLMs don't fully reproduce "letter perfect" content, right? What then is the "spirit" of the law that you think was broken here? Isn't this the definition of "transformative work"?

Also of note is the other big case involving books: the one where Google was sued for processing mountains of books and was allowed to continue. How is scanning & indexing tons of books different from scanning & "training" an LLM?


AlotOfReading 395 days ago

Google asserted fair use in that case, which is an admission of (allowed) copyright infringement. They didn't turn books into a "new form", they provided limited excerpts that couldn't replace the original usage and directly incentivized purchases through normal sales channels while also providing new functionality.

Contrast that with AI companies:

They don't necessarily want to assert fair use, the results aren't necessarily publicly accessible, the work used isn't cited, users aren't directed to typical sales channels, and many common usages do meaningfully reduce the market for the original content (e.g. AI summaries for paywalled pages).

It's not obvious to me as a non-lawyer that these situations are analogous, even if there's some superficial similarity.


achrono 395 days ago

Let me answer those questions with actual evidence.

To begin with, this very case of Perlmutter getting fired after her office's report is interesting enough, but let's keep it aside. [0]

First, plenty of lobbying has been afoot, pushing DC to allow training on this data to continue. No intention to stop or change course. [1]

Next, when regulatory attempts were in fact made to act against this open theft, those proposed rules were conveniently watered down by Google, Microsoft, Meta, OpenAI and the US government lobbying against the copyright & other provisions. [2]

If you still think, "so what? maybe by strict legal interpretation it's still fair use" -- then explain why OpenAI is selectively signing deals with the likes of Conde Nast if they truly believe this to be the case. [3]

Lastly, when did you last see any US entity or person face no punitive action whatsoever after illegally downloading (and uploading) millions of books & journal articles? Do you remember Aaron Swartz? [4]

You might not agree with my assessment of 'conspiracy', but are you denying there is even an alignment of incentives contrary to the spirit of the law?

[0] https://www.reuters.com/legal/government/trump-fires-head-us...

[1] https://techcrunch.com/2025/03/13/openai-calls-for-u-s-gover...

[2] https://www.euronews.com/next/2025/04/30/big-tech-watered-do...

[3] https://www.reuters.com/technology/openai-signs-deal-with-co...

[4] https://cybernews.com/tech/meta-leeched-82-terabytes-of-pira...


whycome 395 days ago

What’s your reading of the spirit of the law?
