HN Mirror


by sneak 339 days ago

> Some companies, like Meta, went even further. They didn’t just use web content—they also used pirated books to train their models. (The Unbelievable Scale of AI’s Pirated-Books Problem) If a regular person did this, they’d probably get into serious trouble. But when billion-dollar companies do it, they usually get away with it.

Someone should tell Anna’s Archive.

The US’s criminal enforcement very much falls into the “rules for thee, but not for me” category, but invoking it here is a trope. Anyone can get away with piracy on the scale of Books3 or The Pile. The reason random people don’t make models is that the hardware and power costs are fucking astronomical, not that they can’t get away with downloading the training data.

These sorts of hot takes are just as wrong as the breathless “AGI is right around the corner” ones.

AI is hugely transformative, and anyone who thinks it’s overhyped doesn’t know the SOTA. It will likely be the single biggest technological advancement of our lifetime.

4 comments

mossTechnician 339 days ago

Everybody knows the name of the CEO of Facebook, and the company has gotten away with manipulating public dialogue and monopolizing the social sphere for years. I don't know the name of whoever runs Anna's Archive, and it doesn't have that political capital.


Calavar 339 days ago

> Anyone can get away with piracy on the scale of Books3 or The Pile.

The obvious counterexample would be Aaron Swartz.


sneak 339 days ago

1. He did it non-anonymously as a form of activism, which seems like an obvious bias toward martyrdom. An argument can be made that he chose fame over effectiveness, just like Assange did.

2. We don’t know if he would have gotten away with it or not. Mental illness killed him via suicide, not the federal indictment.

There are several EXTREMELY large pirate libraries in operation presently that anyone can use. They are actively getting away with it, likely because they are explicitly staying anonymous.


edg5000 339 days ago

No, I think the book pirating thing is different than an individual pirating. It's like comparing genocide to murder. A few murders, not great. A few genocides? Really not cool.

This book thing at Meta is something we should never forget. It revealed how utterly broken the US is in this regard; I hope they get it sorted. Without the rule of law you'll get a shit country.


sneak 339 days ago

Prosecutorial discretion isn’t counter to the rule of law. The state gets to selectively apply criminal penalties.

This has always been the case.

By your definition, the US has never really had the rule of law.


danielscrubs 339 days ago

It helps to have China as a bogeyman: if WE don’t pirate, then China will win the AI race and we will all be slaves… and if you have principles, you are a communist…


brookst 339 days ago

But change and disruption are scary, and people have a nearly infinite capacity for denial and self-delusion.

Maybe if we all pretend AI is totally useless and will never improve, then I won’t have to worry about my job or economic value changing?
