by gameshot911 500 days ago
Critically, by torrenting they also directly distributed the copyrighted material itself. That is a standalone infringement, separate from any argument about trained LLMs.

They could have only leeched and refrained from sharing any part of the copyrighted data. If I were to do something as risky as this, that is what I would do.
Then it would need to be determined whether that is the case or not. Did every single machine they used have the configuration for only leeching and no seeding? The company is liable for what its employees do on the job. If only one employee was also seeding ... that could be a very interesting case.
> Did every single machine they used have the configuration for only leeching and no seeding?

I would certainly assume so. It's incredibly obvious that's what you would want to do from a legal standpoint.

> If only one employee was also seeding ... that could be a very interesting case.

The torrenting wouldn't be done casually by employees acting on their own. And it's not like multiple employees are doing it simultaneously, unsupervised, on their personal computers.

This is part of an official project. They'd spin up a machine just to download the torrent, being careful to disable seeding.

This is Meta. They have lawyers involved and advising. This isn't a teenager who doesn't fully understand how torrenting works.

Did you not read the article? There are quotes from Meta employees doing exactly what you claim they wouldn't do.

> This is part of an official project. They'd spin up a machine just to download the torrent, being careful to disable seeding.

From the article:

> "Torrenting from a corporate laptop doesn’t feel right," Nikolay Bashlykov, a Meta research engineer, wrote in an April 2023 message, adding a smiley emoji. In the same message, he expressed "concern about using Meta IP addresses 'to load through torrents pirate content.'"

You also claim they would be "careful to disable seeding" but we know they did in fact seed (and anyone who uses private trackers knows they couldn't get away with leeching for very long before being kicked off):

> Meta also allegedly modified settings "so that the smallest amount of seeding possible could occur," a Meta executive in charge of project management, Michael Clark, said in a deposition.

Seeding can be trivially faked to trackers.

https://github.com/slundi/RatioUp

https://github.com/anthonyraymond/joal

http://ratiomaster.net/
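
For anyone wondering why this is possible: as far as I understand the tracker protocol (BEP 3), the upload and download totals in an announce are just integers the client reports about itself. Here is a minimal sketch of how such an announce URL is put together. The tracker URL, info hash, and peer ID are made-up placeholders, `build_announce_url` is a hypothetical helper of mine, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port,
                       uploaded, downloaded, left):
    """Build a BEP 3 style announce URL. No request is sent."""
    query = urlencode({
        "info_hash": info_hash,    # 20-byte SHA-1 of the torrent's info dict
        "peer_id": peer_id,        # 20-byte identifier chosen by the client
        "port": port,              # port the client says it listens on
        "uploaded": uploaded,      # total bytes uploaded, as reported by the client
        "downloaded": downloaded,  # total bytes downloaded, as reported by the client
        "left": left,              # bytes still missing, as reported by the client
    })
    return f"{tracker_url}?{query}"


# Placeholder values for illustration only (assumptions, not real identifiers).
url = build_announce_url(
    "http://tracker.example.invalid/announce",
    info_hash=b"\x00" * 20,
    peer_id=b"-XX0000-" + b"0" * 12,
    port=6881,
    uploaded=0,
    downloaded=1024,
    left=0,
)
print(url)
```

The announce on its own gives the tracker no proof of those numbers. Whether a given tracker cross-checks them against what other peers report is a separate question that this sketch does not cover.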

The smallest amount of seeding possible would be metadata, presumably not subject to copyright.

And punishing them in the normal manner will be an incredibly small slap on the wrist, and do absolutely nothing to help us find out what will play out in court regarding a fair-use defense on training AI with copyrighted material.
Isn't there a "fruit of the poisonous tree" kind of thing? It sounds quite similar to murdering your parent and then getting to keep the inheritance even though you are convicted of the murder. Inheriting stuff isn't illegal, yet I think most jurisdictions would not allow you to keep it in this case.

There should be a problem with stuff obtained through illegal means, even if having that stuff is in principle legal. In this case, copyrighted material.

Obviously they would argue that having the data is only a consequence of the download part, and that part is legal. What I see is that these situations are always complicated, and if you're rich enough, you get to litigate the complications and come out with a slap on the wrist or maybe even clean hands, while if you are an ordinary citizen, you can't afford to delve into the complexities and get punished.

These days I'm starting to give up on the whole concept of the legal system being fair. They're not even pretending anymore.