| HN Mirror



	by Octoth0rpe 323 days ago
	Given Meta's history of torrenting every book it could get its hands on for training, I'm not convinced that the majority of AI companies would respect that license. Maybe if we also had a better way to prove that such code was part of the training set, and saw a couple of solid legal victories with compensation awarded.


bayindirh 323 days ago

I'm pretty astounded that "The Stack" at least made an effort, and continues to do so, by weeding out GPL and similar strong-copyleft source code from their trove, and even implemented an opt-out mechanism [0].

They look like saints when compared to today's companies.

[0]: https://huggingface.co/spaces/bigcode/in-the-stack


immibis 323 days ago

They're also getting sued for it, and the judge ruled they had no right to torrent those books, so now it's just a matter of calculating how many trillions Meta has to pay, then extracting it from them.


Octoth0rpe 323 days ago

Because Meta got caught. I'm not convinced that every random OSS lib will have the resources to audit every model out there for a hypothetical GPL+no training violation.
