by nostril
1001 days ago
As far as I can tell, Books3 seems to be training data for language models. I'm not sure how it was created, but it contains a lot of books. I got it by torrenting The Pile after it was forced offline by the Danes. They're asking to remove 109 books from the dataset, which I can do. But I'm not sure whether to. Once you set aside the question of law, it becomes a matter of ethics, and these questions aren't so easy.
Unless you're based and incorporated in Iran, Iraq or North Korea, your country has signed the Berne Convention and has implemented in law some level of copyright protection that almost certainly makes the distribution of those books illegal.
If you're not taking very careful technical and legal measures to remain anonymous, you can get in serious legal trouble for breaking the law.
What is the upside for you? Companies like Uber, Google, etc. break the law all the time. But they profit billions from that and then pay millions in fines and lawyers. What's your game? Are you profiting enough for breaking the law to make financial sense?
Last but not least, I wouldn't toy with the lawyers or try to make them "please" you. Respect them; otherwise, they'll do whatever they can to make you regret it. And believe me, they can do a lot against you. These people are evil. Don't cross their paths.