Hacker News
by maxloh 241 days ago
Big Tech companies crawl the internet for training data, which makes it easy for them to download copyrighted data by accident.

For example, most popular textbooks have several pirated copies uploaded to the web. Some are even in plain sight and can be found with a simple Google search.