Hacker News
by deepfriedbits 21 days ago
It's not about the paywall in this case. It's to prevent AI companies from scraping a publication's archives for training data. If AI companies want that data, they can compensate publishers, not extract it for free from the Internet Archive.
1 comment

Yes, it's probably cheaper to just download the newspaper articles from the Internet Archive than to buy them directly from the newspapers. Is that minimizing training costs, or should we call it stealing?