by squigz
752 days ago
I don't think that's how robots.txt or scraping really works. Do you expect scrapers to announce every bot they run? Do you expect webmasters to add a robots rule for every bot? If someone doesn't want OpenAI or anyone else scraping their site, it doesn't matter whether the scraper announces itself, as long as it respects robots.txt and the site has rules that catch unannounced scrapers.
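To make the point about catch-all rules concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` and `SomeUnannouncedBot` names, and the `allowed` helper are all hypothetical and only illustrate how a wildcard rule covers a bot that never announced itself, assuming the bot actually honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one rule for a named (announced) bot,
# plus a wildcard rule that applies to every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The announced bot is blocked everywhere by its own rule.
    print(allowed("ExampleAIBot", "https://example.com/"))
    # An unannounced bot has no rule of its own, so the wildcard applies.
    print(allowed("SomeUnannouncedBot", "https://example.com/private/x"))
    print(allowed("SomeUnannouncedBot", "https://example.com/public"))
```

None of this enforces anything by itself: robots.txt is advisory, so it only helps against scrapers that choose to respect it.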
|
The difference between this scraper and other scrapers is that scrapers are normally used for personal or nefarious purposes.
The data scraped for AI models is used explicitly for a commercial purpose by a commercial entity, and the original creators received no compensation and no notice that their work was going to be used in a commercial product. The rights holders whose works were used without authorization have no way to seek compensation or to have their work removed from this commercial product.
There is little material difference between this behavior and someone downloading your site and using its content in a book they sell. It doesn't matter that you only now discovered the book was printed two years ago. Your work is still being used without your permission.
When the little guy does it, it's called piracy and theft. When billion-dollar corporations do it, it's called a technological marvel.