Hacker News new | ask | show | jobs
by maplethorpe 64 days ago
Question: if I disallow all of OpenAI's crawlers, do they detect this and retroactively filter out all of my data from other corpuses, such as CommonCrawl?

The fact is my data exists in corpuses used by OpenAI before I was even aware anyone was scraping it. I'm wondering what can be done about that, if anything.