by JohnFen
1041 days ago
> blocking GPTBot will not guarantee that a site's data does not end up training all AI models of the future. Aside from issues of scrapers ignoring robots.txt files, there are other large data sets of scraped websites (such as The Pile) that are not affiliated with OpenAI.

This is why I'm not reassured. robots.txt isn't sufficient to stop all web crawlers, so there is every reason to think it isn't sufficient to stop AI scrapers. I'm still looking for a good solution to this problem so that I can open my sites up to the public again.