by vhcr 1167 days ago
GPT was trained on Common Crawl, Reddit, books, and Wikipedia.

For Common Crawl, the documentation says blocking it in robots.txt should work. As for Wikipedia, Reddit, and books, there's no option other than not participating, AFAIK.
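
For what it's worth, here is a rough sketch of what the robots.txt block for Common Crawl could look like, plus a quick way to sanity-check it with Python's standard library. It assumes the crawler identifies itself as `CCBot` (the user-agent token Common Crawl documents); `example.com`, the `is_allowed` helper, and the `Googlebot` comparison are only illustrations, not anything from the docs.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: block Common Crawl's crawler (CCBot),
# allow everyone else. Adjust the rules for your own site.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for agent in ("CCBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

This only checks how a well-behaved parser would read the rules; it says nothing about whether a given crawler actually honors them.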

OpenWebText2 has no mention of robots.txt, so good luck with that.