by vhcr
1167 days ago
GPT was trained on Common Crawl, Reddit, books, and Wikipedia. For Common Crawl, the documentation says blocking its crawler in robots.txt should work. For Wikipedia, Reddit, and books, there's no option other than not participating, AFAIK. OpenWebText2 makes no mention of robots.txt, so good luck with that.
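For the Common Crawl part: its crawler identifies itself with the user agent CCBot, so that is the name a robots.txt rule has to target. Below is a minimal sketch, assuming you want to block CCBot site-wide while leaving every other crawler alone. It uses Python's standard urllib.robotparser to check that the rule does what you expect. The URL and the "SomeOtherBot" name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: disallow Common Crawl's crawler ("CCBot")
# everywhere, and allow all other crawlers (an empty Disallow means "allow all").
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"  # hypothetical URL
    print("CCBot allowed:", is_allowed("CCBot", url))
    print("SomeOtherBot allowed:", is_allowed("SomeOtherBot", url))
```

This only controls what a well-behaved crawler fetches from now on. It does nothing about the other sources mentioned above.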