by gorbypark 1207 days ago
I don't know whether they also did their own crawling, but at least part of the training set for GPT-3 was Common Crawl data. You could look up whether Common Crawl respects robots.txt.
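For what it's worth, here is a minimal sketch of how a site's robots.txt rules can be checked against a given crawler, using Python's standard `urllib.robotparser`. `CCBot` is the user-agent token Common Crawl's crawler identifies itself with; the sample robots.txt content, the `is_allowed` helper, and the example.com URL are made up for illustration. This only shows what the rules say, not whether any crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: block Common Crawl's bot, allow everyone else.
SAMPLE_ROBOTS = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

ccbot_ok = is_allowed(SAMPLE_ROBOTS, "CCBot", "https://example.com/page")
other_ok = is_allowed(SAMPLE_ROBOTS, "SomeOtherBot", "https://example.com/page")
print("CCBot allowed:", ccbot_ok)
print("SomeOtherBot allowed:", other_ok)
```

To check a real site, you could fetch its `/robots.txt` yourself and pass the text to the same helper.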