by gorbypark, 1207 days ago
I don't know if they did their own crawling as well, but at least part of the training set for GPT-3 was Common Crawl data. You could look up whether Common Crawl respects robots.txt.
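For what it's worth, here is a rough sketch of how you could check what a given robots.txt says about Common Crawl's crawler, which identifies itself as CCBot. It uses Python's standard `urllib.robotparser`. The sample rules, the `is_allowed` helper, and the example.com URL are all made up for illustration. It only tells you what the file asks for, not whether any crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    print(is_allowed(SAMPLE_ROBOTS_TXT, "CCBot", url))         # False
    print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", url))  # True
```

To check a real site you could instead call `set_url()` with the site's robots.txt address and then `read()` on the parser before `can_fetch()`.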