by brianjking 1029 days ago
I mean, they literally wrote their own crawler, and it has docs. I'm sure they'll respect it. https://platform.openai.com/docs/gptbot

What isn't known is whether those same sites will also be included in corpora such as Common Crawl or The Pile, which would lead to them being included in training as is.
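
To make the gap concrete, here's a rough sketch using Python's standard urllib.robotparser. The robots.txt contents and the example.com URL are made up for illustration, and I'm assuming the opt-out in question is a robots.txt rule for the GPTBot user agent. CCBot is the user agent Common Crawl's crawler uses; The Pile has no crawler of its own to block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; not taken from any real site.
# It blocks GPTBot but leaves every other crawler (including CCBot) allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    SAMPLE_ROBOTS_TXT,
    "https://example.com/post/1",  # placeholder URL
    ["GPTBot", "CCBot"],
)
print(result)
```

With this sample file, GPTBot is refused while CCBot is still allowed, so the page could reach a training set through a Common Crawl snapshot unless the site blocks that crawler separately.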