by rockwotj
617 days ago
I worked for a short time on SearchGPT, and I can tell you that OpenAI does respect robots.txt: it did when I was there, and it still does now. They are also careful to shard crawling per domain and to crawl each domain at only a low rate (~1 qps) so as not to DDoS the site. OpenAI also identifies itself with User-Agent strings (https://platform.openai.com/docs/bots), with dedicated user agents for search crawling, for when a user directly asks about a site, and for training data.
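For anyone curious what that kind of politeness looks like mechanically, here is a rough sketch of a per-domain gate that checks robots.txt and spaces fetches to each host about one second apart. This is not OpenAI's code: the class name `PoliteScheduler`, the `ExampleBot/1.0` user agent, and the 1.0 s interval (standing in for "~1 qps") are all hypothetical. It only uses the standard library's `urllib.robotparser`, and the clock is injectable so it can be tested without sleeping or hitting the network.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class PoliteScheduler:
    """Hypothetical per-domain politeness gate (illustrative sketch only).

    Honors robots.txt per host and keeps successive fetches to the same
    host at least `min_interval` seconds apart (1.0 s ~= 1 qps).
    """

    def __init__(self, user_agent, min_interval=1.0, clock=time.monotonic):
        self.user_agent = user_agent
        self.min_interval = min_interval
        self.clock = clock
        self.robots = {}   # host -> parsed robots.txt
        self.next_ok = {}  # host -> earliest time the next fetch may start

    def load_robots(self, host, robots_txt):
        # Parse robots.txt text that was already fetched for this host.
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.robots[host] = parser

    def allowed(self, url):
        host = urlsplit(url).netloc
        parser = self.robots.get(host)
        # No robots.txt loaded for this host yet: conservatively refuse.
        return parser is not None and parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        """Seconds to wait before fetching url; 0.0 means fetch now."""
        host = urlsplit(url).netloc
        return max(0.0, self.next_ok.get(host, 0.0) - self.clock())

    def record_fetch(self, url):
        # Push this host's next allowed fetch time forward by one interval.
        host = urlsplit(url).netloc
        self.next_ok[host] = self.clock() + self.min_interval
```

Because the delay is tracked per host, a slow crawl of one site never blocks the others; a real crawler would pair this with a per-domain work queue (the "sharding" part), which the sketch leaves out.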
Maybe that's their intent, but this was only a month ago: https://www.gamedeveloper.com/business/-this-was-essentially...
> "The homepage was being reloaded 200 times a second, as the [OpenAI] bot was apparently struggling to find its way around the site and getting stuck in a continuous loop," added Coates. "This was essentially a two-week long DDoS attack in the form of a data heist."