by ricardo81
2022 days ago
Hello, I work on the technical side of Mojeek. Mojeek follows the robots.txt protocol, so if a site doesn't want to be crawled by MojeekBot, we respect that wish. There is also a generous crawl delay between requests for pages on the same host. Generally, a 'badly behaved bot' ignores robots.txt or hits a site too hard with requests. Our bot uses a specific user agent, which you can verify via DNS: https://www.mojeek.com/bot.html
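For anyone wondering what "respecting robots.txt" looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not Mojeek's code: the robots.txt content, the `example.com` URLs, the `check` helper, and the 10-second `Crawl-delay` value are all made up for illustration, and `Crawl-delay` itself is a widely used extension rather than something every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
# The 10-second delay is an illustrative value, not Mojeek's actual setting.
ROBOTS_TXT = """\
User-agent: MojeekBot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def check(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return (allowed, crawl_delay) for a user agent and URL.

    `check` is a hypothetical helper; it parses the rules from a string
    so the example runs without any network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)  # None if no delay is declared
    return allowed, delay


if __name__ == "__main__":
    for agent, url in [
        ("MojeekBot", "https://example.com/private/page"),
        ("MojeekBot", "https://example.com/public"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, check(agent, url))
```

A polite crawler would skip any URL where `allowed` is false and wait at least `delay` seconds (or its own default, if none is declared) between requests to the same host.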
What's the order of magnitude of this delay? Milliseconds? Hundreds of milliseconds? Seconds? I'm curious what's considered 'polite' in this realm and how the various parties come to form their opinions on it.