by kevincox
1255 days ago
FWIW I don't think this really fits into robots.txt. That file is mostly aimed at crawlers, not at services that load specific URLs in response to (sometimes indirect) user requests. But as a place to hold a rate-limit recommendation it would be nice, since the Git protocol doesn't appear to have an equivalent of a Cache-Control header.
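For illustration, here is a rough sketch of how a client could read a rate-limit hint from robots.txt using Python's standard `urllib.robotparser`. It assumes the site publishes the non-standard `Crawl-delay` directive; the sample file contents, the `git-mirror-bot` user agent, and the `recommended_delay` helper are all made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; Crawl-delay is a non-standard extension
# that only some sites publish and only some clients honor.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def recommended_delay(robots_txt: str, user_agent: str, default: float = 1.0) -> float:
    """Return the Crawl-delay (seconds) that applies to user_agent, or default."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    # "git-mirror-bot" is a made-up user agent name.
    print(recommended_delay(ROBOTS_TXT, "git-mirror-bot"))
```

A service fetching repositories could sleep for that many seconds between requests to the same host, falling back to its own default when no hint is published.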
|
A crawler has a list of resources that it periodically checks for changes and, when something has changed, indexes for user requests.
Contrast that with this totally-not-a-crawler, which has its own database of existing resources, periodically checks whether anything has changed, and, when it has, caches the content and builds checksums.