by portInit
1199 days ago
Yeah, the default domain throttle policy is 1 request per second per domain. It's configurable through domain policies (https://www.crul.com/docs/features/domain-policies), although that's currently an enterprise feature. We found that otherwise it becomes too easy to break API request limits or spam a website. However, if you rerun that query it should load almost instantly thanks to the caching layers, so the actual querying/filtering of the data is smoother and faster.
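To make the "1 request per second per domain" idea concrete, here is a rough sketch of a per-domain throttle. This is not crul's implementation; the `DomainThrottle` class and its parameters are hypothetical names made up for illustration, and it ignores concurrency entirely.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Hypothetical helper: at most one request per `min_interval` seconds per domain."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the throttle can be tested without real waiting
        self._sleep = sleep
        self._next_allowed = {}  # domain -> earliest time the next request may start

    def wait(self, url):
        """Block until a request to this URL's domain is allowed; return seconds waited."""
        domain = urlparse(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(domain, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot for this domain only; other domains are unaffected.
        self._next_allowed[domain] = max(now, ready_at) + self.min_interval
        return delay
```

Calling `throttle.wait(url)` before each fetch spaces out requests to the same host while letting requests to different hosts proceed immediately.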
|
Every time I see that, the "2 hardest things" springs to mind. Is there a clear-caches option, or, I guess, the opposite question: does that process honor HTTP caching semantics? Scrapy actually has a bunch of configurable knobs for that: use the RFC2616 policy (https://docs.scrapy.org/en/2.8/topics/downloader-middleware....), write your own policy, or a ton of other stuff (https://docs.scrapy.org/en/2.8/topics/downloader-middleware....).
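For anyone who hasn't used those Scrapy knobs, a minimal sketch of what they look like in a project's `settings.py`. The setting names are Scrapy's; the specific values here are just illustrative assumptions, not recommendations.

```python
# settings.py (fragment): enable Scrapy's HTTP cache middleware
HTTPCACHE_ENABLED = True

# Honor HTTP caching headers (Cache-Control etc.) instead of the default
# DummyPolicy, which caches every response regardless of headers.
HTTPCACHE_POLICY = "scrapy.extensions.httpcache.RFC2616Policy"

# Store cached responses on disk under this directory name.
HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
HTTPCACHE_DIR = "httpcache"

# 0 means cached entries never expire by age alone (illustrative value).
HTTPCACHE_EXPIRATION_SECS = 0
```

As far as I know there is no dedicated "clear cache" switch with the filesystem storage; removing the cache directory does the job.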