by portInit 1199 days ago
Yeah the default domain throttle policy is 1 req per second per domain. Configurable through domain policies https://www.crul.com/docs/features/domain-policies - although currently an enterprise feature.

We found that without a throttle it becomes too easy to exceed API rate limits or spam a website.
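For anyone curious what a "1 req per second per domain" policy looks like in practice, here is a minimal sketch in Python. It is not crul's implementation; the `DomainThrottle` class and its parameters are hypothetical names made up for illustration, and it assumes a single-threaded caller.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Hypothetical helper: at most one request per `min_interval` seconds per domain."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the behavior can be tested without waiting
        self._sleep = sleep
        self._next_allowed = {}  # domain -> earliest time the next request may start

    def wait(self, url):
        """Block until a request to url's domain is allowed; return the delay applied."""
        domain = urlparse(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(domain, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # The request starts at max(now, ready_at); reserve the next slot after it.
        self._next_allowed[domain] = max(now, ready_at) + self.min_interval
        return delay
```

Calling `throttle.wait(url)` before each fetch spaces out requests to the same host while leaving requests to other hosts unaffected, which is roughly the per-domain behavior described above.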

However, if you rerun that query it should load almost instantly due to the caching layers, so the actual querying/filtering of the data is smoother and faster.

2 comments

> due to the caching layers

Every time I see that, the "2 hardest things" springs to mind. Is there a clear-caches option, or I guess the opposite question: does that process honor the HTTP caching semantics? Scrapy actually has a bunch of configurable knobs for that (use RFC2616 Policy ( https://docs.scrapy.org/en/2.8/topics/downloader-middleware.... ), write your own policy, or a ton of other stuff: https://docs.scrapy.org/en/2.8/topics/downloader-middleware.... )
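For reference, a sketch of the Scrapy side of this, as it would appear in a project's `settings.py`. The setting names are Scrapy's own; the specific values (cache directory, no expiry) are just assumptions for the example.

```python
# settings.py (fragment): enable Scrapy's HTTP cache with the RFC 2616 policy.
HTTPCACHE_ENABLED = True

# Honor Cache-Control and related response headers instead of caching everything
# (the default is scrapy.extensions.httpcache.DummyPolicy).
HTTPCACHE_POLICY = "scrapy.extensions.httpcache.RFC2616Policy"

# Assumed values for illustration: where cached responses live and how long they last.
HTTPCACHE_DIR = "httpcache"
HTTPCACHE_EXPIRATION_SECS = 0  # 0 means cached responses never expire on their own
```

Clearing the cache in that setup amounts to deleting the cache directory, and a custom policy is a class path swapped into `HTTPCACHE_POLICY`.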

Agreed, caching does come with its own set of quirks and mind-numbing bugs. crul does have a caching override flag at the command/stage level, which alleviates some of this: https://www.crul.com/docs/queryconcepts/common-flags#--cache

The links you provided are interesting and something for us to think about some more. Honestly, I would be quite interested in hearing more about your experiences.

> although currently an enterprise feature.

Wait, so we literally can't go faster than 1 req/s unless we pay?

I have to say I'm pretty disappointed :/

If you attach to the running Docker container, these defaults appear to be defined in /crul/dist/crul-docker/packages/startup/.env

Don't spam APIs. That said, if you're determined to do so, there's not much this or any other tool can do to stop you from trying.

Yeah exactly. Living up to your username :) nice find! Note this is a global default, unlike domain policies which are associated with a domain.
Sorry to hear that - we do need to think about this. It's our first pass at product tiers and features and we may need to adjust.

Scheduling and Domain Policies were the main features we chose to gate initially as they don't affect core functionality other than performance and deployment.