Hacker News
by mdaniel 1199 days ago
> due to the caching layers

Every time I see that, the "2 hardest things" joke springs to mind. Is there a clear-caches option? Or, I guess, the opposite question: does that process honor HTTP caching semantics? Scrapy actually has a bunch of configurable knobs for that: use the RFC2616 policy ( https://docs.scrapy.org/en/2.8/topics/downloader-middleware.... ), write your own policy, or a ton of other stuff ( https://docs.scrapy.org/en/2.8/topics/downloader-middleware.... ).
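For anyone who hasn't touched those knobs, here is a rough sketch of what they look like. The `HTTPCACHE_*` names are Scrapy's own settings; the `SkipErrorsPolicy` class and the `myproject.cachepolicy` path are hypothetical, made up for illustration. The four methods follow the policy interface as I understand it from the Scrapy docs, so treat this as a starting point rather than a reference implementation.

```python
# settings.py fragment: turn on the HTTP cache and pick a policy.
HTTPCACHE_ENABLED = True
# Honor Cache-Control / Expires headers instead of caching everything forever.
HTTPCACHE_POLICY = "scrapy.extensions.httpcache.RFC2616Policy"
# To plug in the custom policy below instead (hypothetical module path):
# HTTPCACHE_POLICY = "myproject.cachepolicy.SkipErrorsPolicy"


class SkipErrorsPolicy:
    """Hypothetical policy: cache GET responses unless the status is ignored."""

    def __init__(self, settings):
        # Scrapy passes its settings object to the policy constructor.
        self.ignore_codes = {
            int(code) for code in settings.getlist("HTTPCACHE_IGNORE_HTTP_CODES")
        }

    def should_cache_request(self, request):
        # Only GET requests are considered cacheable in this sketch.
        return request.method == "GET"

    def should_cache_response(self, response, request):
        # Never store responses whose status code is on the ignore list.
        return response.status not in self.ignore_codes

    def is_cached_response_fresh(self, cachedresponse, request):
        # Assumption: anything already stored is served without rechecking.
        return True

    def is_cached_response_valid(self, cachedresponse, response, request):
        # Assumption: prefer the cached copy over a newly fetched response.
        return True
```

With the RFC2616 policy the cache follows the server's headers; with something like the class above you decide freshness yourself, which is exactly where the "hardest things" tend to bite.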

1 comment

Agreed, caching does come with its own set of quirks and mind-numbing bugs. crul does have a caching override flag at the command/stage level, which alleviates some of this: https://www.crul.com/docs/queryconcepts/common-flags#--cache

The links you provided are interesting and give us something to think about some more. Honestly, I would be quite interested in hearing more about your experiences.