Hacker News
by thrownaway2424 3934 days ago
That's kinda weird reasoning. Are you saying there's no benefit to an improvement of median latency, if the tail latency remains long? I would disagree. I also would point out that not all systems that can benefit from a cache are latency-sensitive.

Not that there's no benefit, but just that it's more complicated in a distributed system.

Certainly caching is vital to many distributed systems, but it has to be done from a systems perspective. In my experience a lot of caches are just slapped on top of individual components without much thought, and without even basic monitoring of the hit rate. I think it helps to actually measure what the cache is doing for you, but this is more work than adding the cache itself.
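
To make "measure what the cache is doing" concrete, here is a minimal sketch of a read-through cache that counts hits and misses. The class name `CountingCache`, the `loader` callback, and the toy workload are all made up for illustration. A real system would more likely export these counters to whatever metrics stack it already has.

```python
from typing import Any, Callable, Dict, Hashable


class CountingCache:
    """Hypothetical read-through cache that records hits and misses."""

    def __init__(self, loader: Callable[[Hashable], Any]) -> None:
        self._loader = loader          # called on a miss to fetch the value
        self._store: Dict[Hashable, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: fall through to the slow path and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = value
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Toy workload (an assumption): five lookups over three distinct keys.
cache = CountingCache(loader=lambda k: k * 2)
for key in [1, 2, 1, 1, 3]:
    cache.get(key)

print(cache.hits, cache.misses, cache.hit_rate())
```

With this workload only the two repeat lookups of key 1 are hits, so the hit rate is 2 out of 5. A number that low on a real component would be a hint that the cache isn't earning its keep.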

And I agree with another poster: I've seen many systems with caches papering over severe and relatively obvious performance problems in the underlying code.

I was thinking of this Google publication which outlines some problems with latency variability: http://www.barroso.org/publications/TheTailAtScale.pdf

Interestingly they didn't seem to list caches as one of the causes; they list shared resources, cron jobs, queuing, garbage collection, power saving features, etc.

I read some scribbling by some nerd working on distributed systems. The problem he mentioned is that when you parallelize a task and hand the pieces off to a bunch of workers, you aren't done until the last worker finishes. In that case long tail latencies can bite you rather hard. If 99 out of a hundred workers finish their bit in 50-100us and one of them stalls out for 10ms, the whole job takes 10ms. That's about what a single worker would need to run all hundred pieces back to back (5-10ms), so you gained nothing over a single worker.
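
A back-of-the-envelope sketch of that arithmetic, for anyone who wants to play with the numbers. The 75us midpoint, the function names, and the assumption that each worker stalls independently with 1% probability are mine, not from the scribbling.

```python
from typing import Sequence


def fan_out_latency_us(worker_latencies_us: Sequence[float]) -> float:
    """A fan-out job finishes only when its slowest worker finishes."""
    return max(worker_latencies_us)


def prob_any_slow(p_slow: float, n_workers: int) -> float:
    """Chance that at least one of n independent workers is slow."""
    return 1.0 - (1.0 - p_slow) ** n_workers


# Assumption: 99 workers at 75us (midpoint of 50-100us), one stalled at 10ms.
PIECE_US = 75.0
STALL_US = 10_000.0
latencies = [PIECE_US] * 99 + [STALL_US]

parallel_us = fan_out_latency_us(latencies)   # bounded by the straggler
serial_us = PIECE_US * len(latencies)         # one worker, all 100 pieces

# Assumption: each worker independently has a 1% chance of stalling.
p_straggler = prob_any_slow(0.01, 100)

print(f"parallel: {parallel_us:.0f}us, serial: {serial_us:.0f}us")
print(f"chance of at least one straggler: {p_straggler:.3f}")
```

Under those assumptions the parallel run comes in at 10ms against 7.5ms for the single worker, and roughly 63% of fan-outs to 100 workers hit at least one straggler. So the stalled case is the common one, not a rare one.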