Hacker News new | ask | show | jobs
by hinkley 497 days ago
Years ago I encountered a caching system that I misremembered as being a plugin for nginx and thus was never able to track down again.

It had a clever caching algorithm that favored latency over bandwidth. It weighted hit count versus size, so that given limited space, it would rather keep two small records that had more hits than a large record, so that it could serve more records from cache overall.

For some workloads the payload size is relatively proportional to the cost of the request - for the system of record. But latency and request setup costs do tend to shift that a bit.

But the bigger problem with LRU is that some workloads eventually resemble table scans, and the moment the data set no longer fits into cache, performance falls off a very tall cliff. And not just for that query but now for all subsequent ones as it causes cache misses for everyone else by evicting large quantities of recently used records. So you need to count frequency not just recency.

2 comments

For every caching algorithm you can design an adversarial workload that will perform poorly with the cache. Your choice of caching algorithm/strategy needs to match your predicted workload. As you're alluding there's also the question of which resource are you trying to optimize for, if you're trying to minimize processing time that might be a little different than optimizing for bandwidth.
If you have to refetch on a cache miss you're going to be doing both. But all optimizations are always playing with the trigraph of cpu time, memory, and IO (with the hidden fourth dimension of legibility), so I don't think you're saying anything that can't be assumed as given. Even among people who tend to pick incorrectly, or just lose track of when the situation has changed.
I understood the OP to have said something along the lines of if we have a fixed cost per object then we should bias towards smaller objects if we want to minimize that cost.

And totally legibility and/or simplicity. I'll take something I can reason about and maintain over something more complicated just to eek out a tiny better hit ratio. That said, if you're caching at scale your 0.05% hit ratio can be a big deal.

As a matter of personal taste/opinion I also shy away from close loop systems. Feedback makes things complicated in non-intuitive ways. Caffeine seems neat in terms of using feedback to adjust the cache to the workload - as always test with your workloads and pick what is best for your situation.

The thing is that untangling the logic often reveals either new feature opportunities that would have been ugly to implement previously, or new optimization opportunities because the code is clearer, and possibly two things that seemed unrelated before are now obviously related.

If I can't figure out how to make code faster without cheesing it I'll just follow the campsite rule and hope for inspiration (the more you work with code the more you understand it, and I might as well clean while I'm here)

You might be interested in this thread [1] where I described an idea for how to incorporate the latency penalty into the eviction decision. A developer even hacked a prototype that showed promise. The problem is that there is not enough variety in the available trace data to be confident that a design isn't overly fit to a particular workload and doesn't generalize. As more data sets become available it will become possible to experiment with ideas and fix unexpected issues until a correct, simple, elegant design emerges.

[1] https://github.com/ben-manes/caffeine/discussions/1744

I'll check it out.

The thing is that the math changes when the system is saturated. The closer you get to tipping over, the more each new request costs. I feel like I can clearly recall times when I had to make a Sophie's Choice between p50 and p95 times because of issues of this sort.