Hacker News new | ask | show | jobs
by heisjustsosmart 1407 days ago

   Recency-only algorithms are blind to high-probability patterns 
   in the same way frequency-only algorithms
A high probability occurrence must be frequent and will stay reasonably recent. Unless it's high probability that just /somehow/ doesn't happen much.

Perhaps you can provide some citations to some of the things you have written here.

   the cache replacement algorithm is being 
   exercised upwards of 10s of millions of times per second
This is database pages not CPU cache lines. It is not that high.
2 comments

Many databases ordinarily modify the page cache 10 million times per second under load. I have one doing that right now on an i3en.12xlarge, which is only half a server. Not every cache replacement algorithm family has that kind of throughput but some common algorithm families do e.g. clock-sweep variants.

The "high-probability" refers to a type of pattern, not a specific instance, so there is no implication of recency. A page access pattern can be detected frequently but involve different pages each time.

Bélády's optimality theorem means cache replacement algorithm performance is governed by the theoretical limits on sequence prediction. You can optimize for either breadth or depth; improving performance for one pattern necessarily reduces performance for others. In practice, database caches optimize heavily for a few patterns and ignore the rest; the performance for the patterns they ignore is not much worse than if they optimized for all patterns equally, so it is a good trade.

What about MRU-biased workloads? For example an analytical loop or row scanning in a db query? In those cases recency is a negative signal and there may not be much frequency to extract (e.g. equal frequencies in a loop). Instead the policy has to avoid admissions to obtain some hits rather than be flushed for zero hits.
Think you mean round robin, yes, that's an issue but that's equal probability, not high probability (his phrase). I don't know a scheduler that will handle that as well as handling LRU replacement well and if you have any cites I would appreciate them.

Also remember that many DBs are multi-user so any access patterns from each connection will get thrown in the same bucket.

It is handled by W-TinyLFU where I use this scenario as a stress test [1]. While LIRS / LIRS2 did not pass this one scenario, the quality of the work behind that policy leads me to believe it could be improved to handle that case.

[1] https://github.com/ben-manes/caffeine/wiki/Efficiency#adapti...