Hacker News new | ask | show | jobs
by NovaX 2684 days ago
Can you explain how ARC can be more easily lock-free than LRU? ARC has to coordinate 4 doubly-linked lists and a pivot on every access.
1 comments

You can implement ARC that way and there is a Go library on GitHub that does, but that is not going to scale well. Normally you won't need to evict from the frequently-used list, and there is no need to maintain LRU order on that cache during access. One need only atomically update the access time of the entry and defer ordering until such a time as eviction is required. During an access only a read lock on that cache is needed. During eviction from frequently-used you need exclusive access to all four structures, but if you can arrange for that to be rare, you'll have uncontested read access to your hot items without the need to maintain the LRU property.
That certainly helps, though you assume strict LRU vs approximate ARC. One can of course use similar schemes on LRU (like memcached does) or sampling LRU, making it approximate but reducing the lock contention. The technique that I've used is to buffer and replay events against the policy, so that reads are just a CAS into a ring buffer and replaying is under performed under a non-blocking lock. That works well with any policy and extends to other features, like expiration.
I think we'd definitely have things to discuss :-) It sounds like we agree there is rarely a good reason to maintain strict LRU property no matter what scheme or structure is being used.

BTW for expiration I have simply invalidated on next access. This seems to have worked well for me with DNS caching, for example, where basically you have three hot records (gmail.com, hotmail.com, yahoo.com) that expire every 5 seconds. In this case of course there is value in proactively refreshing cache entries with some probability proportional to the expiry time so you don't have thundering herds.

Yes, that would be fun. :)

For fixed expiration (e.g. 5 min TTL), I use time-bounded LRU queues. Then if the head item has not expired, its successors have not either. If it has, it is merely a poll.

For custom expiration (varies per-entry), I use a hierarchical timer wheel. This replaces sorting with hashing, with an amortized O(1) cost.

For refresh, that's when I do it on access since its being used. I keep a refresh timestamp and an expiration timestamp, so that a refresh can occur while the data is valid but aging, but a blocking load is required when its too stale (expired).

You might enjoy this slide deck of the ideas that I use: https://docs.google.com/presentation/d/1NlDxyXsUG1qlVHMl4vsU...