Hacker News new | ask | show | jobs
by kazinator 2684 days ago
> When Are Randomized Algorithms Better Than LRU?

When the access pattern is random: recently used items are not more likely to be accessed than other items.

Or worse: the access pattern is "anti-recent": items recently accessed are less likely to be accessed again than other items.

This is undergraduate stuff I once had to know for exams in machine architectures and operating systems.

The choice of LRU replacement is favorable when there exists "locality of reference"; that's what it's predicated upon.

All of the replacement policy choices are only substitutes for a "magic oracle": a replacement strategy which can peer into the future and know which items will be accessed soonest. With a magic oracle, we can not only retain those items in the cache in preference to other items, to great advantage, but we can automatically prefetch the correct items that are not yet in the cache but are about to be accessed.

In general, randomized algorithms provide a defense against situations in which things don't work out according to the best case assumptions. For instance, they help defend against tickling worst cases in algorithms (e.g. sorting) that have good average-case behavior.

3 comments

This comment is dangerously trivializing the problem. Even in cases where the access pattern is contiguous, not "anti-recent", etc. LRU can still fail to perform. For example, if you simply iterate over a contiguous list of 5 items with a cache size of 4, you will have a 0% hit rate with LRU. With 2 random, it would be > 0% because there is a chance you wouldnt evict the item in exactly the wrong time. The main takeaway from the article is that it is this mismatch between cache sizes which can cause problems and is evident by the data. In L1+L2 caches, LRU outperforms, but in L3 2-random outperforms because at that point, you are more likely to hit this exact case.
Repeatedly iterating over a sequence items is in fact "anti-recent". Round-robin cycling is probably the simplest possible algorithm for accessing a set of items such that, at any time, the item that is accessed next is the one that was least recently visited. Proof: since that item was last visited, every other item in the set was visited.
You're missing exactly the point. "anti-recent" is a relative term. Imagine that the data set is actually 1000 items long, but we iterate through it five at a time, but we iterate over that five 10 times each before moving on to the next 5. Those five items are much more likely to be necessary than all the 995 other items, but the currently added member is less likely than the other 4 for the next loop. You see, its a relative term. If the cache size had been 5, we would say that this is a nearly perfect example of caching the most likely members to appear.
Your loop example is more akin to the "anti-recent" pattern: the most recently accessed item will not be accessed again sooner than any of the four other items. Contiguous patterns may well be anti-recent, whether an access is "recent" or not should be considered in relation to the size of the cache.
>>When Are Randomized Algorithms Better Than LRU?

>When the access pattern is random: recently used items are not more likely to be accessed than other items.

Wouldn't they perform equally well in this case? There's no advantage in LRU, but also no disadvantage if they're really unpredictably random.

A random strategy has less overhead than LRU because there is no bookkeeping.
This argument does not apply to the OP as it does not account for any bookkeeping. I think an actual answer is more complicated and will entirely depend on the definition of “random access pattern.”
Yes, and bookkeeping and randomness both cost something.
Even when the access pattern is not random, random can provide some benefits.

It can prevent any otherwise harmless sweep/scan of the system from completely busting your entire cache for all of the other uses.