Hacker News new | ask | show | jobs
by eternalban 2684 days ago
I am not sure I understand what you mean by the size of the "loop".

Conventional LRU uses linked-lists and when the LL no longer fits in the L1/L2 caches, you will see the performance hit you mention (given that accessing the associated LL node and reordering the list will most likely result in two cache misses).

Try libclc [1] for a compact segmented cache of n 64B containers that exactly fit a cache line, with each container having its own eviction policy. The overhead of LRU per container (of 7 data lines) is 9.14 bits/entry and the same cache line access will give you both the data lines and the LRU order.

[1]: https://github.com/alphazero/libclc

1 comments

> Conventional LRU uses linked-lists and when the LL no longer fits in the L1/L2 caches

He refers to the LRU-like policy of the I/D cache itself.

Got it. Thanks for the clarification.