|
|
|
|
|
by eternalban
2684 days ago
|
|
I am not sure I understand what you mean by the size of the "loop". Conventional LRU uses linked-lists and when the LL no longer fits in the L1/L2 caches, you will see the performance hit you mention (given that accessing the associated LL node and reordering the list will most likely result in two cache misses). Try libclc [1] for a compact segmented cache of n 64B containers that exactly fit a cache line, with each container having its own eviction policy. The overhead of LRU per container (of 7 data lines) is 9.14 bits/entry and the same cache line access will give you both the data lines and the LRU order. [1]: https://github.com/alphazero/libclc |
|
He refers to the LRU-like policy of the I/D cache itself.