|
|
|
|
|
by okl
2684 days ago
|
|
> If you have a tight loop, LRU is going to be perfect as long as the loop fits in cache, but it's going to cause a miss every time if the loop doesn't fit. A random eviction policy degrades gracefully as the loop gets too big. That's the key take-away from my experience with optimizing control loops for microcontrollers with small caches. As soon as your loop is too bit, your code suddenly runs at a hundredth of the speed. |
|
Conventional LRU uses linked-lists and when the LL no longer fits in the L1/L2 caches, you will see the performance hit you mention (given that accessing the associated LL node and reordering the list will most likely result in two cache misses).
Try libclc [1] for a compact segmented cache of n 64B containers that exactly fit a cache line, with each container having its own eviction policy. The overhead of LRU per container (of 7 data lines) is 9.14 bits/entry and the same cache line access will give you both the data lines and the LRU order.
[1]: https://github.com/alphazero/libclc