After, reading your comment. I cross checked result and found out that 3 is most optimal size. I even ran fib(40) on it. with size 2, There are many misses but after 3 and on wards misses are constant(if fib(40), then misses are only 40 which emulates DP approach of O(N)).
Why 3 is optimal, because of how recursion and LRU work. I wish i can explain it using animation.
It makes sense. f(n) depends on f(n-1) and f(n-2). So if the cache is able to produce these 2 values, you basically get the linear algorithm from the article. I assume the running f(n) also takes up a cache slot, hence 3 instead of 2.
If this theory is correct, every recursive function f(n) requiring access to f(n-x) should have x+1 as maximum usefull cache size.
I think it's likely that the author's analysis of why 5 is the optimal cache size heavily depends on the fact that they only tried fib(10). For example, it's clear that cache 20 and cache 30 are identical if your universe of possible inputs is only [0 .. 10], haha.
Because of the way the recursion goes down the tree depth-first, I think 5 basically cuts down the computation to fib(10-5) = fib(5) which is pretty manageable, so the author couldn't really see any further measurable performance gains by increasing the cache further. I think for fib(35) it'd be clear that cache size 35 would help compared to cache size 5. (I picked 35 instead of, say, 300, because I think 300 would just not finish with cache size 5, it'd take forever haha.)
Why 3 is optimal, because of how recursion and LRU work. I wish i can explain it using animation.
You can play with it. https://repl.it/repls/NocturnalIroncladBytecode#main.py