by stephencanon 4707 days ago
If you had 20 similar functions, the tables would occupy 5k in total, using only 1/6th of the L1 D$ on a typical "big" CPU. In actuality, temporal locality is such that you don't often stride through all table entries uniformly, so the actual cache pressure is even less.
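
As a rough check of that arithmetic, here is a small sketch. The 256-byte table size and the 32 KiB L1 data cache are assumptions on my part, chosen because they are consistent with the "5k" and "1/6th" figures; real table sizes and cache sizes vary by function and CPU.

```python
# Back-of-the-envelope cache footprint for lookup tables.
# Assumptions (not stated in the comment): each table is 256 bytes,
# and the L1 data cache is 32 KiB.
TABLE_BYTES = 256
NUM_FUNCTIONS = 20
L1D_BYTES = 32 * 1024


def table_footprint(num_functions=NUM_FUNCTIONS,
                    table_bytes=TABLE_BYTES,
                    l1d_bytes=L1D_BYTES):
    """Return (total table bytes, fraction of the L1 data cache they fill)."""
    total = num_functions * table_bytes
    return total, total / l1d_bytes


total, fraction = table_footprint()
print(f"{total} bytes of tables, about 1/{1 / fraction:.1f} of L1D")
```

This is a worst-case figure: it assumes every entry of every table is resident at once, which the temporal-locality argument suggests is uncommon in practice.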

The point that you're going after is a good one, but it's important to keep in mind how enormous modern memory hierarchies are. It is often very reasonable to trade memory and cache pressure for speed.