|
|
|
|
|
by twoodfin
4707 days ago
|
|
The article makes the point that the lookup table version is fast because the table fits in D$, and that if the table were evicted it would be slower. This is true, but the more interesting point is that by loading this table into D$, you're potentially slowing down other operations. It's an important conundrum of optimization that if you had 20 similarly complex functions in a critical path, implementing and benchmarking each individually with a lookup table could show excellent performance while globally performance is terrible. And worse, it's uniformly terrible, with no particular function seeming to be consuming an inordinate amount of the runtime or, for that matter, D$ misses. |
|
The point that you're going after is a good one, but its important to keep in mind how enormous modern memory hierarchies are. It often is very reasonable to trade memory and cache pressure for speed.