by leereeves
38 days ago
It could be written more clearly, but I think that when it refers to a 4x and a 3x slowdown, it actually means a 4x slowdown plus roughly 3x larger code that causes cache misses, and the impact of those cache misses on runtime is surely much larger than 3x.

> Each individual iteration: ~4x slower (register spilling)
> Cache pressure: ~2-3x additional penalty (instructions don't fit in L1/L2 cache)
> Combined over a billion iterations: 158,000x total slowdown

I think that "2-3x additional penalty" refers to this:

> The 2.78x code bloat means more instruction cache misses, which compounds the register spilling penalty.

Also, the analysis refers elsewhere to other factors that weren't included in this part.
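To put numbers on it: if you take the quoted factors at face value and treat them as independent multipliers (my assumption, not something the analysis states), they only get you to 8-12x, so almost all of the 158,000x has to come from somewhere else. A quick sketch of that arithmetic (`residual_factor` is just a throwaway helper name I made up):

```python
import math

def residual_factor(total: float, *factors: float) -> float:
    """Slowdown left unexplained after dividing out the listed multiplicative factors."""
    return total / math.prod(factors)

TOTAL = 158_000  # claimed total slowdown from the quoted analysis
SPILL = 4        # per-iteration slowdown attributed to register spilling

# The analysis gives the cache penalty as a range, "~2-3x"
for cache in (2, 3):
    naive = SPILL * cache
    left = residual_factor(TOTAL, SPILL, cache)
    print(f"{SPILL}x * {cache}x = {naive}x; unexplained: {left:,.0f}x")
```

This prints `4x * 2x = 8x; unexplained: 19,750x` and `4x * 3x = 12x; unexplained: 13,167x`, which is why I read the "3x" as code size rather than as its actual runtime cost.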