by xxpor
711 days ago
>Something as simple and common as a summation loop can often be sped up 2-4x by simply having multiple accumulators that you then combine after the loop body; this lets the processor "run ahead" without loop carried dependencies and execute multiple accumulations each cycle.

Shouldn't the compiler be smart enough to figure that out these days (at least if it truly is a straightforward accumulation loop)?