by rossjudson
4845 days ago
Well, no. A linear scan over a large memory array is going to crap all over the CPU caches if you have to do it more than once. Break the array into blocks smaller than the CPU cache and run all of the stages on each block before moving on to the next one. Having all that handy control-flow stuff makes it easier to get the block-oriented behavior you need to maximize performance, which in these cases is all about memory bandwidth.
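
To make the blocking idea concrete, here is a rough sketch in C. Everything in it is an assumption for illustration: the two stages (`stage_scale`, `stage_offset`) are stand-ins for whatever passes you actually run, and `BLOCK_ELEMS` is a guessed size that you would tune to your own cache. The naive version streams the whole array once per stage; the blocked version runs every stage on one cache-sized chunk before touching the next.

```c
#include <stddef.h>

/* Hypothetical block size: 4096 floats = 16 KiB, picked to fit inside a
 * typical L1 data cache. This is an assumption; tune it per machine. */
enum { BLOCK_ELEMS = 4096 };

/* Two illustrative stages, stand-ins for the real per-element passes. */
static void stage_scale(float *x, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}

static void stage_offset(float *x, size_t n, float b)
{
    for (size_t i = 0; i < n; i++)
        x[i] += b;
}

/* Naive: each stage scans the full array, so for a large array the data
 * is pulled from main memory once per stage. */
void process_naive(float *data, size_t n, float k, float b)
{
    stage_scale(data, n, k);
    stage_offset(data, n, b);
}

/* Blocked: run every stage on one block while it is still cache-resident,
 * then move on. Each element is fetched from main memory roughly once. */
void process_blocked(float *data, size_t n, float k, float b)
{
    for (size_t start = 0; start < n; start += BLOCK_ELEMS) {
        size_t remaining = n - start;
        size_t len = remaining < BLOCK_ELEMS ? remaining : BLOCK_ELEMS;

        stage_scale(data + start, len, k);
        stage_offset(data + start, len, b);
    }
}
```

Both functions compute the same result; the only difference is the order in which memory is touched. Whether the blocked version is actually faster depends on the array size, the number of stages, and the cache hierarchy, so measure before committing to a block size.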