by rossjudson 4845 days ago
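Well, no. A linear scan over a large memory array is going to crap all over the CPU caches if you have to make more than one pass over it: by the time the second pass starts, the beginning of the array has already been evicted, so every pass goes back out to main memory.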

Instead, break the array into blocks smaller than the CPU cache, and run all of the stages on each block before moving on to the next one.
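A minimal sketch of the difference, assuming two hypothetical elementwise stages (scale, then clamp) and an assumed cache budget of 256 KiB. Python's interpreter overhead hides the actual cache effect, so this only shows the loop structure; the speedup shows up when the same shape is written in a compiled language.

```python
from array import array

# Assumption (illustrative only): ~256 KiB of cache to work with, 8-byte doubles.
# Use about half of it so other data doesn't push the block out. Tune per machine.
CACHE_BYTES = 256 * 1024
BLOCK = CACHE_BYTES // 8 // 2


def stage_scale(data, lo, hi, factor):
    """Hypothetical stage 1: multiply data[lo:hi] by factor, in place."""
    for i in range(lo, hi):
        data[i] *= factor


def stage_clamp(data, lo, hi, limit):
    """Hypothetical stage 2: cap data[lo:hi] at limit, in place."""
    for i in range(lo, hi):
        if data[i] > limit:
            data[i] = limit


def multi_pass(data, factor, limit):
    # Each stage walks the whole array. On a large array, the second pass
    # finds the front of the array already evicted by the first pass.
    stage_scale(data, 0, len(data), factor)
    stage_clamp(data, 0, len(data), limit)


def blocked(data, factor, limit, block=BLOCK):
    # Run every stage on one cache-sized block before moving to the next,
    # so stage 2 reads values stage 1 just left in cache.
    n = len(data)
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        stage_scale(data, lo, hi, factor)
        stage_clamp(data, lo, hi, limit)
```

Both versions produce the same result here only because each stage is elementwise: stage 2 at index `i` depends only on stage 1's output at `i`. Stages that read neighbouring or global data need overlapping blocks or a different decomposition.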

Having all that handy control-flow stuff makes it easier to get the block-oriented behavior you need to maximize performance, which in these cases is all about memory bandwidth.