I'm sure you already know this, but if I understand hinkley's question correctly, there's one more factor. Modern processors have hardware prefetchers that detect sequential access patterns and pull the next cache lines in before the code asks for them, which greatly speeds up the sequential case: https://software.intel.com/en-us/articles/optimizing-applica...
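If you want to see the difference yourself, here's a rough sketch that walks the same array once in index order and once in shuffled order. The function names and the array size are my own picks for illustration, not anything from the linked article. Caveat: CPython's interpreter overhead hides a lot of the gap, so treat the timings as a hint rather than a measurement; the same loop in C would likely show it much more clearly.

```python
import random
import time
from array import array


def make_data(n):
    """Return a contiguous array of n 64-bit integers holding 0..n-1."""
    return array("q", range(n))


def sum_in_order(data, order):
    """Sum data[i] for each i in order; `order` decides the access pattern."""
    total = 0
    for i in order:
        total += data[i]
    return total


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # assumed size: 16 MB of data, larger than many caches
    data = make_data(n)

    # Index order: addresses increase steadily, which a prefetcher can follow.
    sequential = array("q", range(n))

    # Shuffled order: the same indices, but the next address is unpredictable.
    indices = list(range(n))
    random.shuffle(indices)
    shuffled = array("q", indices)

    seq_sum, seq_time = timed(sum_in_order, data, sequential)
    rnd_sum, rnd_time = timed(sum_in_order, data, shuffled)

    # Both passes touch every element exactly once, so the sums must match.
    assert seq_sum == rnd_sum
    print(f"sequential: {seq_time:.3f}s")
    print(f"shuffled:   {rnd_time:.3f}s")
```

Both passes do exactly the same amount of arithmetic, so any timing gap comes from the order in which memory is touched. The numbers will vary a lot by machine and array size.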