| The article is very optimistic about the memory bandwidth available per cycle; reality is far worse. As an example, on my 2011 MacBook Air, with ~10 GB/s of maximum RAM bandwidth, random access to memory can take 100 times longer than sequential access. This is in C, with full optimizations and a very low-overhead read loop. Using the same metric as the author: best case: ~3 bytes per cycle (around 6 gigabytes per second of available bandwidth); worst case: ~0.024 bytes per cycle
(with every scheduler, prefetcher, and already-open column mostly defeated). Note that the worst case takes 10 seconds (!) to read and sum, in random order, every cell of an array of 100,000,000 4-byte integers, exactly once. The main loop is light enough not to influence the test. That's about 40 megabytes per second out of the 6,000 available. What can I say... CPU designers are truly wizards! |
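
For anyone who wants to try this kind of measurement, here is a minimal sketch in C. It is not the loop used for the numbers above, which was not posted. The function names, the LCG constants, and the power-of-two array size are all assumptions made for illustration. The scattered pass uses a full-period linear congruential generator as a cheap stand-in for a random permutation, so every cell is still read exactly once without a separate index array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sum every element in index order: the prefetcher-friendly best case. */
static uint64_t sum_sequential(const uint32_t *a, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += a[i];
    return sum;
}

/* Sum every element exactly once, in a scattered order.
 * Assumption: n is a power of two. With modulus n, a multiplier
 * congruent to 1 mod 4 and an odd increment give a full-period LCG
 * (Hull-Dobell conditions), so each index in [0, n) appears once. */
static uint64_t sum_scattered(const uint32_t *a, size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    uint64_t sum = 0;
    size_t idx = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[idx];
        idx = (idx * 1664525u + 1013904223u) & (n - 1);
    }
    return sum;
}

/* CPU time in seconds; adequate for a single-threaded loop. */
static double seconds_now(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Hypothetical driver: times both passes over n 4-byte integers.
 * Returns 0 when both passes produce the same sum, 1 otherwise. */
static int compare_access_patterns(size_t n) {
    uint32_t *a = malloc(n * sizeof *a);
    if (a == NULL) return 1;
    for (size_t i = 0; i < n; i++) a[i] = (uint32_t)i;

    double t0 = seconds_now();
    uint64_t s1 = sum_sequential(a, n);
    double t1 = seconds_now();
    uint64_t s2 = sum_scattered(a, n);
    double t2 = seconds_now();

    printf("sequential: %.3f s\n", t1 - t0);
    printf("scattered:  %.3f s\n", t2 - t1);

    free(a);
    return s1 == s2 ? 0 : 1;
}
```

To get numbers that mean anything, call `compare_access_patterns((size_t)1 << 26)` from `main` (that is 256 MB, well beyond any cache) and compile with optimizations, e.g. `-O2`. One caveat: the next index here does not depend on the loaded value, so the CPU can keep several cache misses in flight at once. A pointer-chasing loop, where each load yields the next address, would likely look even worse.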