by pbsd
4815 days ago
Figure 7 shows Haskell-generated AVX instructions, albeit only using the lower 128 bits. That code would not run on a Nehalem, for instance, which supports SSE4.2 but not AVX. There are some other slight CPU-related inaccuracies in the paper. Prefetching is mentioned repeatedly, even though its effect is negligible with a perfectly linear memory access pattern; unaligned loads are described as a performance hit, but they are essentially free on the test processor (Core i7-2600K, Sandy Bridge). Matrix multiplication would perhaps be a better example to show the power of clever prefetching.
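
To make the access-pattern point concrete, here is a rough sketch in C (not code from the paper). `dot` walks both arrays linearly, which is the case a hardware prefetcher already handles. `matmul_prefetch` is a naive row-major multiply whose inner loop reads `b` with a stride of `n` doubles, and it issues a software hint a few rows ahead. The function names and the `PREFETCH_AHEAD` distance are made up for illustration, `__builtin_prefetch` is a GCC/Clang builtin, and whether the hint helps at all on a given CPU is something to measure, not assume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many rows ahead to hint. Not from the paper. */
#define PREFETCH_AHEAD 8

/* Linear access: consecutive elements of x and y are read in order. */
double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Naive row-major n-by-n multiply, c = a * b.
   The inner loop walks b down column j, i.e. with a stride of n doubles. */
void matmul_prefetch(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++) {
                /* Read hint for the element PREFETCH_AHEAD rows further
                   down column j (GCC/Clang builtin; a hint only). */
                if (k + PREFETCH_AHEAD < n)
                    __builtin_prefetch(&b[(k + PREFETCH_AHEAD) * n + j], 0, 1);
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

A serious matrix multiply would block for cache first and only then think about prefetch distance; the sketch is only meant to show where the two access patterns differ.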