by pbsd 4863 days ago

Figure 7 shows Haskell-generated AVX instructions, albeit using only the lower 128 bits. That code would not run on a CPU without AVX, for instance an SSE4.2-capable Nehalem.

There are a few other minor CPU-related inaccuracies in the paper. Prefetching is mentioned repeatedly, even though its effect is negligible with a perfectly linear memory access pattern, which the hardware prefetcher already handles well; unaligned loads are described as a performance hit, but they are essentially free on the test processor (i7-2600K, Sandy Bridge).

Matrix multiplication would perhaps be a better example to show the power of clever prefetching.
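
To make that concrete, here is a rough sketch in C (not code from the paper) of a naive row-major matrix multiply where the second operand is walked down a column, so consecutive reads are n doubles apart rather than adjacent. The function name matmul_prefetch and the PREFETCH_DISTANCE value are made up for illustration, and __builtin_prefetch is a GCC/Clang builtin, so this assumes one of those compilers. Whether the hint helps at all depends on the CPU and the matrix size.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many k-iterations ahead to prefetch. */
#define PREFETCH_DISTANCE 8

/* c = a * b for row-major n x n matrices, naive i-j-k loop order. */
void matmul_prefetch(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                /* b is read down column j, i.e. with a stride of n doubles.
                   Hint the element we will need PREFETCH_DISTANCE steps
                   from now (read access, keep in all cache levels). */
                if (k + PREFETCH_DISTANCE < n)
                    __builtin_prefetch(&b[(k + PREFETCH_DISTANCE) * n + j], 0, 3);
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

The prefetch is only a hint and does not change the result, so the function can be checked against a plain triple loop. In real code, blocking (tiling) the loops usually matters more than explicit prefetch hints.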