

by carterschonwald 4662 days ago

Actually, in the specific example I'm thinking of, memory locality is what makes the performance difference (in this case, array layout for matrix multiplication).

The naive, obvious "dot product" matrix multiply of two row-major matrices is 100-1000x slower than versions built on somewhat fancier layouts. Even simply transposing the right-hand matrix can make a significant difference, let alone fancier things.
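
To make the layout point concrete, here is a minimal sketch in C (not the commenter's code; the function names are made up for illustration). In the naive version the inner loop walks down a column of row-major B, so consecutive reads are n elements apart. In the second version B has been transposed first, so both operands are read contiguously.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for n x n row-major matrices.
   The inner loop reads b[k*n + j], i.e. a column of B with stride n. */
void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* bt = transpose of b (both n x n, row-major). */
void transpose(size_t n, const double *b, double *bt) {
    assert(b && bt);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            bt[j * n + i] = b[i * n + j];
}

/* C = A * B, where bt already holds the transpose of B.
   Both a[i*n + k] and bt[j*n + k] are now read with stride 1. */
void matmul_transposed(size_t n, const double *a, const double *bt, double *c) {
    assert(a && bt && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }
}
```

Both functions do the same arithmetic in the same order, so they produce identical results; only the memory access pattern differs. How large the gap is in practice depends on matrix size, cache sizes, and compiler flags, so it is worth timing on your own machine rather than taking any particular ratio as given.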

Often the biggest throughput bottleneck for CPU-bound algorithms in a numerical setting is the quality of the memory locality (because the CPU can chew through data faster than you can feed it). It's actually really, really hard to get C / C++ to help you write code with suitably fancy layouts that are easy to use.

Amusingly, I also think most auto-vectorization approaches to SIMD actually miss the best way to use SIMD registers! I actually have some cute small matrix kernels where, by using the AVX SIMD registers as an "L0" cache, I get a 1.5x perf boost!
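
The kernels mentioned above are not shown in the thread, so the following is only a rough sketch of the general idea of treating registers as the innermost cache level, written in portable C rather than AVX intrinsics. The name `kernel_4x4` and its parameters are hypothetical. A 4x4 tile of C is accumulated in local variables for the whole k loop and written back to memory once at the end; 16 doubles would fit in four 256-bit AVX registers, though whether a compiler actually keeps them there depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical micro-kernel: C(4x4) += A(4xk) * B(kx4).
   All matrices are row-major; lda, ldb, ldc are the row strides
   (leading dimensions) of a, b, and c respectively. */
void kernel_4x4(size_t k,
                const double *a, size_t lda,
                const double *b, size_t ldb,
                double *c, size_t ldc) {
    assert(a && b && c);

    /* The tile accumulator: small enough to live in registers. */
    double acc[4][4] = {{0.0}};

    for (size_t p = 0; p < k; p++) {
        for (size_t i = 0; i < 4; i++) {
            double aip = a[i * lda + p];          /* one element of A, reused 4 times */
            for (size_t j = 0; j < 4; j++)
                acc[i][j] += aip * b[p * ldb + j]; /* row p of B, read contiguously */
        }
    }

    /* Touch C in memory only once per tile. */
    for (size_t i = 0; i < 4; i++)
        for (size_t j = 0; j < 4; j++)
            c[i * ldc + j] += acc[i][j];
}
```

A full matrix multiply would call a kernel like this once per 4x4 tile of the output. The design choice being illustrated is that each loaded element of A and B is reused several times before anything is stored back, which is the part an auto-vectorizer will not necessarily discover on its own.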


dikei 4661 days ago

This is like replacing the compiler's optimizer with your own, similar to writing critical functions in assembly, right?

Still, I don't see the connection to Haskell. Can you elaborate?

carterschonwald 4660 days ago

Oh, that's just me rambling about why I don't trust compiler autovectorization :)

Well: 1) I've been slowly working on a numerical computing / data analysis substrate for Haskell for over a year now.

2) The Haskell C FFI is probably the nicest C FFI you'll ever see. It's also pretty fast: basically just a known jump/funcall, with no marshaling overhead!

3) There's a lot of work, both current and pending over the next year, to make it easy to write HPC-grade code using Haskell. Some approaches involve interesting libs for runtime code gen (the llvm-general lib for Haskell is AMAZING).

There are also a number of ways in which GHC Haskell will likely get some great performance improvements for numerical code over the next 1-2 years! (I have a few ideas for improving the native code gen / SIMD support over the next year that I hope to experiment with as time permits.)