|
|
|
|
|
by dikei
4614 days ago
|
|
It's not neccesarily which language is faster, but the which algorithm is faster.
He said naively written C, which mean the algorithm may be entirely different and run in O(n2) and much slower than a Haskell version which use a different algorithm and run in O(nlogn). |
|
The naive obvious "dot product" matrix mult of two Row Major matrices is 100-1000x slower than somewhat fancier layouts, or even simply transposing the right hand matrix can make a significant difference, let alone more fancy things.
Often the biggest throughput bottleneck for CPU bound algorithms in a numerical setting is the quality of the memory locality (because the CPU can chew through data faster than you can feed it). Its actually really really hard to get C / C++ to help you write code with suitably fancy layouts that are easy to use.
Amusingly, I also think most auto vectorization approaches to SIMD actually miss the best way to use SIMD registers! I've actually some cute small matrix kernels where by using the AVX SIMD registers as a "L0" cache, I get a 1.5x perf boost!