Hacker News new | ask | show | jobs
by dsharlet 1533 days ago
If you are looking for something like this in C++, here's my attempt at implementing it: https://github.com/dsharlet/array#einstein-reductions

It doesn't do any automatic optimization of the loops like some of the projects linked in this thread, but, it provides all the tools needed for humans to express the code in a way that a good compiler can turn it into really good code.

1 comments

How is "This is 40-50x faster than a naive C implementation of nested loops on my machine, " that possible/Do you know how this was achieved?
Not OP, but from a cursory glance at the code, it seems to be achieved with the combination of splitting the matrices into chunks to fit them in the L1/L2 caches (line 2 in the code), using tricks like switching indexes to achieve better cache locality, and using SIMD + Fused Multiply Add to further speedup things
Thanks! Which file are you refering to?
The code right above the assembly.