by ChrisLomont 2117 days ago
Compile-time matrix templates like those in Eigen result in vastly faster code than doing it in C, since many operations can be simplified at compile time. C has no way to do this at compile time.

This is just the tip of the iceberg on using templates and classes to make faster, cleaner code.

2 comments

In C, you could provide a bunch of functions that chain together the permutations of operations that can be optimized. I.e. TransverseMultiplyMatrixInverseDotProduct() or whatever actually makes sense. Since you can't overload operators, folk would have to read through the available functions to find what they need anyway. It wouldn't be pretty, but it would be functional and probably compile down to similar machine code.
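As a rough illustration of that approach, a hand-fused function might look like the following. It is written in C-style C++ (plain arrays and loops, no templates or operator overloading). The name `transpose_multiply_3x3` is hypothetical, and the fixed 3x3 size is an assumption made only to keep the sketch short.

```cpp
#include <cassert>

// Hypothetical hand-fused routine: out = transpose(a) * b.
// The transpose is never materialized; it is folded into the indexing.
void transpose_multiply_3x3(const double a[3][3],
                            const double b[3][3],
                            double out[3][3]) {
    // Precondition: the output must not alias either input,
    // because out is written while a and b are still being read.
    assert(static_cast<const void*>(out) != static_cast<const void*>(a));
    assert(static_cast<const void*>(out) != static_cast<const void*>(b));

    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            double acc = 0.0;
            for (int k = 0; k < 3; ++k) {
                acc += a[k][i] * b[k][j];  // a[k][i] reads a as if transposed
            }
            out[i][j] = acc;
        }
    }
}
```

The cost of this style is that every fused combination needs its own named function, and the caller has to know which one to pick.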
No, you cannot, not without putting an incredible amount of work on the programmer's plate.

Consider the simple problem of multiplying together a sequence of N matrices of possibly different sizes with the least amount of work. The order you multiply in is determined via some optimization technique. You can try to have a different C function for each N, but eventually you will have some N for which your lib doesn't have the call. Or maybe you'll try to pack pointers into an array and pass that, which is now slower and more memory costly. In any case the order must be solved at runtime.

For matrices of known size, templates allow the order to be determined at compile time. This cannot be done in general in C, since C has no mechanism for running that kind of computation at compile time.

And, if the matrices were constexpr, this can be computed at compile time.

So the template method, giving you Turing complete operations, can do things that you cannot do in C.

This is just a simple example, the tip of the iceberg.
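A rough sketch of the compile-time ordering argument above, in C++14. The operand order stays as written; what gets optimized is the grouping, since matrix multiplication is associative. `min_chain_cost` and `kDims` are hypothetical names, not anything from Eigen, and the function uses the textbook dynamic-programming approach to compute only the cheapest cost from the dimensions. No claim is made that any real library does it this way.

```cpp
#include <cstddef>

// Minimal number of scalar multiplications needed to multiply a chain of
// matrices, where matrix i has shape dims[i] x dims[i + 1].
template <std::size_t M>
constexpr std::size_t min_chain_cost(const std::size_t (&dims)[M]) {
    static_assert(M >= 2, "need at least one matrix");
    constexpr std::size_t n = M - 1;   // number of matrices in the chain
    std::size_t cost[n][n] = {};       // cost[i][j]: best cost for matrices i..j

    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;
            std::size_t best = static_cast<std::size_t>(-1);  // "infinity"
            // Try every split point: (i..k) times (k+1..j).
            for (std::size_t k = i; k < j; ++k) {
                const std::size_t c = cost[i][k] + cost[k + 1][j] +
                                      dims[i] * dims[k + 1] * dims[j + 1];
                if (c < best) best = c;
            }
            cost[i][j] = best;
        }
    }
    return cost[0][n - 1];
}

// Example: A is 10x30, B is 30x5, C is 5x60.
constexpr std::size_t kDims[] = {10, 30, 5, 60};
static_assert(min_chain_cost(kDims) == 4500, "(AB)C is the cheaper grouping");
```

For these dimensions, grouping as (AB)C costs 1500 + 3000 = 4500 scalar multiplications, versus 9000 + 18000 = 27000 for A(BC), and the check runs entirely inside the compiler. Recording the split points as well as the cost follows the same pattern.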

I'm not disputing that you can make prettier, more scalable APIs in C++ than in C. My point is that it's not completely hopeless in C either, though. In practice, the user of a matrix math library needs to understand the operations they're doing, and especially so if they actually care about performance. In the example you gave of a string of matrix multiplications, matrix multiplication isn't commutative, so the order is the order that the programmer wrote them in. The compiler is still free to reorder and coalesce redundant calculations with sufficient inlining. Also, N is small for 99% of use cases where performance matters, and when N is large, falling back to a slower "runtime" implementation is perfectly reasonable because the runtime overhead is insignificant compared to the overall cost of the operation; Eigen itself does that internally. A blanket claim that pointers are "slower" and memory costly also seems a bit oversimplified. They are usually worse than passing by value for small data sizes, but for larger data sizes, some sort of reference passing somewhere will be faster than doing unnecessary memory copies. For sufficiently large data sizes, a straightforward hand-written "runtime" algorithm implementation may even happen to be faster than a compiler-generated specialized equivalent, depending on the hardware's memory model.

Eigen is a great library and very convenient to use. It's great to be able to write straightforward chains of matrix operations and trust that the resulting program will be reasonably fast. There's no need to be dogmatic about C vs. C++, though. They're both higher-level languages targeting the same underlying hardware. Templates enable library developers to make simple APIs at the expense of more complicated library implementations. In C, it's often necessary to compromise on the simplicity of the API to achieve the same performance, but it also generally means that the library implementations are simpler. The overall quality of the resulting binary can be about the same, and is almost certainly within the same ballpark performance-wise. As an embedded engineer, I often need APIs that are compatible with C whether or not the implementation is C++, and I value simple library implementations over complex ones; the libraries and my use cases are often obscure enough that they are buggy, and so the more readable the library is, the easier it is for me to debug them.

As a recent real-world example, a coworker, who is a wizard that knows way more than I do about signal processing, implemented some matrix-heavy algorithms in a high-level language that supports just-in-time compilation down to parallelized CPU and even GPU machine code. It worked great on an x86-64 workstation, but on production hardware, we struggled to get the code to run fast enough; it would peg all the CPUs at 100%. The many layers of libraries and JIT compilation made the system very hard to debug even after a couple weeks of trying. I suggested re-implementing the algorithm in C++ using whatever matrix library was most convenient, and a few days later the system was running perfectly and averaging 14% of one CPU. The algorithm went from maybe 50 lines of very readable code to 250 lines of relatively ugly code, but we understood what it was doing way better. I believe he used Eigen in the C++ implementation, but whether or not the matrix library was optimized at all, C, C++, or Rust, it still would have sipped around 14% of one CPU. My point is that, when performance matters, you need to understand what the software and hardware are doing, and so there's value in simplicity and pragmatism.

Of course you could also just write assembly if you wanted the exact most optimized machine code.

It’s also hard to generalize these things in the form of those kinds of macros. With something like Eigen, you just write your code as normal, you don’t have to worry about the special cases, and the compiler rewrites it for you. That’s one of the nice benefits, one of many, of metaprogramming.

Indeed, Eigen is popular for its expression templates, which make many math operations much faster.
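For readers unfamiliar with the term, here is a minimal sketch of the expression-template idea, using fixed-size vectors and addition only. The names `Expr`, `Sum`, `Vec`, and `sum_of_three` are made up for illustration, and this is not Eigen's actual implementation, which is far more elaborate. The point is just that `a + b + c` builds a lightweight type describing the work, and the assignment then runs a single loop with no temporary vectors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// CRTP base: lets operator+ accept only our own expression types.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy node for "l + r". Nothing is computed until operator[] is called.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& lhs, const R& rhs) : l(lhs), r(rhs) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Fixed-size vector whose length N is known at compile time.
template <std::size_t N>
struct Vec : Expr<Vec<N>> {
    std::array<double, N> data{};

    Vec() = default;
    explicit Vec(const std::array<double, N>& d) : data(d) {}

    double operator[](std::size_t i) const {
        assert(i < N);  // debug-only bounds check
        return data[i];
    }

    // Assigning from any expression evaluates it element by element
    // in one fused loop, without intermediate vectors.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        for (std::size_t i = 0; i < N; ++i) data[i] = e.self()[i];
        return *this;
    }
};

// Builds an expression node instead of computing a result.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <std::size_t N>
Vec<N> sum_of_three(const Vec<N>& a, const Vec<N>& b, const Vec<N>& c) {
    Vec<N> out;
    out = a + b + c;  // type is Sum<Sum<Vec, Vec>, Vec>; evaluated in one loop
    return out;
}
```

One caveat of this simplified design: the nodes hold references, so an expression must be consumed within the statement that creates it. Storing `a + b + c` in an `auto` variable and using it later would leave dangling references to the inner temporary.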