Hacker News
by ChrisLomont 2113 days ago
No, you cannot, not without putting an incredible amount of work on the programmer's plate.

Consider the simple problem of multiplying together a sequence of N matrices of possibly different sizes with the least amount of work. The order you multiply in is determined via some optimization technique. You can try to have a different C function for each N, but eventually you will have some N for which your lib doesn't have the call. Or maybe you'll try to pack pointers into an array and pass that, which is now slower and more memory costly. In any case the order must be solved at runtime.
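For concreteness, here is a minimal sketch of the usual dynamic-programming approach to that optimization when it has to run at runtime. The function name `chain_cost` and the way dimensions are encoded are assumptions made for illustration, not something from a particular library. It returns only the cost of the best order; recording the best split `k` at each step would recover the order itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// dims has N+1 entries: matrix i is dims[i] x dims[i+1].
// Returns the minimum number of scalar multiplications needed to
// compute the product of all N matrices (classic O(N^3) DP).
std::uint64_t chain_cost(const std::vector<std::uint64_t>& dims) {
    assert(dims.size() >= 2);               // need at least one matrix
    const std::size_t n = dims.size() - 1;  // number of matrices
    // cost[i][j] = cheapest way to multiply matrices i..j inclusive
    std::vector<std::vector<std::uint64_t>> cost(
        n, std::vector<std::uint64_t>(n, 0));
    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;
            cost[i][j] = std::numeric_limits<std::uint64_t>::max();
            // Try every split point: (i..k) times (k+1..j).
            for (std::size_t k = i; k < j; ++k) {
                const std::uint64_t c = cost[i][k] + cost[k + 1][j] +
                                        dims[i] * dims[k + 1] * dims[j + 1];
                if (c < cost[i][j]) cost[i][j] = c;
            }
        }
    }
    return cost[0][n - 1];
}
```

For sizes 10x30, 30x5 and 5x60, `chain_cost({10, 30, 5, 60})` gives 4500 (multiply the first two, then the third), versus 27000 for the other grouping.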

Templates allow the order to be determined at compile time when the matrix sizes are known. This cannot be done in general in C, since C has no comparable facility for running that kind of computation at compile time.
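As a rough sketch of what that can look like, the standard dynamic-programming recurrence for the cheapest multiplication order can be evaluated entirely by the compiler when the dimensions are template arguments. The name `static_chain_cost` is hypothetical, and this assumes C++14 or later. It computes only the cost of the best order; a real library would presumably also use the split points to choose which products to emit.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Dims... are the N+1 dimensions: matrix i is Dims[i] x Dims[i+1].
// The whole search runs during compilation; nothing is left for runtime.
template <std::uint64_t... Dims>
constexpr std::uint64_t static_chain_cost() {
    static_assert(sizeof...(Dims) >= 2, "need at least one matrix");
    constexpr std::size_t n = sizeof...(Dims) - 1;  // number of matrices
    constexpr std::uint64_t d[] = {Dims...};
    std::uint64_t cost[n][n] = {};  // cost[i][j]: best cost for i..j
    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;
            cost[i][j] = std::numeric_limits<std::uint64_t>::max();
            for (std::size_t k = i; k < j; ++k) {
                const std::uint64_t c =
                    cost[i][k] + cost[k + 1][j] + d[i] * d[k + 1] * d[j + 1];
                if (c < cost[i][j]) cost[i][j] = c;
            }
        }
    }
    return cost[0][n - 1];
}

// Checked by the compiler: 10x30, 30x5, 5x60 costs 4500 in the best order.
static_assert(static_chain_cost<10, 30, 5, 60>() == 4500,
              "(AB)C is the cheaper order for these sizes");
```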

And, if the matrices themselves were constexpr, the product itself can be computed at compile time.

So the template method, giving you Turing complete operations, can do things that you cannot do in C.

This is just a simple example, the tip of the iceberg.

1 comment

I'm not disputing that you can make prettier, more scalable APIs in C++ than in C. My point is that it's not completely hopeless in C either, though. In practice, the user of a matrix math library needs to understand the operations they're doing, and especially so if they actually care about performance. In the example you gave of a string of matrix multiplications, matrix multiplication isn't commutative, so the order is the order that the programmer wrote them in. The compiler is still free to reorder and coalesce redundant calculations with sufficient inlining. Also, N is small for 99% of use cases where performance matters, and when N is large, falling back to a slower "runtime" implementation is perfectly reasonable because the runtime overhead is insignificant compared to the overall cost of the operation; Eigen itself does that internally.

A blanket claim that pointers are "slower" and memory costly also seems a bit oversimplified. They are usually worse than passing by value for small data sizes, but for larger data sizes, some sort of reference passing somewhere will be faster than doing unnecessary memory copies. For sufficiently large data sizes, a straightforward hand-written "runtime" algorithm implementation may even happen to be faster than a compiler-generated specialized equivalent, depending on the hardware's memory model.

Eigen is a great library and very convenient to use. It's great to be able to write straightforward chains of matrix operations and trust that the resulting program will be reasonably fast. There's no need to be dogmatic about C vs. C++, though. They're both higher level languages targeting the same underlying hardware. Templates enable library developers to make simple APIs at the expense of more complicated library implementations. In C, it's often necessary to compromise on the simplicity of the API to achieve the same performance, but it also generally means that the library implementations are simpler. The overall quality of the resulting binary can be about the same, and is almost certainly within the same ballpark performance-wise. As an embedded engineer, I often need APIs that are compatible with C whether or not the implementation is C++, and I value simple library implementations over complex ones; the libraries and my use cases are often obscure enough that the libraries turn out to be buggy, and so the more readable a library is, the easier it is for me to debug.
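To make the "C-compatible API, C++ implementation" point concrete, here is a small sketch of what such a boundary might look like. The function name `mat_mul_f32`, its row-major layout, and its error convention are all assumptions made up for this example; the body is a plain triple loop, but it could just as well call into a templated C++ library behind the same signature.

```cpp
#include <cstddef>

// Hypothetical C-callable entry point, compiled as C++.
// a is m x k, b is k x n, out is m x n, all row-major and non-overlapping.
// Returns 0 on success, -1 if any pointer is null.
extern "C" int mat_mul_f32(const float* a, const float* b, float* out,
                           std::size_t m, std::size_t k, std::size_t n) {
    if (a == nullptr || b == nullptr || out == nullptr) return -1;
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            // Dot product of row i of a with column j of b.
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = acc;
        }
    }
    return 0;
}
```

The caller sees only pointers and sizes, which is the less pretty API being traded for a simple, debuggable implementation.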

As a recent real-world example, a coworker, who is a wizard who knows way more than I do about signal processing, implemented some matrix-heavy algorithms in a high-level language that supports just-in-time compilation down to parallelized CPU and even GPU machine code. It worked great on an x86-64 workstation, but on production hardware, we struggled to get the code to run fast enough; it would peg all the CPUs at 100%. The many layers of libraries and JIT compilation made the system very hard to debug even after a couple weeks of trying. I suggested re-implementing the algorithm in C++ using whatever matrix library was most convenient, and a few days later the system was running perfectly and averaging 14% of one CPU. The algorithm went from maybe 50 lines of very readable code to 250 lines of relatively ugly code, but we understood what it was doing way better. I believe he used Eigen in the C++ implementation, but whether or not the matrix library was optimized at all, and whether it was C, C++, or Rust, it still would have sipped around 14% of one CPU. My point is that, when performance matters, you need to understand what the software and hardware are doing, and so there's value in simplicity and pragmatism.