| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by onalark 4432 days ago

If you want/need maximum performance, why are you programming in a "high-level language"?

Many of the highest-performance scientific computing libraries are written in assembly or use inline assembly instructions. The vast majority of scientific programmers don't need to write at this level. I challenge you to write a native C++ matrix-matrix-multiply algorithm that generally outperforms the matrix-matrix-multiply available to MATLAB, Python/Cython, and Julia through their interface to the assembly-optimized OpenBLAS library: http://julialang.org/

Note that C, Fortran, and Julia are all using OpenBLAS in the rand_mat_mul benchmark, and OpenBLAS is written in C (with assembly).

2 comments

adwn 4432 days ago

> If you want/need maximum performance, why are you programming in a "high-level language"?

Because going to assembly offers so little benefit (if at all), that it doesn't justify the huge increase in effort and susceptibility to bugs in most cases (and even then, inline assembly is usually sufficient).

> Many of the highest-performance scientific computing libraries are written in assembly or use inline assembly instructions.

I don't think this is true, at least not for non-inline assembly. Sure, OpenBlAS contains a lot of assembly, but many scientific computing algorithms are more complex than the BLAS routines.

> I challenge you to write a native C++ matrix-matrix-multiply algorithm that generally outperforms the matrix-matrix-multiply available to [...]

Where in my post did I give you the idea that I don't like code reuse and instead reinvent the wheel every time?

link

onalark 4432 days ago

edit: Jack Poulson's Elemental, and Andreas Waechter's IPOPT are also written in C++, and are both very important libraries within scientific computing and optimization, to add a few more examples of the usefulness of C++.

I'm trying to make the point that C++ occupies a weird place in the abstraction-vs-performance tradeoff.

For high abstraction and friendliness, C++ doesn't do nearly as nice of a job as MATLAB or Python in exposing high-level scientific computing operations.

For performance, C++ does have some really nice high-performance libraries (Eigen comes to mind), but for the majority of use cases, you either need to write inline assembly to get the best possible speed, or you're already reusing a fast numerical kernel from another library, and you may be better off in a higher-level language.

This is a very subjective discussion, and there are many interesting, fast, C++ libraries that do important work for science. But your claim that C++ is the best language for such work would be highly contested by many high performance computing specialists. As it is, I believe there's a pretty rough split between C, C++, and Fortran supporters, with Python continuing to gain traction.

Here's another counter-argument. MPI is used in 99% of all scientific high performance computing codes. The latest version of MPI, MPI-3, drops explicit support for C++ bindings. If C++ was so dominant, why would explicit bindings be dropped?

link

srean 4432 days ago

>For high abstraction and friendliness, C++ doesn't do nearly as nice of a job as MATLAB or Python in exposing high-level scientific computing operations.

I think you responded to it yourself in the next paragraph.

While not quite the same as using MATLAB or Numpy; with Eigen, Blitz++ and Armadillo you can come really close syntactically. You do have to suffer through the compilation process but it comes with computational benefits. Numpy and its ilk are adversely affected by the limitation of their vectorization paradigm, some more, some less. This style creates needless copies, unnecessary and overly pessimistic loops. These cost performance. Eigen/Blitz++/Armadillo style libraries do not suffer from this problem. A common Pythonic way to recover from these deficiencies of Numpy is to use Cython (and numexpr although it has a very narrow scope). However, to see speedups in these tight numerical loops with Cython one really has to do manual indexing, write at a low level, not so for Eigen/Armadillo/Blitz++.

This, although very limited in scope, is a concrete example that shows that you can write at a higher level in C++ without incurring performance hits.

link

adwn 4432 days ago

I think we're approaching this from two different directions: you from the library user's, and I from the library writer's perspective.

Sure, if there's a good library available that handles all the bottleneck computation, then there's no need to write anything in C++ – just use Python or whatever. But if you actually have to implement a kernel, you can't do it in Python or Matlab.

Regarding assembly: I still believe that assembly is unsuitable except for the simplest of algorithms. Sure, you can optimize the crap out of DGEMM or DAXPY, with SIMD, cache-optimization, and prefetching – but in the end, it's still a very simple algorithm, with a very simple data layout and predictable access patterns. But as soon as it gets a little more complex (e.g., graph algorithms, or a SAT solver), you can forget about assembly. Heck, I doubt you could even get a measurable performance benefit.

> But your claim that C++ is the best language for such work would be highly contested by many high performance computing specialists.

My personal, subjective, and completely unscientific opinion on this is that these people are domain experts, which know how to design fast algorithms and data structures, but not necessarily how to make the best use of a complex language like C++. Or they're extending legacy code. In short: for non-technical reasons (analogy: Haskell and Scala are better languages than Java, and yet a lot more Java code is written).

> The latest version of MPI, MPI-3, drops explicit support for C++ bindings. If C++ was so dominant, why would explicit bindings be dropped?

Maybe because they didn't offer much over the C bindings, and even some C++ projects (e.g. Boost) used them?

link

srean 4432 days ago

I am sorry to say this, but I sure will be happier if this oft repeated vapid trope stops getting repeated. There is more to performance than memory layout and caching.

There is a big difference between optimization in the small and optimization in the large. It is precisely because you need maximum performance and correctness that you need high-level languages. Quite ironically your comment itself illustrates the point well. One could have written high-perf apps entirely in the style of OpenBlas. Sensible people dont and for a reason.

I am going to copy a previous comment of mine on this topic.

I will not be surprised if it beats C by generating C code itself, but C code that no human would write by hand. It has been done before in programming history (not just C, fortran has been beaten as well http://dl.acm.org/citation.cfm?id=200989 http://link.springer.com/chapter/10.1007/978-3-642-19014-8_1.... http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.150.... although the prior art focuses on Lisp I think Haskell and ML like languages are better equipped for this). Imperative code is a lot harder to reason about, and low level code is a lot harder to write if one cares about correctness and performance simultaneously. Specialized functional languages has a history of beating C implementations. It could be faster still if one removes the constraint that one has to generate C code, for example coroutines and proper tail calls become easier to implement. The bottomline is that correctness is easier to achieve with functional, and performance with imperative. So we would need both and they meet in a compiler.

Take the example of Stalin and StalinGrad, they are Lisps. Their compile times are epic, but once done they have a history of generating faster code than C for specific applications (a specific example where it will excel is multidimensional numeric integration where the function to be integrated is passed as an input). The main reason why they would be able to generate faster code is that they are smarter about inlining. The reason they are smarter about inlining is that programmers intent is better preserved when encoded in a higher level language.

Can a programmer not write the same correct code in C by hand ? Perhaps they can, but it will take longer and would be more error prone and likely to be costlier to produce, unless one takes the help of these higher level abstractions primitives, but then you aren't really writing in C anymore. Can the C compiler not be equally smart ? possibly, but it will be a lot harder to make a C compiler smart than a functional language compiler smart. Functional languages leave a lot of room for the compiler to do its thing, C while not as bad as Java still over-specifies how exactly something needs to be computed (my way or the highway). Add the fact that when code is written in a higher level, but optimization friendly language one can enjoy the benefits of compiler techniques yet to come. It future-proofs the code to an extent and amortizes the effort that went into writing the compiler.

link