by pcordes 3228 days ago
Actual high-performance computing (HPC) clusters are typically used efficiently, running well-tuned code. BLAS libraries (e.g. for matrix multiplication) are usually very heavily optimized.
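One core idea behind that optimization is cache blocking: work on small tiles of the matrices so each tile is reused many times while it's still in cache. Real BLAS GEMM kernels go much further (packing into contiguous buffers, register-blocked microkernels, SIMD), so treat this as a minimal sketch of the tiling idea only. `matmul_tiled` is a hypothetical name, not a BLAS routine, and `TILE = 32` is an arbitrary assumption rather than a tuned value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; real libraries tune block sizes per cache level. */
#define TILE 32

/* C += A * B for row-major n x n matrices, processed in TILE x TILE blocks
   so each block of A, B and C is reused from cache before being evicted.
   The caller must zero C first to get a plain product. */
void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    /* C must not alias an input, or partial results would corrupt it. */
    assert(C != A && C != B);
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                /* Clamp tile bounds for n not divisible by TILE. */
                size_t i_end = ii + TILE < n ? ii + TILE : n;
                size_t k_end = kk + TILE < n ? kk + TILE : n;
                size_t j_end = jj + TILE < n ? jj + TILE : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Inner loop walks rows of B and C contiguously. */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

The i-k-j order inside each tile keeps the innermost loop streaming through contiguous memory, which is what lets the compiler vectorize it.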

Of course, a lot of code just calls these optimized building blocks and ends up making multiple passes over the data instead of doing more with each pass while the data is still hot in cache. It's disappointing that we still don't have optimizing compilers that can generate an efficient matrix multiply (or a similar kernel) on their own and also mix other work right into it.
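To make the "multiple passes" point concrete, here is a minimal sketch comparing a matmul followed by a separate elementwise pass against the same work fused into the matmul. The ReLU (clamp negatives to zero) is an arbitrary stand-in for "some other work", both function names are hypothetical, and the matmul is deliberately naive so the pass structure is easy to see; this is hand fusion, not what any particular compiler does.

```c
#include <assert.h>
#include <stddef.h>

/* Unfused: a matmul "building block" followed by a separate ReLU pass.
   The second loop re-reads all of C; if C is larger than cache, that is
   another full trip to memory. */
void matmul_then_relu(size_t n, const double *A, const double *B, double *C)
{
    assert(C != A && C != B);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    /* Second pass over the whole output. */
    for (size_t i = 0; i < n * n; i++)
        if (C[i] < 0.0)
            C[i] = 0.0;
}

/* Fused: the ReLU is applied to each output element as soon as its dot
   product is finished, while the value is still in a register, so C is
   written once and never re-read. */
void matmul_relu_fused(size_t n, const double *A, const double *B, double *C)
{
    assert(C != A && C != B);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum > 0.0 ? sum : 0.0;
        }
}
```

Both functions produce identical results; the difference is only in how many times the output crosses the memory hierarchy. Doing this by hand for a fully tuned GEMM is much harder, which is why it would be nice to have compilers do it.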

Your point definitely applies to server farms, though: they run a huge stack of interpreted languages and big, clunky "building blocks" instead of something written efficiently.