| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by champtar 2350 days ago
	One of the best 2h practical course that I had was just write the fastest square matrix multiplication. You could use any language, any algorithm, just no libraries. The target was a 32 core CPU server (this was ~10 years ago). At 5000x5000 all the Java and Python attempts were running out of memory. In C, We tried some openmp, some optimized algorithm, but in the end the best trick was to flip one of the matrix so that memory could be always prefetched. Out of curiosity another student tried GNU Scientific Library, it turned out to be ~100 times faster. My take away was find the right tool for the job! A fun read on cloud scale vs optimized code is this recent article comparing ClickHouse and ScyllaDB (https://www.altinity.com/blog/2020/1/1/clickhouse-cost-effic...)

4 comments

jonas21 2350 days ago

Yeah, I wouldn't be surprised if the majority of code performing large matrix multiplications these days was written in Python and executed on GPUs by libraries like Tensorflow and PyTorch. With the right abstractions, programmers can be "lazy" and still get great performance.

link

srg0 2350 days ago

Matrix multiplication is usually done by a platform-specific BLAS library (BLAS is an API, there are multiple implementations, e.g. Intel MKL, OpenBLAS, cuBLAS). There are some other linear algebra APIs/libraries, but this is what's used the most.

Most of the numerical code that cares about performance for linear algebra uses this API and links an appropriate implementation.

link

otabdeveloper2 2349 days ago

The 'written in Python' you speak of is actually Fortran under the hood.

link

agumonkey 2350 days ago

reminds me of the dude who managed to parse TB with awk instead of whatever spark like product was trendy

link

goatinaboat 2349 days ago

Probably you’re thinking of https://adamdrake.com/command-line-tools-can-be-235x-faster-...

link

YetAnotherNick 2349 days ago

Leaving aside the optimisations, I assume you are doing N^3 multiplication, whereas Strassen algorithm with complexity N^2.81 or even Coppersmith–Winograd algorithm with complexity N^2.37 with larger constant is better with 5000x5000 square matrix.

link

lern_too_spel 2349 days ago

A double array takes O(1) more space in Java vs. C, so running out of memory wouldn't be a problem.

link

saagarjha 2348 days ago

Unfortunately big-O algorithmic analysis doesn't translate cleanly to the real world due to constants and other inefficiencies.

link

lern_too_spel 2348 days ago

It does in this case. Champtar says it's something about the size of the matrix that makes it run out of space. The overhead of the double array is already paid for.

link