by frakt0x90 1032 days ago
At least they included numpy in this one. On their last post, after all their optimizations, numpy.matmul() produced almost the exact same throughput as their most optimized example. Would still need to dig in to see if this one has issues. Benchmarks are always such a minefield.

matmul is a wrapper for BLAS. If you're faster than BLAS you're beating handwritten assembler code specialized per CPU architecture.
But people use numpy for matrix multiplies in Python. Unless they are claiming to be 35k times faster on general-purpose code, the 35k number is absurd.
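For anyone who wants to see the gap being argued about, here is a rough sketch (not taken from the post being discussed) that multiplies the same matrices with a naive pure-Python triple loop and with numpy.matmul, then checks that they agree. The function name matmul_loops, the 64x64 size, and the seed are arbitrary choices for illustration; timings will vary by machine and by which BLAS build NumPy is linked against.

```python
import time

import numpy as np


def matmul_loops(a, b):
    """Naive triple-loop matrix multiply over nested Python lists (illustrative helper)."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


# Small random inputs; size and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = rng.random((64, 64))

t0 = time.perf_counter()
slow = matmul_loops(a.tolist(), b.tolist())
t1 = time.perf_counter()
# For float arrays, NumPy typically hands this off to a BLAS library.
fast = np.matmul(a, b)
t2 = time.perf_counter()

same = bool(np.allclose(slow, fast))
print(f"results agree: {same}")
print(f"pure Python loops: {t1 - t0:.6f} s")
print(f"numpy.matmul:      {t2 - t1:.6f} s")
```

The interesting comparison for any "N times faster than Python" claim is against the second timing, not the first.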
A lot of ugly, unreadable code has come into existence because of the need to twist it into NumPy calls. If you can replace these with good old for loops and achieve similar performance, then you've already won. Besides that, there is a lot of code that involves looping that isn't matrix multiplication or otherwise covered by NumPy.
Right, but the point is that the optimizations didn't require an entirely new language: you just take the core logic and write it in an existing language that has decades of optimizations behind it. If you're doing math, there's likely a natural, well-defined interface you can use, so you just call that interface from Python, which has always been the point of 'glue' languages :)
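As a small illustration of the 'glue' idea, this is roughly what calling an existing compiled routine from Python can look like using only the standard library's ctypes. It is a sketch that assumes a Unix-like system where the C math library can be located; load_libm and c_cos are names made up for this example, and cos simply stands in for whatever optimized routine you actually care about.

```python
import ctypes
import ctypes.util
import math


def load_libm():
    """Load the C math library (illustrative helper; assumes a Unix-like system)."""
    # find_library may return None; on POSIX, CDLL(None) then falls back to
    # the symbols already loaded into the running process.
    path = ctypes.util.find_library("m")
    return ctypes.CDLL(path)


libm = load_libm()

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the compiled C cos() through ctypes."""
    return libm.cos(x)


value = c_cos(0.5)
print(f"C cos(0.5) = {value}, math.cos(0.5) = {math.cos(0.5)}")
```

The Python side only declares the signature and passes values through; the actual work happens in code that was compiled and tuned long before.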