But people already use NumPy for matrix multiplies in Python. Unless they are claiming to be 35k times faster on general-purpose code, the 35k number is absurd.
A lot of ugly, unreadable code exists only because it had to be twisted into NumPy calls. If you can replace that with good old for loops and get similar performance, then you've already won. Besides, there is a lot of looping code that isn't matrix multiplication and isn't covered by NumPy.
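To make the "twisting" concrete, here is a rough sketch of a running sum that is clamped at zero, written once as a plain loop and once as NumPy calls. The function names and the sample data are made up for illustration, and nothing here measures speed; it only shows how the vectorized form hides the intent.

```python
import numpy as np


def clipped_sum_loop(xs):
    """Running sum that is never allowed to drop below zero (plain loop)."""
    out = []
    acc = 0
    for x in xs:
        acc = max(0, acc + x)  # the whole rule is visible on this one line
        out.append(acc)
    return out


def clipped_sum_numpy(xs):
    """Same result without a Python-level loop, via a cumulative-minimum trick."""
    c = np.cumsum(xs)
    # Subtract the lowest point the unclamped sum has reached so far (capped at 0).
    lowest = np.minimum(np.minimum.accumulate(c), 0)
    return c - lowest


xs = [1, -3, 2, 2, -5, 1]
print(clipped_sum_loop(xs))
print(clipped_sum_numpy(xs).tolist())
```

Both versions agree on this input, but the second one only works because someone noticed the identity behind it, and a reader has to rediscover that identity to check it.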
Right, but the point is that the optimizations didn't require an entirely new language: you just take the core logic and write it in an existing language that has decades of optimization work behind it. If you're doing math, there's likely a natural, well-defined interface that can be used, so you just call that interface from Python, which has always been the point of 'glue' languages :)
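A small sketch of what that looks like in practice, using only the standard library: the Euclidean distance written as a Python loop next to `math.dist`, which CPython implements in C. The helper name `dist_loop` and the sample points are hypothetical, and this is an illustration of the calling pattern, not a benchmark.

```python
import math


def dist_loop(p, q):
    """Euclidean distance computed step by step in pure Python."""
    total = 0.0
    for a, b in zip(p, q):
        total += (a - b) ** 2
    return math.sqrt(total)


p = (0.0, 0.0)
q = (3.0, 4.0)

# Python only passes the two points across; the arithmetic runs in compiled code.
print(dist_loop(p, q))
print(math.dist(p, q))
```

The interface is the whole contract here: two sequences of coordinates in, one float out, and the caller never needs to know what language the body is written in.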