Hacker News
by Kelteseth 1032 days ago
So TL;DR: using SIMD and multithreading is faster than doing no optimization in Python. The only real comparison here is against code with no optimization at all:

> The above code produced a 90x speedup over Python and a 15x speedup over NumPy as shown in the figure below:

Am I missing something?
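
For what it's worth, the two quoted figures can be combined with a bit of arithmetic. This is only a sketch and assumes both speedups were measured on the same workload and hardware, which the quote does not state; the variable names are made up for illustration.

```python
# Figures quoted above: speedups of the new code over each baseline.
speedup_over_python = 90.0
speedup_over_numpy = 15.0

# If t_python / t_new = 90 and t_numpy / t_new = 15, then
# t_python / t_numpy = 90 / 15 (assuming the same workload and machine).
numpy_over_python = speedup_over_python / speedup_over_numpy

print(f"Implied NumPy speedup over plain Python: {numpy_over_python:.0f}x")
```

So under that assumption the NumPy baseline was itself only about 6x faster than the plain Python one.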

3 comments

Getting >10x speed up isn’t exciting enough for many people?

I’ll take it.

This is all pretty impressive if I can take my unmodified (slightly modified?) Python code and get that sort of improvement.

> This is all pretty impressive if I can take my unmodified (slightly modified?) Python code and get that sort of improvement.

it'll never work as smoothly as they advertise. just hands down, beyond a shadow of a doubt, their claims about supporting "unmodified" Python code are startup hype. how do i know? i could give you a bunch of technical reasons about Python as a language and CPython as the de facto implementation (thereby informing tons of code already written, re extensions) but there's a much simpler way to reason about it: because there are already >10 attempts at this and no one has been able to do it. there's no magic here that any number of dollars or brains could pull off. instead each such project picks a point on the pythonic<->performant design-space tradeoff curve and then asks/expects you to live with that choice.

and taking ^ into consideration, mojo is not that special. only thing going for it is chris lattner isn't bad at designing languages so maybe, on its own, it'll be a nice language (but it needs to be open to get any traction on its own).
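
To make the language-level argument a little more concrete, here is a minimal sketch of one commonly cited reason, namely runtime dynamism. It is not taken from the comment above, and `add` and `Vec` are hypothetical names used purely for illustration.

```python
# Illustrative only: the same function body can mean different things at
# runtime, which makes ahead-of-time optimization of "unmodified" code hard.

def add(a, b):
    # "+" dispatches on the runtime types of a and b.
    return a + b


class Vec:
    def __init__(self, x):
        self.x = x

    def __add__(self, other):
        return Vec(self.x + other.x)


# One function, three different operations: int add, str concat, Vec.__add__.
results = [add(1, 2), add("a", "b"), add(Vec(1), Vec(2)).x]

# Behavior can also be rebound after the fact, so a compiler cannot assume
# that Vec.__add__ still means what it meant when add() was first seen.
Vec.__add__ = lambda self, other: Vec(self.x * other.x)
patched = add(Vec(3), Vec(4)).x

print(results, patched)
```

A compiler that wants to keep these semantics has to guard against every such change or restrict what the code is allowed to do, which is roughly the tradeoff described above.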

It's not 10x but GraalPy can speed up unmodified Python by 3.4x on average:

https://www.graalvm.org/python/

And they've not been going at it that long. A few years at most.

graalpy does not fully support C extensions and will have just as hard a time extending support as anyone else. maybe even the hardest because they're plumbing through the JVM which, notoriously, has bad C FFI (at least until recently?).

It's incomplete but it does support C extensions and can run code with NumPy and other science modules.

Their approach is unique, which is why it can work (they already proved out the idea with Ruby). They compile the modules with LLVM and then extend the Python interpreter/JIT compiler with support for LLVM bitcode. So the JIT compiler compiles both Python and C extensions together as one unit. The interpreter API is then virtualized, so that code that looks like a structure read or method call from C is compiled directly down to the optimized machine code being used by the rest of the JIT-compiled program. In this way the interop overhead can be optimized out.

This is all separate tech that goes well beyond a normal FFI. JNI doesn't even get involved at all.

from reading, some of this isn't quite correct (it's graalvm that supports bitcode), but i have to say i didn't realize that that's what they were doing (compiling python and llvm bitcode to graalvm both). interesting but okay you have to admit that's a fairly "beyond scope" approach - ie they solve the problem of C extensions not by compiling python to C but by compiling both to JVM. anyway thanks.

> i could give you a bunch of technical reasons about Python as a language and CPython as the de facto implementation

Please do. I'm very interested.

> no optimization in python

Well, isn't that most Python? If Mojo can pave over the slow interpreted bits I repeatedly dig up in Python profilers, even well maintained projects, with no code changes, that would be huge.
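
For anyone who hasn't done this kind of digging, here is a minimal sketch of finding a slow interpreted hot spot with the standard library's `cProfile` and `pstats`. `hot_loop` and `main` are hypothetical stand-ins, not code from any real project.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    # A plain interpreted loop: the kind of code that tends to top profiles.
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return hot_loop(200_000)


# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Render the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()

print(report)
```

The report lists `hot_loop` among the entries, and it is loops like that one which a faster runtime would have to speed up without code changes.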

So does this mean Swift and Metal offer the same, if not better, performance enhancements? SIMD is very much a first-class citizen as a type there.

No, Lattner learned from Swift and is avoiding anything except zero-cost abstractions.

Also, Swift isn’t very interesting outside the Apple ecosystem, and Metal doesn’t exist outside the Apple ecosystem. Mojo has a real shot at widespread, general-purpose, language adoption!