I'm really interested in Mojo not for its AI applications, but as an alternative to Julia for high performance computing. Like Julia, Mojo is also attempting to solve the two-language problem, but I like that Mojo is coming at it from a Python perspective rather than trying to create new syntax. For better or for worse, Python is absolutely dominating in the field of scientific computing, and I don't see that changing anytime soon. Being able to write optimizations at a lower level in a Python-like syntax is really appealing to me.
Furthermore, while I love Julia the language, I'm disappointed in how it really hasn't taken off in adoption by either academia or industry. The community is small and that becomes a real pain point when it comes to tooling. Using the debugger is an awful experience and the VSCode extension that is the recommended way to write Julia is very hit-or-miss. I think it would really benefit from a lot more funding that doesn't actually seem to be coming. It's not a 1-to-1 comparison, but Modular has received 3 times the amount of funding as JuliaHub despite being much younger.
I was responsible for the S4TF effort at Google. In my opinion, it validated that some of the ideas are good (e.g. Graph Program Extraction is the algorithm that torch dynamo uses internally), that an efficient compiled language has benefits etc. However, I also learned that it should not be based on Swift and should not be based on TensorFlow. Other than those two things, everything is great ;-)
I’m a huge Julia fan, you can take a look at my posting history. I love Julia’s syntax, and some of its language ideas.
…BUT…
For my personal tastes, Mojo’s lack of garbage collection, Rust-like memory safety, and attention to ahead-of-time compilation put it way ahead. The vast pool of Python developers who can easily pick it up if interested is a big plus.
Julia is aimed at a somewhat different space, but there’s also a huge overlap.
Let’s hope for good interoperability between the two, it seems fairly straightforward…
Let's see how it plays out, given that they are focused only on AI workloads, and somehow those VCs want their money back, which doesn't appeal to everyone.
I acknowledge that there is finally pressure in the Python community to tackle performance, but I don't see Mojo being the solution unless something makes its adoption go wild.
Right now, I see that more likely with Facebook, NVidia, Intel and Microsoft efforts.
At least they included numpy in this one. On their last post, after all their optimizations, numpy.matmul() produced almost the exact same throughput as their most optimized example. Would still need to dig in to see if this one has issues. Benchmarks are always such a minefield.
But people use numpy for matrix multiplies in Python. Unless they are claiming to be 35k times faster on general-purpose code, the 35k number is absurd.
A lot of ugly, unreadable code has come into existence because of the need to twist it into NumPy calls. If you can replace these with good old for loops and achieve similar performance, then you've already won. Besides that, there is a lot of code that involves looping that isn't matrix multiplication or covered by NumPy.
Right, but the point is that the optimizations didn't require an entirely new language; you just take the core logic and write it in an existing language that has decades of optimizations. If you're doing math, there's likely a natural, well-defined interface that can be used, so you just call that interface from Python, which has historically always been the point of 'glue' languages :)
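To make the 'glue' point concrete, here is a minimal sketch using only the standard library (my own illustration, not anything from the Mojo post). The names dot_loop and dot_builtin are made up for the example. In CPython, sum, map and operator.mul are implemented in C, so the second version pushes the per-element loop out of the interpreter while the calling code stays plain Python. No timings are claimed; run it yourself to see what your machine does.

```python
import operator
import timeit


def dot_loop(a, b):
    """Dot product with an explicit Python-level loop (interpreted per element)."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_builtin(a, b):
    """Same result, but the iteration runs inside C-implemented builtins in CPython."""
    return sum(map(operator.mul, a, b))


if __name__ == "__main__":
    a = list(range(10_000))
    b = list(range(10_000, 20_000))
    # Both versions must agree before any timing comparison means anything.
    assert dot_loop(a, b) == dot_builtin(a, b)
    print("loop:   ", timeit.timeit(lambda: dot_loop(a, b), number=200))
    print("builtin:", timeit.timeit(lambda: dot_builtin(a, b), number=200))
```

NumPy and BLAS follow the same pattern with a much larger compiled kernel behind the call.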
I'm pretty excited about Mojo and have been keeping an eye on its development. I feel like the team has learned a lot from their experience, and are taking the best from languages like Python, Rust, Swift, Hylo (formerly known as Val), and are taking a really nice pragmatic approach in implementing them so that the language is approachable, but also very safe and fast. Once it's out, I hope someone sits down and makes a SwiftUI-like cross platform UI library with it ;).
Actually more interested in things like UIs, quick API servers, stuff like that than the AI/ML use cases. The idea of most of the ease and approachability of Python, a proper type system, and access to the entire ecosystem of Python libs in a compiled language is pretty compelling.
A 35Kx speedup is not a scaled speedup. Throw this naively parallelizable task at a bigger computer and you get a 70Kx speedup, etc.
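A back-of-the-envelope sketch of that arithmetic, assuming perfectly linear scaling across cores. The core counts (32 and 64) are hypothetical, picked only to illustrate the point; the benchmark machine's actual core count is not given here.

```python
def per_core_speedup(total_speedup: float, cores: int) -> float:
    """Share of the headline speedup attributable to one core (linear-scaling assumption)."""
    return total_speedup / cores


def scaled_speedup(per_core: float, cores: int) -> float:
    """Headline speedup you would report on a machine with `cores` cores."""
    return per_core * cores


if __name__ == "__main__":
    # Hypothetical: suppose the 35,000x figure was measured on 32 cores.
    single = per_core_speedup(35_000, 32)
    print(single)                      # per-core factor under this assumption
    print(scaled_speedup(single, 64))  # same code, twice the cores
```

The headline number moves with the hardware, which is why a per-core figure is the more honest one to compare.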
While i think there are tons of optimizations to be done for python (looking at you GIL) giving access to low level cpu primitives is not one I think that will be broadly adopted by the python community. That's one of the joys of python: system agnostic, looks pretty close to pseudocode, coding. If you want speed, glue together a bunch of compiled code calls, and hope the call overhead isn't too large. Or write cpu intensive operations in numba, or pyrex. At the end of the day, mojo's pay to play programming language harkens back to the early 90's Borland days.
Right. However, this is a comparison versus Python and the GIL, which can’t do that at all.
> While i think there are tons of optimizations to be done for python (looking at you GIL) giving access to low level cpu primitives is not one I think that will be broadly adopted by the python community.
It doesn’t need to be, any more than writing Numba or Pyrex is done on a large scale.
> That's one of the joys of python: system agnostic, looks pretty close to pseudocode, coding. If you want speed, glue together a bunch of compiled code calls, and hope the call overhead isn't too large. Or write cpu intensive operations in numba, or pyrex. At the end of the day, mojo's pay to play programming language harkens back to the early 90's Borland days.
The appeal is having a high level language that compiles to efficient machine (and GPU!) code. One can “drop down” to Python for non performance intensive parts.
I think this will be much more of a draw for people coming from C++, Fortran and other older, jankier languages. It looks to hit a sweet spot for real time embedded development VERY well, especially given Rust-like memory safety!
Mojo will also be a worthy competitor to Julia in the HPC scientific arena I think…we’ll see!
Have you played with Mojo? It really doesn’t feel high level.
I feel like JAX has been eating Julia’s lunch lately, making me think that there’s a real market for a small functional differentiable programming language with good Python interop - like a more polished Dex or Futhark.
Particular functions may deal with low-level machine features, that is unavoidable when extracting maximum performance from hardware. Mojo is pursuing some innovative ideas there, such as autotuning and adaptive compilation.
As I said in a different post, I don’t think Mojo’s main audience is the general Python community, it’s the AI, real time, embedded, safety critical, HPC, and yes, gaming, communities that’ll likely benefit the most.
Isn't it more that the plan is someday Mojo may be a proper superset of Python, but right now it is far from it? I just tried opening up the Mojo playground, copy/pasted the very first little example function from the official Python tutorial (see https://docs.python.org/3/tutorial/controlflow.html#defining...) and Mojo outputs a bunch of errors.
With Cython, our goal was to make it a proper superset of Python, and it was really difficult, but we got close.
> Right. However, this is a comparison versus Python and the GIL, which can’t do that at all.
Single-process Python does not take advantage of a multicore architecture, but neither would single-process Mojo. Embarrassingly parallel operations like Mandelbrot can trivially be written with multiprocessing (https://github.com/DipanshuSehjal/Mandelbrot-set/blob/master...) or joblib to run in parallel in otherwise vanilla Python. It would be trivial to implement this in JAX and run it on a GPU or TPU, but I wouldn't say that JAX is the reason for the speedup.
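For reference, a minimal sketch of the multiprocessing version (my own toy, not the code from the linked repo). The grid size, iteration cap and worker count are arbitrary assumptions, and the helper names are made up. Each image row is independent, so Pool.map can hand rows to separate processes, each with its own interpreter and its own GIL.

```python
from multiprocessing import Pool

# Arbitrary illustration values, not taken from any benchmark.
WIDTH, HEIGHT, MAX_ITER = 60, 24, 100


def escape_count(c: complex, max_iter: int = MAX_ITER) -> int:
    """Iterations of z -> z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render_row(y: int) -> list[int]:
    """Compute one row; rows share no state, so they can run in any order."""
    im = -1.2 + 2.4 * y / (HEIGHT - 1)
    return [escape_count(complex(-2.0 + 3.0 * x / (WIDTH - 1), im))
            for x in range(WIDTH)]


def render_serial() -> list[list[int]]:
    return [render_row(y) for y in range(HEIGHT)]


def render_parallel(workers: int = 4) -> list[list[int]]:
    # Pool.map preserves input order, so the result matches render_serial().
    with Pool(processes=workers) as pool:
        return pool.map(render_row, range(HEIGHT))


if __name__ == "__main__":
    rows = render_parallel()
    assert rows == render_serial()
    for row in rows:
        print("".join("#" if n == MAX_ITER else "." for n in row))
```

The per-pixel loop is still interpreted, so this only buys the core-count factor, not the single-core one.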
> At the end of the day, mojo's pay to play programming language harkens back to the early 90's Borland days.
I didn’t address this in my other post. Modular is about to release a freely available SDK. Also, the standard library sources will be open sourced shortly. There are hints of additional open source initiatives.
Modular’s main business plan appears to be adding value in the general area of AI, AI training, and AI deployment, including by offering SAAS. That plan in no way conflicts with (and in fact encourages) an open Mojo language ecosystem.
that is good to hear. I read a post on Mojo months ago, signed up to the waitlist and then crickets. It would seem insane to think a non-open source, non-free compiler/interpreter could be successful these days.
Mojo needs to demonstrate Hugging Face's AI libraries with Mojo acceleration. Nothing else will have the kind of impact that would have.
Throw a half dozen engineers at it, develop a deployment plan for SD XL, profit.
You'll get a ton of open source developers working on improving the Mojo versions even further once you release it, researchers developing extensions, etc. GO TO WHERE THE DEVELOPERS ARE.
Stable Diffusion is crazy compute heavy, so if Mojo is what it's purported to be, it should be possible to get speedups.
I don't understand the play here for Modular. If this is a worthwhile improvement that is broadly applicable, won't it at some point make its way into Python, numpy, etc?
In Java land we had a bunch of other JVMs over the years offering better performance. Most important things got absorbed into what is now OpenJDK, and the other JVMs, if they even exist at all, are niche players.
Performance is a huge focus in Python and ML lands right now, so why would this be any different?
Just based on their website, I think selling Mojo as a faster Python-like language isn't intended to be their main product. They place a lot more emphasis on AI/ML acceleration than on Mojo, and on creating compatibility between different AI hardware acceleration systems.
I have the impression they hope vendors of AI acceleration hardware, clusters and cloud services will be their customers, to provide uniform and heavily backward-compatible cross-accelerator AI/ML APIs to those vendors' customers.
And hope that users of those services and hardware will also pay for high quality well-researched APIs that work reliably with many different AI/ML accelerators, even if Mojo is free. Similar to how RedHat provides value through commercial-grade QA and sustained development for Linux on high-end hardware, that would be complicated and risky to use otherwise.
If they've figured out how to deliver performance that Python might get around to in 5-10y, shouldn't they tout that, for people who might want that now?
Ultimately promoting the possibility for better performance, & current contrast, is good for prodding other languages/runtimes like Python to match these options. The "important things [get] absorbed" process you mention relies on teams making some "play for" alternatives, to create the impetus to get new things integrated.
Modular is mainly focused on improving AI related workflows as its business model. That market is easily many $billions, and I think most expect the AI industry to experience explosive growth.
Kotlin is similar to Swift but arguably compiles much faster despite a suboptimal initial architecture, and avoids weird language/compiler specific problems never before seen, like expressions that time out whilst compiling.
Graal is similar to LLVM but can compile a far larger range of languages, is actually used for both JIT and AOT compilation (does anyone use llvm jit in prod?), and has many innovations LLVM never could have even tried to have.
So it's not really clear that he's the best compiler person in the world. More like, the people doing the other stuff aren't in California so don't get the same level of attention.
They are also building in pretty serious Python interop. You should be able to at least somewhat mix the two or migrate gradually, and still use Python libs for less performance critical code (or if the libs do their performance critical stuff in C++ or whatever and are therefore fast enough).
I just want to see real un-hyped benchmarks. Comparing random Python native code makes no sense and seems dishonest, deterring me from actually trying out the tool.
I want a Python that can statically plan underlying GPU allocations, avoids CUDA kernel dispatch overhead and enables a multi-GPU API that isn't some multiprocessing abomination.
As a high-performance computing person, I'm usually I/O bound, not compute bound. I wish someone would come up with a 10x speed up for disk and network I/O.
So TL;DR: Using SIMD and multithreading is faster than doing no optimization in Python. The only real comparison here is the one made before any of those optimizations are applied:
> The above code produced a 90x speedup over Python and a 15x speedup over NumPy as shown in the figure below:
> This is all pretty impressive if I can take my unmodified (slightly modified?) Python code and get that sort of improvement.
it'll never work as smoothly as they advertise. just hands down, beyond a shadow of a doubt, their claims about supporting "unmodified" Python code are startup hype. how do i know? i could give you a bunch of technical reasons about Python as a language and CPython as the de facto implementation (thereby informing tons of code already written, re extensions) but there's a much simpler way to reason about it: because there are already >10 attempts at this and no one has been able to do it. there's no magic here that any number of dollars or brains could pull off. instead each such project picks a point on the pythonic<->performant design-space tradeoff curve and then asks/expects you to live with that choice.
and taking ^ into consideration, mojo is not that special. only thing going for it is chris lattner isn't bad at designing languages so maybe, on its own, it'll be a nice language (but it needs to be open to get any traction on its own).
graalpy does not fully support C extensions and will have just as hard a time extending support as anyone else. maybe even the hardest because they're plumbing through the JVM which, notoriously, has bad C FFI (at least until recently?).
It's incomplete but it does support C extensions and can run code with NumPy and other science modules.
Their approach is unique which is why it can work (they proved out the idea with ruby already). They compile the modules with LLVM and then extend the Python interpreter/JIT compiler with support for LLVM bitcode. So the JITC compiles both Python and C extensions together as one unit. The interpreter API is then virtualized so that code that looks like a structure read or method call from C is compiled directly down to the optimized machine code being used by the rest of the JITC. In this way the interop overhead can be optimized out.
This is all separate tech that goes well beyond a normal FFI. JNI doesn't even get involved at all.
Well, isn't that most Python? If Mojo can pave over the slow interpreted bits I repeatedly dig up in Python profilers, even well maintained projects, with no code changes, that would be huge.
No, Lattner learned from Swift and is avoiding anything except zero-cost abstractions.
Also, Swift isn’t very interesting outside the Apple ecosystem, and Metal doesn’t exist outside the Apple ecosystem. Mojo has a real shot at widespread, general-purpose, language adoption!
I don’t understand this from a goals perspective. What is an “AI compiler” - and why aren’t they comparing benchmarks with technologies more commonly used in AI?
I think I should be impressed, but I feel like I’m missing the point.
I guess the point is that getting the same performance in most other languages requires hundreds of lines of code. Here they are ostensibly achieving that performance using very succinct code. That is pretty nice especially if it integrates well with Python.