by pjmlp 358 days ago
Given that NVidia has now decided to get serious about Python JIT DSLs in CUDA, as announced at GTC 2025, I wonder how much mindshare Mojo will manage to win among researchers.

"1001 Ways to Write CUDA Kernels in Python"

https://www.youtube.com/watch?v=_XW6Yu6VBQE

"The CUDA Python Developer’s Toolbox"

https://www.nvidia.com/en-us/on-demand/session/gtc25-S72448/

"Accelerated Python: The Community and Ecosystem"

https://www.youtube.com/watch?v=6IcvKPfNXUw

"Tensor Core Programming in Python with CUTLASS 4.0"

https://www.linkedin.com/posts/nvidia-ai_python-cutlass-acti...

There is also Julia, the black swan that many outside the Python community have moved to, with much more mature tooling and first-tier Windows support, for those researchers who for whatever reason have work-issued Windows laptops.

https://info.juliahub.com/industries/case-studies

Mojo as a programming language seems interesting to me as a language nerd, but I think the jury is still out on whether this is going to be another Swift, or another Swift for TensorFlow, in terms of market adoption, given the existing contenders.

5 comments

Mojo (and Modular's whole stack) is pretty much completely focused on people who are interested in inference, not so much training or research, at this moment.

So they are going after people who need to build low-latency, high-throughput inference systems.

Also, as someone else pointed out, they target all kinds of hardware, not just NVidia.

Why not use PyO3 instead? It has a much cleaner interface than Cython and C++ libraries.

The primary advantage of Mojo seems to be GIL-free syntax that is as close to Python as possible.

GPU programming in Rust isn't great.

In Mojo it's pretty much the whole point of the language. If you're only using CPUs, then yeah, PyO3 is a good choice.

What about Candle, made by Hugging Face? It seems to at least allow the basics and has lots of examples, all of which run on both CPU and GPU. I haven't dived deeper into it, but I played around with it a bit and found it good enough for embedding purposes at least.
I think the big value add of Mojo is that you are no longer writing GPU code that only runs on one particular GPU architecture.

In the same way that LLVM allows CPU code to target more than one CPU architecture, MLIR/Mojo allows GPU code to target multiple vendors' GPUs.

There is some effort required to write the backend for a new GPU architecture, and Lattner has discussed it taking about two months for them to bring up H100 support.

Indeed, and not only GPUs but accelerators in general. Mojo will be able to target weird esoteric hardware (portably, if that is important).
Currently it looks more like CPUs and eventually AMD, from what I have been following in their YouTube sessions and the whole blog post series about freedom from NVidia and such.

They also lack support for CPUs on Windows, unless you use WSL.

There's pretty broad support for server-grade and consumer GPUs. It's a bit buried, but one of the most reliable lists of supported GPUs is in the Mojo info documentation: https://docs.modular.com/mojo/stdlib/gpu/host/info/
Already GPU code, kernels, and complete models can run on datacenter AMD GPUs using the same code, the same programming model, and the same language constructs.
Laptops?
Yes, recent NVIDIA and AMD consumer GPUs are supported: https://docs.modular.com/max/faq/
Not sure; Modular is focusing mainly on enterprise applications. But if you look at the current PRs, you can see people hacking in support for standalone consumer-grade Nvidia and AMD GPUs, because it is easy: you just add the missing or different intrinsics for the architecture at the lowest level (in pure Mojo code), wire it up in a few places, and voila, you can already program and run code on that GPU. iGPUs and Apple GPUs are not supported yet, but it would be interesting to see their integration.
Mojo is marketed as a way to get maximum hardware performance on any hardware, not just Nvidia.

This may appeal to people wanting to run their code on different hardware brands for various reasons.

True, however that goal has not been reached today; it doesn't even run on Windows natively.

And for those who care, Julia is available today on different hardware brands, and there are other Python DSL JITs as well.

I agree they will get there; the question now is whether they will get there fast enough to matter, versus what the mainstream market cares about.

Mojo GPU kernels can run on both Nvidia and AMD GPUs today
Julia has GPU compilers for Nvidia, AMD, Intel, and Apple, and we have KernelAbstractions.jl for writing a kernel that is portable between all of them (plus the CPU!)
Specific models if I recall correctly.
Just as LLVM doesn't automatically have a backend for every new CPU architecture, Mojo/MLIR doesn't automatically have a backend for every new CPU/GPU/TPU.

However, writing an LLVM backend for RISC-V sure did add support for a whole lot of different programming languages and the software you have access to through them in one fell swoop.

The same is true here.

Instead of rewriting all your GPU code every time you need to target a new GPU/TPU architecture, you just need a new backend.

Nah, outside of models you can write Mojo code today that works on both Nvidia and AMD GPUs; the code itself doesn't have to be AI-specific.
I think they meant models of GPU, not models of LLM.
The limitations of DSLs and the pull of Python make it a practical sweet spot, I think, if they manage to get the Python compatibility up to par.
I love Julia and want to see it break through. It suffers from the lack of a big corporate patron.
Plenty of them exist already; that is why I pointed this out, which HNers keep overlooking:

https://info.juliahub.com/industries/case-studies

Sure, a bunch of companies use Julia but none of them are backing it the way Google backs Go, Oracle backs Java, or Mozilla (formerly) backed Rust.
Not having a big name behind them in the early days didn't hurt Python or Ruby that much.

Likewise, I would consider MIT and a few of the companies on that listing relevant; it doesn't always need to be a FAANG.

Mojo runs on Nvidia and AMD.

The competitor is Triton, not CUDA or Julia...

Julia runs on Nvidia, AMD, Intel and Apple GPUs