by melodyogonna 39 days ago
People keep mistaking Mojo for nothing more than nice syntax for writing GPU code, and so assume Nvidia's Python frameworks already cover that. But would cuTile work on AMD GPUs or Apple Silicon? Whatever Nvidia ships will still come with vendor lock-in.

Indeed, but Intel and AMD are also upping their Python JIT game, and in the end Mojo code isn't portable anyway.

You always need to touch the hardware/platform APIs at some level, because even when the same code runs everywhere, the observed performance, and on GPUs the numeric accuracy, differs visibly from one platform to the next.
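
To make the numeric-accuracy point concrete, here is a minimal Python sketch (plain CPU code, not GPU code; the function names are made up for illustration). Floating-point addition is not associative, so a left-to-right reduction and a tree-shaped reduction, the kind of ordering difference you can get between backends, may return different sums for the same input.

```python
import math

def sequential_sum(values):
    """Left-to-right accumulation, like a single-threaded loop."""
    total = 0.0
    for v in values:
        total += v
    return total

def pairwise_sum(values):
    """Tree-shaped reduction, similar in spirit to a parallel reduce."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

# Illustrative input chosen to exaggerate rounding effects.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))
print(pairwise_sum(values))
print(math.fsum(values))  # correctly rounded reference sum
```

Neither ordering is "wrong"; they just round at different points, which is why results can drift between platforms even when the source is identical.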

It is portable in that you can write code to target multiple platforms in the same codebase. Mojo has powerful compile-time metaprogramming that allows you to tell the compiler how to specialise using a compile-time conditional, e.g. https://github.com/modular/modular/blob/9b9fc007378f16148cfa...

Of course, this won't be necessary in most cases if you're building on top of abstractions provided by Modular.

You don't get this choice using vendor-specific libraries; you're locked into this or that.
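
For anyone who hasn't seen the pattern, here is a loose Python analogy. It is not Mojo and not Modular's API. Python has no compile-time conditionals, so this sketch picks the specialisation once, when the kernel is built, rather than on every call. The target names and `make_axpy` are hypothetical.

```python
def make_axpy(target):
    """Return an a*x + y routine specialised for a (hypothetical) target.

    The branch on `target` runs once here, not inside the hot path,
    which is the part that loosely mirrors a compile-time conditional.
    """
    if target == "vendor_a":
        # Pretend this target prefers processing two elements per step.
        def axpy(a, x, y):
            out = []
            for i in range(0, len(x) - 1, 2):
                out.append(a * x[i] + y[i])
                out.append(a * x[i + 1] + y[i + 1])
            if len(x) % 2:
                # Handle the leftover element for odd lengths.
                out.append(a * x[-1] + y[-1])
            return out
    elif target == "vendor_b":
        # Plain element-wise version for the other target.
        def axpy(a, x, y):
            return [a * xi + yi for xi, yi in zip(x, y)]
    else:
        raise ValueError(f"unsupported target: {target}")
    return axpy

kernel = make_axpy("vendor_a")
print(kernel(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Both branches live in the same codebase and give the same answer; only the implementation chosen for the target differs.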

Yes you do: you get PyTorch or whatever else, built on top of those vendor-specific libraries.

That is the thing with Mojo: by the time it reaches 1.0, LLM progress and the investment going into GPU JITs for Python will have made it largely irrelevant for large-scale adoption.

Sure, some customers might stay around and keep Modular going; the golden question is how many.

PyTorch is built on an amalgamation of these different frameworks, not on a single one that targets every vendor.
The point still stands as middleware.
Have you ever wondered how much work the PyTorch team would have saved if they could have used just CUDA for every platform they support? If they didn't have to write compatibility abstractions or layers, and could instead focus on the problem of training neural networks? What if all the primitives they use from CUDA and cuDNN worked just as well on AMD GPUs, Apple GPUs, and perhaps even Google's TPUs as they do on Nvidia GPUs?

Mojo and Modular's MAX platform would do for heterogeneous compute what LLVM did for programming language development. People who dismiss the real value on offer here know nothing. Modular has already raised $350m+ from industry giants (including Nvidia and Google) to solve this, and I believe they will.