by fibonacci112358 46 days ago

Sadly for them, Nvidia didn't stand still in the meantime and created the next generation of CUDA: CuTile for Python, and soon for C++, through CUDA Tile IR (using a similar compiler stack based on MLIR).

Even though it's not portable, it will likely have far greater usage than Mojo just by being heavily promoted by Nvidia, integrated into dev tools, and working alongside existing CUDA code.

Tile IR was more likely a response to the threat of Triton rather than Mojo, at least from the pov of how easy it is to write a decently performing LLM kernel.

4 comments

pjmlp 46 days ago

And so as not to fall behind, Intel and AMD are making similar efforts, and then we have the whole CPython JIT finally happening after so many attempts.

Not to mention efforts like GraalPy and PyPy.

And all these efforts work today on Windows, which is quite relevant in companies where that is what most employees are assigned, even if the servers run Linux distros.

I keep wondering if this isn't going to be another Swift for TensorFlow kind of outcome.

IshKebab 45 days ago

The CPython JIT has barely had any impact on its performance. CPython is always going to be dog slow.

pjmlp 45 days ago

Of course, it is still taking baby steps and has to be explicitly enabled when installing the right build.

It only has to be good enough to keep the ecosystem going, so that the porting cost isn't worthwhile by the time Mojo finally reaches parity.

melodyogonna 46 days ago

People keep mistaking Mojo for just a good syntax for writing GPU code, and so imagine that Nvidia's Python frameworks already cover that. But... would CuTile work on AMD GPUs and Apple Silicon? Whatever Nvidia does will still have vendor lock-in.

pjmlp 46 days ago

Indeed, but Intel and AMD are also upping their Python JIT game, and in the end Mojo code isn't portable anyway.

You always need to touch the hardware/platform APIs at some level, because even if the same code runs everywhere, the observed performance, or in the case of GPUs the numeric accuracy, differs in visible ways.
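
A minimal sketch of the numeric-accuracy point, in plain Python rather than GPU code: floating-point addition is not associative, so the same sum evaluated in a different order (as can happen when hardware splits a reduction differently) may round differently. The function names `sequential_sum` and `pairwise_sum` are hypothetical, for illustration only.

```python
def sequential_sum(values):
    """Add left to right, as a single thread would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add as a tree: split in half, sum each half, combine.

    This mimics, very roughly, how a parallel reduction groups additions.
    """
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


values = [0.1, 0.2, 0.3]
print(sequential_sum(values))  # evaluates (0.1 + 0.2) + 0.3
print(pairwise_sum(values))    # evaluates 0.1 + (0.2 + 0.3)
print(sequential_sum(values) == pairwise_sum(values))  # the two orders disagree
```

Both functions are "correct"; they only group the additions differently, which is enough to change the last bits of the result.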

melodyogonna 46 days ago

It is portable in that you can write code to target multiple platforms in the same codebase. Mojo has powerful compile-time metaprogramming that allows you to tell the compiler how to specialise using a compile-time conditional, e.g. https://github.com/modular/modular/blob/9b9fc007378f16148cfa...

Of course, this won't be necessary in most cases if you're building on top of abstractions provided by Modular.

You don't get this choice using vendor-specific libraries; you're locked into this or that.
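
As a rough analogue only (this is Python, not Mojo, and it uses no Modular API): the idea of one codebase with a per-target conditional can be sketched by choosing a specialised implementation once, when the function is built, instead of on every call. Python does this at runtime, whereas the Mojo feature described above is resolved by the compiler. `make_scale`, the target names, and both implementations are hypothetical.

```python
def make_scale(target):
    """Return a scale(xs, k) function specialised for `target`.

    The branch on `target` runs once, here; the returned function
    contains only the chosen implementation.
    """
    if target == "gpu_like":
        # Hypothetical "wide" path: process the whole list in one expression.
        def scale(xs, k):
            return [x * k for x in xs]
    elif target == "cpu_like":
        # Hypothetical "scalar" path: explicit element-by-element loop.
        def scale(xs, k):
            out = []
            for x in xs:
                out.append(x * k)
            return out
    else:
        raise ValueError(f"unsupported target: {target}")
    return scale


scale_gpu = make_scale("gpu_like")
scale_cpu = make_scale("cpu_like")
# Both specialisations give the same result from the same call site.
print(scale_gpu([1, 2, 3], 2) == scale_cpu([1, 2, 3], 2))
```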

pjmlp 46 days ago

Yes you do, you get PyTorch or whatever else, built on top of those vendor-specific libraries.

That is the thing with Mojo: by the time it arrives as 1.0, LLM progress and the investment being made in GPU JITs for Python will make it largely irrelevant for large-scale adoption.

Sure, some customers might stay around and keep Modular going; the golden question is how many.

melodyogonna 46 days ago

PyTorch is built on an amalgamation of these different frameworks, not on one of them used to target different vendors.

pjmlp 45 days ago

The point still stands as middleware.

Conscat 45 days ago

My understanding from speaking with a few Tile IR devs on dates is that its primary motivation was providing better portability for programming tensor cores than PTX offers. Nobody ever told me they saw it as a response to anything other than customer feedback.

brcmthrowaway 46 days ago

Interesting, how big an impact does CuTile have?
