HN Mirror

	by formalsystem 667 days ago
	Most of our performance relies on torch.compile, which generates Triton kernels that run fast on CPU and GPU but not on MPS, since Triton does not support generating Metal kernels. So on MPS you lose the nice story of writing low-bit code in pure PyTorch and still having it run fast. In these cases the only path forward we have is writing custom Metal kernels and plugging those in. That work is still ongoing, and we'll hopefully have more to share soon.
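To make the trade-off above concrete, here is a minimal sketch of how code might choose between the compiled path and a fallback per device. The helper name `select_execution_path` and the toy function `scale_and_round` are hypothetical, not part of any library, and the eager fallback on MPS is an assumption standing in for the custom Metal kernels the comment describes.

```python
from typing import Callable


def select_execution_path(fn: Callable, device_type: str) -> Callable:
    """Return fn wrapped in torch.compile, except on MPS (hypothetical helper)."""
    if device_type == "mps":
        # Per the comment above, Triton does not generate Metal kernels, so
        # this sketch keeps plain eager execution on MPS. A hand-written
        # Metal kernel would be plugged in here instead (not shown).
        return fn
    # Imported lazily so the MPS branch does not require importing torch.
    import torch

    return torch.compile(fn)


def scale_and_round(x, scale):
    # Toy stand-in for a low-bit quantization step written in plain PyTorch
    # tensor ops: scale, round, and clamp to the signed 4-bit range.
    return (x / scale).round().clamp(-8, 7)
```

Calling `select_execution_path(scale_and_round, "cuda")` would hand back a compiled callable, while passing `"mps"` returns the original function unchanged.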

1 comments

underanalyzer 667 days ago

This might not be the right place for this question, but as someone who has made a couple of very modest MPS backend contributions, I'm curious: why not add Metal support to Triton (or to a fork, if OpenAI won't allow it) rather than maintain a whole separate backend?

formalsystem 667 days ago

Mostly comes down to what's fastest to develop: it's faster to write a few custom kernels than it is to develop a new compiler backend.

Granted, after more upfront effort, compilers are such a significant UX boost that you are indeed making me question why I don't spend more time working on this myself lol