Hacker News
by zhihaojia 362 days ago
Thanks for the great feedback! Stanford's MegaKernel project tackles a similar challenge but focuses on manual CUDA implementation. MPK, by contrast, takes a compiler-driven approach: users express their LLMs at the PyTorch level, and MPK automatically compiles them into optimized megakernels. Our goal is to make programming megakernels much more accessible.
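For readers new to the idea, here is a toy sketch of the structural difference between per-operator kernels and a single megakernel. It is not MPK's API or its actual compiler. The operator names and graph are hypothetical, and the only thing it shows is that a compiler can take a model-level dependency graph and produce one launch whose in-kernel loop visits every task in a dependency-respecting order, instead of one launch per operator. A real megakernel compiler would also have to split tasks across SMs and insert fine-grained synchronization, which this sketch leaves out.

```python
from graphlib import TopologicalSorter

# Hypothetical operator graph for a tiny two-layer model.
# Each op maps to the set of ops it depends on.
OP_GRAPH = {
    "embed": set(),
    "attn_0": {"embed"},
    "mlp_0": {"attn_0"},
    "attn_1": {"mlp_0"},
    "mlp_1": {"attn_1"},
    "lm_head": {"mlp_1"},
}


def per_op_launches(graph):
    """Baseline: every operator becomes its own kernel launch."""
    order = list(TopologicalSorter(graph).static_order())
    return [[op] for op in order]  # one launch per operator


def megakernel_launch(graph):
    """Megakernel: one launch whose in-kernel loop walks all tasks in dependency order."""
    order = list(TopologicalSorter(graph).static_order())
    return [order]  # a single launch containing every task


if __name__ == "__main__":
    print(len(per_op_launches(OP_GRAPH)))    # 6
    print(len(megakernel_launch(OP_GRAPH)))  # 1
    print(megakernel_launch(OP_GRAPH)[0])
```

Because this toy graph is a linear chain, the topological order is unique. Real models have branches (for example, independent Q/K/V projections), which is where a scheduler inside the megakernel can overlap work.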

I completely agree that CUDA can be a limiting factor, especially for latency-sensitive workloads. As GPUs become larger and faster, it's increasingly difficult to write standalone kernels that fully utilize hardware resources, particularly when optimizing for low latency with small batch sizes.
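One way to see why small batch sizes make this harder: at batch size 1, each kernel does very little real work, so fixed per-kernel costs (launch, synchronization, pipeline ramp-up) can rival the work itself. The back-of-envelope model below is a minimal sketch under loudly hypothetical numbers (32 layers, 8 ops per layer, 3 us of work per op, 4 us of per-kernel overhead). None of these figures are measurements of MPK or of any particular GPU.

```python
def separate_kernels_us(work_us, overhead_us):
    """Each operator runs as its own kernel and pays the fixed overhead separately."""
    return sum(overhead_us + w for w in work_us)


def megakernel_us(work_us, overhead_us):
    """All operators share one launch, so the fixed overhead is paid once."""
    return overhead_us + sum(work_us)


# Hypothetical decode step: 32 layers x 8 ops, 3 us of real work per op at batch size 1,
# and an assumed 4 us of launch/synchronization overhead per kernel.
WORK = [3.0] * (32 * 8)
OVERHEAD = 4.0

if __name__ == "__main__":
    print(separate_kernels_us(WORK, OVERHEAD))  # 1792.0
    print(megakernel_us(WORK, OVERHEAD))        # 772.0
```

This model deliberately ignores the other half of the problem: with so little work per kernel, a single small kernel may also fail to occupy all of a large GPU's SMs. It also ignores the synchronization a megakernel still needs internally.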

> What are the chances we see your work land in PyTorch as an experimental backend?

We're definitely excited about that direction. We believe MPK can help PyTorch support megakernel generation, and we’re actively exploring how to make that happen. Stay tuned!

> P.S. minor typo, your first two paragraphs under part 1 are nearly identical.

Thanks for pointing that out. I meant to remove the duplicate paragraph when finalizing the post.


Hi author, thank you very much for the clear and relatively easy-to-understand MPK overview. Could you also comment on how similar your project is to Hidet (https://pytorch.org/blog/introducing-hidet/)?

Thank you!