by chrislattner 82 days ago
If you want the fastest open source implementation on Blackwell and AMD MI355, check out Modular's MAX nightly. You can pip install it super fast; check it out here: https://www.modular.com/blog/day-zero-launch-fastest-perform...

-Chris Lattner (yes, affiliated with Modular :-)

2 comments

Faster than TensorRT-LLM on Blackwell? Or do you not consider TensorRT-LLM open source because some dependencies are closed source?
I reviewed the TensorRT-LLM commit history from the past few days and couldn't find any updates regarding Gemma 4 support. By contrast, here is the reference for MAX: https://github.com/modular/modular/commit/57728b23befed8f3b4...
If OP meant they have the fastest implementation of Gemma 4 on Blackwell at the moment, I guess that is technically true. I doubt that will hold up when TensorRT-LLM finishes their implementation though.
How is the sglang performance on Blackwell for this model?
Dunno but there's a PR for it. Probably also more performant than Modular.
What % speedup should I expect vs. just running this with the standard PyTorch approach?