| HN Mirror



	by chrislattner 82 days ago
	If you want the fastest open source implementation on Blackwell and AMD MI355, check out Modular's MAX nightly. You can pip install it super fast; check it out here: https://www.modular.com/blog/day-zero-launch-fastest-perform... -Chris Lattner (yes, affiliated with Modular :-)

2 comments

nabakin 82 days ago

Faster than TensorRT-LLM on Blackwell? Or do you not consider TensorRT-LLM open source because some dependencies are closed source?


melodyogonna 82 days ago

I reviewed the TensorRT-LLM commit history from the past few days and couldn't find any updates regarding Gemma 4 support. By contrast, here is the reference for MAX: https://github.com/modular/modular/commit/57728b23befed8f3b4...


nabakin 82 days ago

If OP meant they have the fastest implementation of Gemma 4 on Blackwell at the moment, I guess that is technically true. I doubt that will hold up when TensorRT-LLM finishes their implementation though.


pama 82 days ago

How is the sglang performance on Blackwell for this model?


nabakin 82 days ago

Dunno but there's a PR for it. Probably also more performant than Modular.


jjcm 82 days ago

What % speedup should I expect vs. just running this with the standard PyTorch approach?
