Hacker News
by
chrislattner
82 days ago
If you want the fastest open source implementation on Blackwell and AMD MI355, check out Modular's MAX nightly. You can pip install it super fast; check it out here:
https://www.modular.com/blog/day-zero-launch-fastest-perform...
-Chris Lattner (yes, affiliated with Modular :-)
nabakin
82 days ago
Faster than TensorRT-LLM on Blackwell? Or do you not consider TensorRT-LLM open source because some dependencies are closed source?
melodyogonna
82 days ago
I reviewed the TensorRT-LLM commit history from the past few days and couldn't find any updates regarding Gemma 4 support. By contrast, here is the reference for MAX:
https://github.com/modular/modular/commit/57728b23befed8f3b4...
nabakin
82 days ago
If OP meant they have the fastest implementation of Gemma 4 on Blackwell at the moment, I guess that is technically true. I doubt that will hold up when TensorRT-LLM finishes their implementation though.
pama
82 days ago
How is SGLang performance on Blackwell for this model?
nabakin
82 days ago
Dunno, but there's a PR for it. Probably also more performant than Modular.
jjcm
82 days ago
What % speedup should I expect vs. just running this with the standard PyTorch approach?
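Nobody in the thread gives a number, and the answer will depend on your GPU, batch size, sequence lengths and generation settings. One way to get a figure for your own setup is to run both stacks on the same hardware, model and prompts, then compare throughput. Below is a minimal sketch under that assumption. `measure_throughput` and `percent_speedup` are hypothetical helpers, not part of MAX or PyTorch. The `generate` argument stands in for whichever inference call you are testing and is assumed to return the number of tokens it produced.

```python
import time
from typing import Callable, Sequence


def measure_throughput(generate: Callable[[str], int], prompts: Sequence[str]) -> float:
    """Return generated tokens per second for `generate` over `prompts`.

    `generate` is a hypothetical stand-in for your inference stack
    (e.g. a plain PyTorch loop or a MAX-served model); it must return
    the number of tokens it produced for one prompt.
    """
    start = time.perf_counter()
    total_tokens = sum(generate(p) for p in prompts)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise ValueError("run too short to time; use more or longer prompts")
    return total_tokens / elapsed


def percent_speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Throughput speedup of candidate over baseline, as a percentage.

    Positive means the candidate is faster: 100 -> 150 tok/s is +50%.
    """
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate_tps / baseline_tps - 1.0) * 100.0
```

For example, `percent_speedup(100.0, 150.0)` returns `50.0`. Comparing tokens per second, rather than wall-clock time per request, keeps runs with different output lengths comparable. For a fair comparison, warm up both stacks before timing so that one-time costs such as compilation and weight loading are excluded.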