by lhl 483 days ago
It's great to see vLLM getting faster/better for DeepSeek. I tested vLLM vs SGLang a couple of weeks ago, and SGLang's DeepSeek support was much better/faster (on 2 x p5 H100 nodes). It's great that no one's standing still. A recent AMD article reports that SGLang perf on MI300X has increased by 4X over the past couple of weeks: https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR...

(With the extra memory, V3/R1 fits on a single MI300X or H200 node.)
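
Rough arithmetic behind that, as a sketch only. The figures are my assumptions and not from the article: 671B total parameters stored as FP8 (1 byte each), 8 GPUs per node, and the vendor-listed memory per GPU. It counts weights only and ignores KV cache, activations, and runtime overhead, so a "fits" result is optimistic.

```python
# Back-of-the-envelope check: do the model weights alone fit in one node's GPU memory?
# Assumptions (illustrative): 671B parameters at 1 byte each (FP8), 8 GPUs per node.
# KV cache, activations, and runtime overhead are NOT counted.

PARAMS_BILLIONS = 671   # DeepSeek V3/R1 total parameter count (assumed)
BYTES_PER_PARAM = 1     # FP8 weights (assumed)
GPUS_PER_NODE = 8       # typical 8-GPU node (assumed)

# Vendor-listed memory per GPU, in GB
GPU_MEMORY_GB = {"H100": 80, "H200": 141, "MI300X": 192}


def weights_gb() -> int:
    # 1 billion parameters at 1 byte each is 1 GB (decimal)
    return PARAMS_BILLIONS * BYTES_PER_PARAM


def node_memory_gb(gpu: str) -> int:
    return GPU_MEMORY_GB[gpu] * GPUS_PER_NODE


def weights_fit_on_one_node(gpu: str) -> bool:
    return weights_gb() <= node_memory_gb(gpu)


for gpu in GPU_MEMORY_GB:
    print(f"{gpu}: {node_memory_gb(gpu)} GB per node, "
          f"weights fit: {weights_fit_on_one_node(gpu)}")
```

Under those assumptions an 8 x H100 node (640 GB) comes up short of the ~671 GB of weights, which lines up with needing 2 nodes in my test, while 8 x H200 and 8 x MI300X nodes have room to spare.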

It'll be interesting to see if either project can benefit from this FlashMLA implementation.