by lhl
483 days ago
It's great to see vLLM getting faster and better for DeepSeek. I tested vLLM vs. SGLang a couple of weeks ago, and SGLang's DeepSeek support was much better and faster (on 2 x p5 H100 nodes). It's great that no one's standing still. I saw a recent AMD article reporting that SGLang perf on MI300X has increased 4X over the past couple of weeks: https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR... (with the extra memory, V3/R1 fits on a single MI300X or H200 node). It'll be interesting to see whether either project can take advantage of, or get any benefit from, this FlashMLA implementation.