Hacker News new | ask | show | jobs
by Palmik 30 days ago
There are several things at play:

Inference stack efficiency: Many of these providers take off the shelf sglang / vllm / trtllm and hope for the best. Meanwhile DeepSeek team is known for pushing the boundary of optimizations.

Now, sglang and vllm are great pieces of software, but take DeepSeek's Sparse Attention (DSA). Introduced 1.5 years ago (https://arxiv.org/abs/2512.02556), used by DeepSeek 3.2, GLM 5, DeepSeek V4. Only now is it slowly strating to get optimized in the major inference engines: (https://github.com/sgl-project/sglang/issues/19380 https://github.com/sgl-project/sglang/pull/22851 etc.). Of course, DS V4 adds extra optimizations into the model architecture on top of DSA, and those will take more time to be taken full advantage of by the open source inference engines.

Privacy: Betting that people will pay extra for inference hosted outside China. This is especially true with DeepSeek, because DeepSeek is transparent about using API data for model improvements.

And few other things (scale (matters a lot for MoEs), reliability, soft enterprise lock in, etc.)

---

There is also, likely, tacit collusion at play here. Look at GLM 5 and GLM 5.1 prices. GLM 5 and 5.1 cost the same to run, but providers decided to charge much more for 5.1 because it is much better model, and because Z.AI raised their price as well.

1 comments

Another factor is that DeepSeek is not just doing inference, but also training models, so they can use underutilized compute nodes for training during off-peak hours, as described in their DeepSeek v3 article: https://github.com/deepseek-ai/open-infra-index/blob/main/20...

But I agree that the main driver is that they are really good at optimizing. They will have chosen their architecture in such a way that it will be as efficient as possible on their own infrastructure, so they have a massive head start. Inference framework developers still have to catch up.