by seewhydee 3 days ago
That wouldn't explain why DeepSeek is fast relative to other Chinese providers, especially considering that they're reportedly ahead of the curve among Chinese companies in moving off Nvidia. I think their quant fund background has more to do with it. Their models are clearly designed with performant inference in mind.
1 comment

Yes, it's performant, and especially performant at non-trivial context depths. DeepSeek-V4 (DS4) and its Flash variant (DS4F) lose much less tok/s as context grows than the rest. On my M2 Max it took a context depth of 768K to drop to ~10 tok/s.

https://x.com/ljupc0/status/2062457314414587996

Other local models I've checked drop to unusable speeds way sooner. The only other model I've tried with a similarly favourable curve is nemotron-cascade-2-30b-a3b. But it's a small model, way dumber than DS4F.

Coding-agent use cases run at large context depths, so the rate of decline is as important as the headline number.
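To make "rate of decline" concrete, here's a rough Python sketch of how you could compare two speed-vs-depth curves. The function names are my own, and the sample numbers are made up for illustration only; they are not measurements of DS4, DS4F, or any real model. Plug in your own (context depth, tok/s) readings.

```python
def retention(samples):
    """samples: list of (context_depth, tok_per_s), sorted by depth.

    Returns (depth, fraction of the shallowest-depth speed still retained).
    """
    base = samples[0][1]
    return [(depth, speed / base) for depth, speed in samples]


def depth_at_speed(samples, threshold):
    """Linearly interpolate the depth at which tok/s falls to `threshold`.

    Returns None if the threshold is not crossed within the measured range.
    """
    for (d0, s0), (d1, s1) in zip(samples, samples[1:]):
        if s0 >= threshold >= s1 and s0 != s1:
            return d0 + (s0 - threshold) * (d1 - d0) / (s0 - s1)
    return None


# Hypothetical readings, NOT real benchmarks.
model_a = [(0, 40.0), (64_000, 30.0), (256_000, 20.0)]  # lower headline, flat curve
model_b = [(0, 60.0), (64_000, 15.0), (256_000, 4.0)]   # higher headline, steep curve

for name, samples in (("A", model_a), ("B", model_b)):
    print(name, retention(samples)[-1], depth_at_speed(samples, 25.0))
```

With these invented numbers, model B wins on the headline figure at empty context but falls below 25 tok/s far earlier than model A, which is the situation a coding agent would actually feel.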