Hacker News
by Lucasoato 33 days ago
Do you know what kind of machine I need to run the original DeepSeek v4 pro model with good tok/s throughput?
3 comments

You don't need a machine. You need a rack of them: roughly 1.34TB of VRAM. https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...
Nobody serves models in BF16 precision, not even commercial providers, especially now that newer quant methods (like nv4) exist.
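For a rough sense of why quantization matters so much here: weight memory scales linearly with bits per weight. Below is a minimal sketch, assuming the 1.34TB figure counts BF16 (16-bit) weights only, with no KV cache, activations, or per-block scale overhead (real 4-bit formats store scales, so their effective bits per weight sit somewhat above 4). The function names are hypothetical, chosen for illustration.

```python
# Rough weight-memory arithmetic: footprint = parameters x bytes per parameter.
# Assumption: the 1.34 TB figure is BF16 weights only (2 bytes per parameter).
BF16_TOTAL_TB = 1.34


def implied_params_billions(total_tb: float, bits_per_weight: float = 16) -> float:
    """Parameter count (in billions) implied by a weight footprint in TB (1 TB = 1e12 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    return total_tb * 1e12 / bytes_per_weight / 1e9


def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes).

    Billions of parameters x bytes per parameter = GB. Ignores KV cache,
    activations, and quantization scale overhead.
    """
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    params = implied_params_billions(BF16_TOTAL_TB)
    print(f"implied parameters: ~{params:.0f}B")
    for name, bits in [("BF16", 16), ("Q8", 8), ("Q4", 4)]:
        print(f"{name}: ~{weight_footprint_gb(params, bits):.0f} GB")
```

Under these assumptions it prints about 670B parameters and roughly 1340 GB (BF16), 670 GB (Q8), and 335 GB (Q4) of weights, so each halving of bit width halves the VRAM needed for the weights alone.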

The article states that you can fit Q4 on 4x 4090s and that it works reasonably well.

I'd personally go for DeepSeek V4 Flash at Q8, though hardware prices need to come down. Once an NV4 version gets released, it'll be easier to run on commodity hardware.

Less if you quantize. Apparently Q8 and Q4 do pretty well.
It's not really plausible to host at home unless you have deep pockets. What we win here is a model that doesn't suddenly get worse, as the proprietary ones have been doing, and a competitive market of providers to choose from.
DeepSeek v4 pro is still rather large. DeepSeek-V4-Flash[0] becomes relatively more reasonable with smaller quantizations, and eventually it will be able to effectively offload 'facts' to system RAM. See DwarfStar 4[1] for the current sweet spots.

[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash

[1] https://news.ycombinator.com/item?id=48142108