| HN Mirror



	by Lucasoato 33 days ago
	Do you know what kind of machine I need to run the original DeepSeek v4 pro model with good tok/s throughput?

3 comments

killingtime74 33 days ago

You don't need a machine. You need a rack of them. 1.34TB VRAM https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...


fgonzag 32 days ago

Nobody is serving models in BF16 precision, not even commercial providers, especially with newer quant methods (like nv4).

The article states you can fit Q4 in 4 x 4090 and it works reasonably well.

I'd personally go for DeepSeek V4 Flash at Q8, though hardware prices need to come down. Once an NV4 version gets released it'll be easier to run on commodity hardware.


sterlind 33 days ago

less if you quantize. apparently Q8 and Q4 do pretty well.
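
A rough way to see how much "less" quantizing buys is to scale the 1.34TB figure quoted above by bits per weight. This is only a back-of-the-envelope sketch: it assumes the whole 1.34TB is BF16 weights (16 bits each), that Q8 and Q4 use exactly 8 and 4 bits per weight with no overhead for scales, and it ignores KV cache and activations. The function names and the per-card capacity parameter are made up for illustration.

```python
import math

def scaled_footprint_tb(bf16_tb: float, bits_per_weight: float) -> float:
    """Scale a BF16 (16-bit) weight footprint to another bit width.

    Assumption: memory is dominated by weights and scales linearly with
    bits per weight. Real quant formats add a little overhead on top.
    """
    return bf16_tb * bits_per_weight / 16.0

def gpus_needed(footprint_tb: float, vram_gb_per_gpu: float) -> int:
    """Minimum number of cards whose combined VRAM covers the footprint.

    Uses decimal units (1 TB = 1000 GB) and leaves no headroom for
    KV cache or activations, so treat the result as a lower bound.
    """
    return math.ceil(footprint_tb * 1000 / vram_gb_per_gpu)

BF16_TB = 1.34  # figure quoted in the thread

for name, bits in [("BF16", 16), ("Q8", 8), ("Q4", 4)]:
    tb = scaled_footprint_tb(BF16_TB, bits)
    # 24 GB is the VRAM of a single RTX 4090
    print(f"{name}: ~{tb:.3f} TB, at least {gpus_needed(tb, 24)} x 24GB cards")
```

Under these assumptions halving the bit width halves the footprint, so Q8 is about half and Q4 about a quarter of the BF16 number. Actual requirements depend on the specific quant format and context length.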


zamalek 33 days ago

It's not really plausible to host at home, unless you have deep pockets. What you/we win here is a model that doesn't suddenly become worse like the proprietary ones have been doing, and you can choose a provider from a competitive market.


karmakaze 33 days ago

DeepSeek v4 pro is still rather large. DeepSeek-V4-Flash[0] becomes relatively more reasonable with smaller quantizations, and eventually will be able to effectively offload 'facts' to system RAM. See DwarfStar 4[1] for current sweet spots.

[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash

[1] https://news.ycombinator.com/item?id=48142108
