by killingtime74 42 days ago
You don't need a machine. You need a rack of them: 1.34 TB of VRAM. https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...
2 comments

Nobody is serving models in BF16 precision, not even commercial providers, especially now that there are newer quant methods (like nv4).

The article states you can fit Q4 in 4 x 4090 and it works reasonably well.

I'd personally go for DeepSeek V4 Flash at Q8, though hardware prices need to come down. Once an NV4 version gets released, it'll be easier to run on commodity hardware.

Less if you quantize. Apparently Q8 and Q4 do pretty well.
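
For a rough sense of how much less, here is a back-of-the-envelope sketch. It assumes the 1.34 TB figure quoted upthread is weights only at BF16 (16 bits per parameter) and that memory scales linearly with bit width. Both are my assumptions, not something the article is confirmed to say. It also ignores KV cache, activations, and per-block quantization overhead, so real numbers will be somewhat higher. The function name is made up for illustration.

```python
def scaled_footprint_tb(bf16_tb: float, bits: int) -> float:
    """Scale a BF16 (16-bit) weight footprint to another bit width.

    Assumes memory is proportional to bits per parameter and ignores
    quantization metadata (scales, zero points), KV cache, and activations.
    """
    return bf16_tb * bits / 16


# Figure quoted upthread; assumed here to be weights only at BF16.
BF16_TB = 1.34

for name, bits in [("BF16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{name}: ~{scaled_footprint_tb(BF16_TB, bits):.3f} TB")
```

Under those assumptions Q8 halves the footprint and Q4 quarters it, which is still well beyond a single consumer card for the full model.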