| HN Mirror

Nobody is serving models in BF16 precision, not even commercial providers, especially with newer quant methods (like NV4) around.

The article states you can fit Q4 in 4 x 4090 and it works reasonably well.
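For a rough sanity check on that kind of claim, here's a back-of-the-envelope sketch of weight-only memory at different precisions. The parameter count is a made-up placeholder (I'm not quoting the article's model size), the bit widths are nominal (real Q4 formats carry some extra bits for scales), and it ignores KV cache and activations entirely.

```python
# Rough weight-only memory estimate; ignores KV cache, activations, and runtime overhead.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB for a given nominal bit width."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float, vram_gib: float) -> bool:
    """True if the weights alone fit in the given amount of VRAM."""
    return weight_gib(params_billion, bits_per_weight) <= vram_gib


# 4 x RTX 4090 at 24 GB each (treated as GiB here for simplicity).
VRAM_4X4090_GIB = 4 * 24

# Hypothetical parameter count, purely for illustration.
HYPOTHETICAL_PARAMS_B = 120

if __name__ == "__main__":
    for name, bits in [("BF16", 16), ("Q8", 8), ("Q4", 4)]:
        size = weight_gib(HYPOTHETICAL_PARAMS_B, bits)
        ok = fits(HYPOTHETICAL_PARAMS_B, bits, VRAM_4X4090_GIB)
        print(f"{name}: {size:.1f} GiB of weights, fits in 4 x 4090: {ok}")
```

Swap in the real parameter count and the effective bits per weight of whatever quant you're using; whatever headroom is left has to cover context.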

I'd personally go for DeepSeek V4 Flash at Q8, though hardware prices need to come down. Once an NV4 version gets released, it'll be easier to run on commodity hardware.