Nobody is serving models in BF16 precision, not even commercial providers. Especially with newer quant methods (like nv4)
The article states you can fit Q4 in 4 x 4090 and it works reasonably well.
I'd personally fo for deepseek V4 flash at Q8, hardware prices need to come down though. Once an NV4 version get released it'll be easier to run on commodity hardware.
It's not really plausible to host at home, unless you have deep pockets. What you/we win here is a model that doesn't suddenly become worse like the proprietary ones have been doing, and you can choose a provider from a competitive market.
DeepSeek v4 pro is still rather large, DeepSeek-V4-Flash[0] becomes relatively more reasonable with smaller quantizations and eventually will be able to effectively offload 'facts' to system RAM. See DwarfStar 4[1] for current sweet spots.