by jxy 846 days ago
How many H100 GPUs does it take to serve 1 Phind-70B model? Are they serving it with bf16, or int8, or lower quants?

This video [1] shows someone running it at 4-bit quantization in 48 GB of VRAM. I suspect you'd need about 4x that (roughly 192 GB) to run at full fp16 precision, or approximately 3 H100s at 80 GB each.

[1] https://www.youtube.com/watch?v=dJ69gY0qRbg

Yeah, 4-bit would take at least 35 GB for the weights alone (70B parameters at 0.5 bytes each). 16-bit would be 140 GB. I'm more interested in how Phind is serving it, but I guess that's their trade secret.
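
For anyone who wants to redo the arithmetic above, here is a rough sketch. It assumes 70e9 parameters, counts weights only (no KV cache, activations, or framework overhead), and assumes 80 GB per H100. The function names are made up for illustration, and none of this says anything about how Phind actually serves the model.

```python
import math

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

def min_gpus(mem_gb: float, gpu_mem_gb: float = 80.0) -> int:
    """Smallest GPU count whose combined memory covers mem_gb."""
    return math.ceil(mem_gb / gpu_mem_gb)

N_PARAMS = 70e9  # assumption: a 70B-parameter model

for name, bits in [("4-bit", 4), ("int8", 8), ("bf16/fp16", 16)]:
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{name}: {gb:.0f} GB of weights, at least {min_gpus(gb)} x 80 GB GPU(s)")
```

On weights alone, 16-bit fits in 2 GPUs. Scaling the observed 48 GB 4-bit footprint by 4x (about 192 GB) is what pushes the estimate to 3, which is a reminder that real serving needs headroom beyond the raw weights.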