Hacker News
by TechDebtDevin 500 days ago
Over the last few days people have asked me if I think NVIDIA is fkd. It still takes two H100s to run inference on DeepSeek-V3 671B at <200 tokens per second.

Only 2? What kind of H100s do you have?
There are different versions of the model, and each can be run at different levels of quantization.

Some variants of DeepSeek-R1 can be run on 2x H100 GPUs, and some people have managed to get quite decent results running an even more heavily distilled model on consumer hardware.

For DeepSeek-V3, even with 4-bit quantization, you need more like 16x H100.

I meant quantized versions, but yeah, I get your point.
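
For a rough sense of the numbers argued about in this thread, here is a back-of-the-envelope sketch of how much GPU memory the weights alone would take at different quantization levels. It assumes 671B parameters (the figure from the original post) and 80 GB per H100, which is an assumption about the card variant. The function name `min_gpus_for_weights` is made up for illustration. It gives only a lower bound: KV cache, activations, and runtime overhead are ignored, so real deployments would need more.

```python
import math

def min_gpus_for_weights(params_billion, bits_per_weight, gpu_mem_gb=80.0):
    """Weights-only lower bound on GPU count (hypothetical helper).

    Ignores KV cache, activations, and framework overhead, so the
    real requirement is higher than what this returns.
    """
    # 1e9 parameters * (bits / 8) bytes each = that many GB (decimal)
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb, math.ceil(weight_gb / gpu_mem_gb)

# Assumed: 671B parameters, 80 GB per GPU
for bits in (16, 8, 4):
    weight_gb, gpus = min_gpus_for_weights(671, bits)
    print(f"{bits}-bit: ~{weight_gb:.1f} GB of weights, at least {gpus} x 80 GB GPUs")
```

Under these assumptions, two 80 GB cards (160 GB total) could hold the full 671B weights only below roughly 2 bits per weight, which is presumably why the 2x H100 figure reads as applying to smaller or distilled variants.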