HN Mirror

Hacker News


	by TechDebtDevin 547 days ago
	Over the last few days people have asked me whether I think NVIDIA is fkd. It still takes two H100s to run inference on DeepSeek-V3 (671B) at under 200 tokens per second.


htrp 547 days ago

Only 2? What kind of H100s do you have?


dathinab 547 days ago

There are different versions of the model, and each can be run at different levels of quantization.

Some variants of DeepSeek-R1 can run on 2x H100 GPUs, and some people have managed to get still fairly decent results running an even more heavily distilled model on consumer hardware.

For DeepSeek-V3, even with 4-bit quantization you need something more like 16x H100s.
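The GPU counts in this thread come down to weight-memory arithmetic. Below is a rough Python sketch, assuming roughly 671e9 parameters (the "671b" above), 80 GB of memory per H100, and counting weights only. KV cache, activations, and runtime overhead are deliberately ignored, so these numbers are lower bounds, not deployment sizes. The function names are illustrative, not from any library.

```python
import math

# Assumptions for this sketch (not stated in the thread):
# - ~671e9 total parameters for DeepSeek-V3.
# - 80 GB of memory per H100 (the 80 GB SKUs).
# - Only weight storage is counted; KV cache, activations and
#   framework overhead are ignored, so real setups need more.
PARAMS = 671e9
H100_MEMORY_GB = 80


def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def min_h100s_for_weights(params: float, bits_per_param: float) -> int:
    """Smallest number of 80 GB GPUs whose combined memory holds the weights."""
    return math.ceil(weight_memory_gb(params, bits_per_param) / H100_MEMORY_GB)


def max_bits_per_param(params: float, num_gpus: int) -> float:
    """Highest average bits per parameter that fits in num_gpus x 80 GB."""
    total_bytes = num_gpus * H100_MEMORY_GB * 1e9
    return total_bytes * 8 / params


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(PARAMS, bits)
        n = min_h100s_for_weights(PARAMS, bits)
        print(f"{bits:>2}-bit: {gb:7.1f} GB of weights -> at least {n} x H100")
    print(f"2 x H100 fits at most {max_bits_per_param(PARAMS, 2):.2f} bits/param")
```

Under these assumptions, 4-bit weights alone come to about 335.5 GB (at least 5 H100s), 8-bit to about 671 GB (at least 9), and 16-bit to about 1342 GB (at least 17). Two H100s (160 GB) leave room for under 2 bits per parameter on average. Practical serving needs headroom beyond these floors, so actual GPU counts are higher and depend on the setup.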


TechDebtDevin 547 days ago

I meant quantized versions, but yeah, I get your point.
