Hacker News
by segmondy 501 days ago
They pointed out that the cost calculation is based on what it would cost to rent those GPUs at $2/hr. They are not factoring in the prior cost of buying those H800s, because they didn't buy them to build R1. They are not factoring in the cost to build V2 or V2.5; the figure is the cost to build V3. The cost to build R1-Zero and R1 on top of V3 seems to be far smaller than that, and they didn't report it. They are not factoring in the cost to build out their datacenter or pay salaries. Just the training cost. They made it clear: if you could rent equivalent GPUs at $2/hr, it would cost you about $6 million.

"Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."
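The quoted figure is just rented GPU-hours multiplied by an hourly price, so it can be inverted to recover the amount of compute it implies. A minimal Python sketch, assuming a flat $2/hr rate with nothing else (hardware purchase, storage, networking, staff) included; the $3/hr line is a hypothetical what-if, not a number from the report:

```python
# Figures taken from the DeepSeek-V3 excerpt quoted above.
RENTAL_PRICE_PER_GPU_HOUR = 2.00   # USD, the report's stated assumption
REPORTED_TOTAL_COST = 5_576_000    # USD, "$5.576M"


def implied_gpu_hours(total_cost: float, price_per_hour: float) -> float:
    """Invert cost = GPU-hours * hourly price to recover the GPU-hours."""
    return total_cost / price_per_hour


def rental_cost(gpu_hours: float, price_per_hour: float) -> float:
    """Cost of renting the given GPU-hours at a flat hourly rate."""
    return gpu_hours * price_per_hour


hours = implied_gpu_hours(REPORTED_TOTAL_COST, RENTAL_PRICE_PER_GPU_HOUR)
print(f"Implied H800 GPU-hours: {hours:,.0f}")  # 2,788,000

# Hypothetical what-if: the same compute rented at a different rate.
print(f"At $3/hr: ${rental_cost(hours, 3.00):,.0f}")  # $8,364,000
```

The point of the inversion is that the headline number is a statement about compute (roughly 2.8 million GPU-hours), priced at an assumed market rate; it says nothing about what DeepSeek actually spent on hardware or research.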

V3 was released a bit over a month ago. It wasn't V3 that took the world by storm but R1, yet the price everyone is talking about is the price for V3.


If this weren't an attempt to sell a false equivalency, at least one story would have details on the equivalent rental cost of the compute used to train closed-source frontier models from OpenAI, Anthropic, Mistral... The lack of clarity is what makes it a story.
>>Just the training cost. They made it clear. If you could rent equivalent GPUs at $2/hr, it would cost you about $6million.

This is still quite impressive, given that most people are more likely to buy cloud infrastructure from AWS or Azure than to build their own datacenter. So the math checks out.

I don't think the compute capacity already built will go to waste; more and bigger things will likely be built in the coming years, so most of it will be used for that.

Thanks for the explanation. These facts are completely overlooked in the mass media in favour of catchy headlines.