by aftbit
502 days ago
The math they gave was 2,788,000 H800 GPU hours[1] at a rental price of $2/GPU-hour[1], which works out to $5.6M. If they did that on a cluster of 2048 H800s, then they could re-train the model every ~1400 hours (~2 months).

If they paid $70,000 per GPU[2] plus $5000 per 4-GPU compute node (random guess), then the hardware would have cost about $150M to build. If you add in network hardware and other data-centery-things, I could see it reaching into the $200M range. IMO $400M might be a bit of a stretch but not too wildly off base.

To reach parity with the rental price at the $400M figure, they would have needed to re-train about 70 times (i.e. roughly 11 to 12 years of back-to-back runs). They obviously did not do that, so I agree it's a bit unfair to cost this based on $5.6M in GPU rentals.

Why did they buy instead of rent? Probably because it's not actually that cheap to get 2048 concurrent, high-performance, interconnected GPUs for 60 days. Or maybe just because they had cash for capex.

1: https://stratechery.com/2025/deepseek-faq/
2: https://www.tomshardware.com/news/price-of-nvidia-compute-gp...
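For anyone who wants to check the arithmetic, here is a rough sketch in Python. The inputs are the figures quoted above (including the guessed $5000 per node); the function names are made up for illustration, and the $400M total is passed in as an assumption rather than derived.

```python
# Back-of-the-envelope check of the figures quoted in the comment.
# All constants come from the comment; NODE_PRICE is the commenter's guess.
GPU_HOURS = 2_788_000          # H800 GPU hours for one training run
RENTAL_PER_GPU_HOUR = 2.0      # USD per GPU-hour
CLUSTER_GPUS = 2048            # GPUs in the assumed cluster
GPU_PRICE = 70_000             # USD per GPU
NODE_PRICE = 5_000             # USD per compute node (random guess)
GPUS_PER_NODE = 4


def rental_cost() -> float:
    """Cost of one training run at the rental price."""
    return GPU_HOURS * RENTAL_PER_GPU_HOUR


def run_hours() -> float:
    """Wall-clock hours for one run if all GPUs are used in parallel."""
    return GPU_HOURS / CLUSTER_GPUS


def hardware_cost() -> int:
    """GPUs plus node chassis only; no networking or data-center costs."""
    nodes = CLUSTER_GPUS // GPUS_PER_NODE
    return CLUSTER_GPUS * GPU_PRICE + nodes * NODE_PRICE


def runs_to_parity(total_capex: float) -> float:
    """How many rented runs would cost as much as buying the cluster."""
    return total_capex / rental_cost()


if __name__ == "__main__":
    print(f"rental cost per run: ${rental_cost():,.0f}")
    print(f"hours per run:       {run_hours():,.0f}")
    print(f"GPU + node hardware: ${hardware_cost():,.0f}")
    runs = runs_to_parity(400e6)  # assumed $400M all-in figure
    years = runs * run_hours() / (24 * 365)
    print(f"runs to parity:      {runs:.1f} (~{years:.1f} years back to back)")
```

Swapping 400e6 for 200e6 or for hardware_cost() shows how sensitive the break-even point is to what you count as capex.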