Hacker News new | ask | show | jobs
by sailingparrot 503 days ago
Parent is (I assume) talking about the entire budget to get to DeepSpeek V3, not the cost of the final training run.

This includes salary for ~130 ML people + rest of the staff, company is 2 years old. They have trained DeepSpeek V1, V2, R1, R1-Zero before finally training V3, as well as a bunch of other less known models.

The final run of V3 is ~6M$ (at least officially...[1]), but that does not factor the cost of all the other failed runs, ablations etc. that always happen when developing a new model.

You also can't get clusters of this size with a 3 weeks commitment just to do your training and then stop paying for it, there is always a multi-month (if not 1 year) commitment because of demand/supply. Or, if it's a private cluster they own it's already a $200M-300M+ investment just for the advertised 2000 GPUs for that run.

I don't know if it really is $1B, but it certainly isn't below $100M.

[1] I personally believe they used more GPUs than stated, but simply can't be forthcoming about this for obvious reason. I have of course not proof of that, my belief is just based on scaling laws we have seen so far + where the incentives are for stating the # of GPUs. But even if the 2k GPUs figure is accurate, it's still $100M+