| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by YetAnotherNick 317 days ago
	Deepseek main run costed $6M. qwen3-30b-a3b probably would cost few $100Ks, which is ranked 13th. GPU cost of the final model training isn't the biggest chunk of the cost and you can probably replicate results of models like Llama 3 very cheaply. It's the cost of experiments, researchers, data collection which brings overall cost 1 or 2 order of magnitude higher.

1 comments

ilaksh 317 days ago

What's your source for any of that? I think the $6 million thing was identified as a lie they felt was necessary because of GPU export laws.

link

YetAnotherNick 317 days ago

It wasn't a lie, it was a misrepresentation of the total cost. It's not hard to calculate the cost of the training though. It takes 6 * active parameters * tokens flops[1]. To get number of seconds you can divide by Flops/s * MFU, where MFU is around 45% for H100 for large enough models[2].

[1]: https://arxiv.org/abs/2001.08361

[2]: https://github.com/facebookresearch/lingua

link

CamperBob2 317 days ago

That paper's 5 years old at this point, dating back to when Amodei was still an OpenAI employee. Has any newer work superseded it, or are those assumptions still considered solid?

link

YetAnotherNick 316 days ago

Those assumptions are still the same. Although now context length has increased more so the n^2 part is non negligible. See the repo for correct flop calculation[1]

[1]: https://github.com/facebookresearch/lingua/blob/437d680e5218...

link