Hacker News
by chvid 456 days ago
If they have a cluster of 2,000 H800 GPUs (which is what they have stated publicly), training would take 2,800,000 GPU-hours / (2,000 GPUs * 24 * 30) ~ 2 months.

A cluster of 2,000 GPUs is what a second-tier AI lab has access to. And it shows that you can play the state-of-the-art LLM game with some capital and a lot of brains.
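
The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch: the function name `training_months` is made up for illustration, and it assumes every GPU runs around the clock at full utilization in a 30-day month, just like the original arithmetic.

```python
def training_months(gpu_hours: float, num_gpus: int,
                    hours_per_day: int = 24, days_per_month: int = 30) -> float:
    """Wall-clock months needed to spend `gpu_hours` on `num_gpus` GPUs in parallel.

    Assumes perfect utilization and no downtime (same simplification as the comment).
    """
    return gpu_hours / (num_gpus * hours_per_day * days_per_month)


# Figures taken from the comment: 2.8M GPU-hours on a 2,000-GPU cluster.
months = training_months(2_800_000, 2_000)
print(f"{months:.2f} months")  # 1.94 months
```

Halving the cluster to 1,000 GPUs doubles the estimate to just under 4 months, which is why the cluster size matters more than the raw GPU-hour count for who can realistically attempt this.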

2 comments

Isn't the price of an H800 like $30k?

I don't know what your household budget is, but $60M might not be what most people associate with "some capital".

It is a lot less than what Google, OpenAI, etc. have.

And the GPUs would be a shared resource, so what you should calculate is what it would have cost to rent them: probably something like $2M.
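
To make the buy-versus-rent comparison in this exchange concrete, here is a rough sketch. All inputs come from the thread itself, not from quoted prices: the $30k unit price is the commenter's guess, and the rental total reads "2 m" as $2 million, which is an assumption. The helper names are hypothetical.

```python
GPU_COUNT = 2_000
GPU_UNIT_PRICE_USD = 30_000    # the "$30k" guess from the thread, not a quoted price
GPU_HOURS = 2_800_000          # training budget used earlier in the thread
RENTAL_TOTAL_USD = 2_000_000   # assumption: "2 m" means $2 million


def purchase_cost(count: int, unit_price: int) -> int:
    """Up-front cost of buying the whole cluster."""
    return count * unit_price


def implied_hourly_rate(total_usd: float, gpu_hours: float) -> float:
    """Per-GPU-hour rental price implied by a total rental bill."""
    return total_usd / gpu_hours


buy = purchase_cost(GPU_COUNT, GPU_UNIT_PRICE_USD)
rate = implied_hourly_rate(RENTAL_TOTAL_USD, GPU_HOURS)

print(f"buy: ${buy:,}")                              # buy: $60,000,000
print(f"implied rental rate: ${rate:.2f}/GPU-hour")  # implied rental rate: $0.71/GPU-hour
```

Under these numbers the rental estimate only holds if GPU time costs about $0.71 per hour, so the real figure depends heavily on the hourly rate one assumes.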

Yesterday GPT asked me if I'd like to train a small LLM and I laughed out loud.

That being said I'm amazed how far 1B models have come. I remember when TinyLlama came out a few years ago, it was not great. ($40K training cost iirc.)

That was a 1B model, but these days even 0.5B models are remarkably coherent.

An H100 has 14592 CUDA cores. 2000 * 14592 gives you about 29 million cores, already far more than 2 million.