HN Mirror

	by skummetmaelk 456 days ago
	The fact that you can unironically put the "only" modifier on a training time of 2.8 million GPU hours is nuts.

2 comments

chvid 456 days ago

If they have a cluster with 2,000 H800 GPUs (which is what they have stated in public) training would take 2,800,000 / (2,000 * 24 * 30) ~ 2 months.

A cluster of 2,000 GPUs is what a second-tier AI lab has access to. And it shows that you can play in the state-of-the-art LLM game with some capital and a lot of brains.
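
The two-month estimate above can be checked with a short sketch. It assumes every GPU runs around the clock with no downtime and uses the same 30-day month as the comment; the function name `training_months` is made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # same 30-day month used in the estimate above


def training_months(gpu_hours: float, num_gpus: int) -> float:
    """Wall-clock months needed if all GPUs run continuously (assumption)."""
    gpu_hours_per_month = num_gpus * HOURS_PER_DAY * DAYS_PER_MONTH
    return gpu_hours / gpu_hours_per_month


# Figures quoted in the thread: 2.8 million GPU hours, 2,000 GPUs.
months = training_months(2_800_000, 2_000)
print(f"{months:.2f} months")  # 1.94 months
```

Any idle time, restarts, or failed runs would push the real wall-clock time above this lower bound.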

ArtTimeInvestor 456 days ago

Isn't the price of an H800 like $30k?

I don't know what your household budget is, but $60M might not be what most people associate with "some capital".

chvid 456 days ago

It is a lot less than what Google, OpenAI, etc. have.

And the GPUs would be a shared resource, so what you should calculate is what it would have cost to rent them: probably something like $2M.
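
To compare the purchase and rental figures in this exchange, here is a rough sketch. The $30k unit price and the rental guess of about 2 million are taken from the comments, not verified; the $2.00 per GPU-hour rate is a purely hypothetical value chosen to show how sensitive the total is, and all function names are illustrative.

```python
GPU_HOURS = 2_800_000        # training time quoted in the thread
NUM_GPUS = 2_000             # cluster size quoted in the thread
PRICE_PER_GPU_USD = 30_000   # price guessed in the thread, not verified


def purchase_cost(num_gpus: int, unit_price_usd: float) -> float:
    """Up-front cost of buying the whole cluster."""
    return num_gpus * unit_price_usd


def rental_cost(gpu_hours: float, rate_usd_per_gpu_hour: float) -> float:
    """Cost of renting the same number of GPU hours at a flat rate."""
    return gpu_hours * rate_usd_per_gpu_hour


def implied_hourly_rate(total_usd: float, gpu_hours: float) -> float:
    """Hourly rate that a given total rental bill would imply."""
    return total_usd / gpu_hours


print(f"${purchase_cost(NUM_GPUS, PRICE_PER_GPU_USD):,}")       # $60,000,000
print(f"{implied_hourly_rate(2_000_000, GPU_HOURS):.2f} USD/h")  # 0.71 USD/h
# Hypothetical flat rate of $2.00 per GPU-hour, for comparison only.
print(f"${rental_cost(GPU_HOURS, 2.00):,.0f}")                   # $5,600,000
```

Under these assumptions, a rental total near 2 million would correspond to roughly $0.71 per GPU-hour; a higher rate scales the bill linearly.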

andai 456 days ago

Yesterday GPT asked me if I'd like to train a small LLM and I laughed out loud.

That being said, I'm amazed how far 1B models have come. I remember when TinyLlama came out a few years ago, it was not great. ($40K training cost, iirc.)

That was a 1B model, but these days even 0.5B models are remarkably coherent.

charcircuit 456 days ago

An H100 has 14592 CUDA cores. 2000 * 14592 already gives you more than 2 million cores.

andai 456 days ago

Can someone put this into perspective? I'm finding heterogeneous data on other models, i.e. number of tokens, number of GPUs used, cost, etc. It's hard to compare it all.
