Hacker News
by TheCoreh 902 days ago
From the GitHub repo Readme:

> we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs

I knew the computational power required to train LLMs was absurd, but the figures for larger networks are just too large to grasp intuitively, so it never really registered. With this one I could actually imagine the 16 machines with A100 GPUs sitting in a server room, running at full blast for 90 days, so it felt more tangible... And now thinking about the larger ones is kind of scary.

Edit: Did the math, and the GPUs alone (at 250 W each) consumed around 8.64 MWh, which is in the same ballpark as the average US home's electricity consumption for a whole year (10.5 MWh).
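A quick check of that figure, assuming each GPU draws a constant 250 W for the full 90 days. This is a rough sketch: it ignores CPUs, networking, cooling and any idle time.

```python
# Rough energy estimate for the GPUs alone.
# Assumption: each A100 draws a constant 250 W for the whole 90-day run.
NUM_GPUS = 16
WATTS_PER_GPU = 250
DAYS = 90
US_HOME_MWH_PER_YEAR = 10.5  # figure quoted in the comment

hours = DAYS * 24                                     # 2160 hours
energy_kwh = NUM_GPUS * WATTS_PER_GPU * hours / 1000  # Wh -> kWh
energy_mwh = energy_kwh / 1000                        # kWh -> MWh

print(f"{energy_mwh:.2f} MWh")                                  # 8.64 MWh
print(f"{energy_mwh / US_HOME_MWH_PER_YEAR:.2f} of a home-year")  # 0.82 of a home-year
```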


So, four A100-years. Unit cost is $8,000 (from a quick search) and electricity costs under $2,000. If you reckon the useful life of an A100 is four years, then the run uses up one whole GPU's worth of hardware, and the training cost approaches $10,000. I have no idea what the forecast useful life of the GPU is, but I'd hope it's a lot longer; if it were about ten years, this training cost would be around $5,000.
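A minimal sketch of that amortization, assuming straight-line depreciation, the quoted $8,000 unit price, and electricity taken at its $2,000 upper bound. The function name `training_cost` is just for illustration.

```python
# Amortized training cost: the share of the GPUs' purchase price "used up"
# by the run, plus electricity.
# Assumptions: $8,000 per A100, straight-line depreciation over the GPU's
# useful life, electricity taken at its stated upper bound of $2,000.
GPU_UNIT_COST = 8_000
ELECTRICITY_COST = 2_000

gpu_years = 16 * 90 / 365  # about 3.95 GPU-years, i.e. "four A100-years"

def training_cost(useful_life_years: float) -> float:
    """Hardware cost attributed to the run plus electricity."""
    hardware = GPU_UNIT_COST * gpu_years / useful_life_years
    return hardware + ELECTRICITY_COST

for life in (4, 10):
    print(f"{life}-year life: ${training_cost(life):,.0f}")
# 4-year life: $9,890
# 10-year life: $5,156
```

With a longer useful life, the hardware share shrinks and electricity becomes a larger fraction of the total.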

Of course, we’re probably both simplifying things too much, but if these numbers are good enough, it’s an interesting perspective.

At these sorts of costs and a final size of 2.2GB, each MB cost a few dollars to produce.
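The per-MB figure follows directly from the two rough totals above. This sketch assumes decimal units (1 GB = 1000 MB).

```python
# Cost per megabyte of the final 2.2 GB model.
# Assumptions: decimal units (2.2 GB = 2,200 MB) and the two rough totals
# from the comment (about $10,000 and about $5,000).
MODEL_SIZE_MB = 2_200

for total_cost in (10_000, 5_000):
    print(f"${total_cost:,} total -> ${total_cost / MODEL_SIZE_MB:.2f} per MB")
# $10,000 total -> $4.55 per MB
# $5,000 total -> $2.27 per MB
```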