by TheCoreh
902 days ago
From the GitHub repo README:

> we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs

I knew the computational power required to train LLMs was absurd, but the figures for larger networks are just too large to understand intuitively, so it never really registered. With this one I could actually imagine those 16 A100 GPUs sitting in a server room running at full blast for 90 days, so it was more tangible... And now thinking about the larger ones is kinda scary.

Edit: Did the math, and just the GPUs (at 250 W each) consumed around 8.64 MWh, which is in the same ballpark as the power consumption of the average US home in one year (10.5 MWh).
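For anyone who wants to check the arithmetic in the edit, here is a rough sketch in Python. It assumes every GPU draws a constant 250 W for the full 90 days and ignores CPUs, cooling, and networking, so it is a lower bound at best. The function name is made up for illustration, and the 10.5 MWh household figure is simply the one quoted in the comment.

```python
# Back-of-the-envelope check of the GPU energy estimate.
# Assumptions taken from the comment: 16 GPUs, 250 W each, 90 days nonstop.
NUM_GPUS = 16
WATTS_PER_GPU = 250
DAYS = 90
US_HOME_MWH_PER_YEAR = 10.5  # household figure as quoted in the comment


def training_energy_mwh(num_gpus, watts_per_gpu, days):
    """Energy in MWh for GPUs running at constant power (hypothetical helper)."""
    hours = days * 24
    watt_hours = num_gpus * watts_per_gpu * hours
    return watt_hours / 1_000_000  # Wh -> MWh


energy = training_energy_mwh(NUM_GPUS, WATTS_PER_GPU, DAYS)
share_of_home = energy / US_HOME_MWH_PER_YEAR

print(f"GPU energy: {energy:.2f} MWh")
print(f"Share of one US home-year: {share_of_home:.0%}")
```

Swapping in a different wattage or GPU count shows how quickly the total grows for bigger training runs.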
Of course, we’re probably both simplifying things too much, but if these numbers are good enough, it’s an interesting perspective.
At these sorts of costs and a final size of 2.2GB, each MB cost a few dollars to produce.