by sacred_numbers 1180 days ago
If you bought an 8xA100 machine for $140k, you would have to run it continuously for over 10,000 hours (about 14 months) to train the 7B model. By that time the A100s you bought would have depreciated substantially, especially because cloud companies will be renting and selling A100s at a discount as they bring H100s online. It might still be worth it, but it's not a home run.
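For anyone who wants to check the arithmetic, here is a rough sketch using only the figures from the comment ($140k, 8 GPUs, 10,000 hours, rounded down from "over 10,000"). The per-GPU-hour figure is my own framing, and it assumes zero resale value and ignores power, cooling, and hosting, so treat it as a simplified bound rather than a real cost model.

```python
# Back-of-the-envelope check of the figures in the comment above.
# Assumptions (not from the comment): zero resale value, no power,
# cooling, or hosting costs, and an average month of 365.25 / 12 days.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 365.25 / 12

machine_cost_usd = 140_000   # quoted price of the 8xA100 machine
gpus = 8                     # GPUs in the machine
wall_clock_hours = 10_000    # "over 10,000 hours", rounded down

# Continuous wall-clock time expressed in months.
months = wall_clock_hours / HOURS_PER_DAY / DAYS_PER_MONTH

# Total GPU-hours consumed and purchase price spread across them.
gpu_hours = wall_clock_hours * gpus
cost_per_gpu_hour = machine_cost_usd / gpu_hours

print(f"{months:.1f} months of continuous running")
print(f"{gpu_hours:,} GPU-hours")
print(f"${cost_per_gpu_hour:.2f} per GPU-hour (hardware only)")
```

That works out to roughly 13.7 months and $1.75 per GPU-hour for the hardware alone. Any resale value at the end lowers that number and operating costs raise it, so the comparison against discounted cloud A100s depends heavily on both.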

If 8-bit training methods take off, I think the calculus is going to change rapidly, with newer cards that have decent amounts of memory and 8-bit acceleration starting to become dramatically more cost- and time-effective than the venerable A100s.