Hacker News
by layer8 534 days ago
How long does training a 1B or 500M model take approximately on the 4-GPU setup? Or does that dramatically depend on the training data? I didn’t see that info on your pages.
1 comment

Roughly 7 days to train a 500M model on 100B tokens.
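For anyone who wants to turn that figure into a throughput number, here is a rough back-of-the-envelope sketch. It assumes the 7-day figure refers to the 4-GPU setup from the question, which the reply does not say explicitly. The function names are made up for illustration, and the 1B estimate assumes training time scales linearly with parameter count at a fixed token budget, which is only a rule of thumb and not something stated in the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_throughput(tokens, days, num_gpus):
    """Return (total, per-GPU) tokens per second implied by a training run."""
    total = tokens / (days * SECONDS_PER_DAY)
    return total, total / num_gpus


def scaled_days(base_days, base_params, target_params):
    """Estimate training days for another model size on the same token budget.

    ASSUMPTION: time grows linearly with parameter count. Real runs can
    deviate because of memory limits, batch size changes, and so on.
    """
    return base_days * target_params / base_params


# Figures from the thread: 100B tokens, 7 days, 500M parameters.
# num_gpus=4 is taken from the question, not confirmed by the reply.
total_tps, per_gpu_tps = implied_throughput(tokens=100e9, days=7, num_gpus=4)
days_1b = scaled_days(base_days=7, base_params=500e6, target_params=1e9)

print(f"total throughput: {total_tps:,.0f} tokens/s")
print(f"per-GPU throughput: {per_gpu_tps:,.0f} tokens/s")
print(f"estimated days for 1B on the same tokens: {days_1b:.0f}")
```

So the reply works out to somewhere around 165k tokens/s overall. As for whether it depends on the data: under these assumptions the wall-clock time depends on how many tokens you train on, not on what the tokens are.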