Hacker News new | ask | show | jobs
by sabareesh 541 days ago
Here is more about the journey beyond the rig itself: https://sabareesh.com/posts/llm-intro/
1 comment

How long does training a 1B or 500M model take approximately on the 4-GPU setup? Or does that dramatically depend on the training data? I didn’t see that info on your pages.
Roughly 7 days to train a 500M model on 100B tokens.
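
For a rough sense of scale, the figure above can be turned into a throughput number and a back-of-envelope estimate for the 1B case. This is only a sketch: the helper names are made up for illustration, the 4-GPU count comes from the question, and the scaling rule (training time proportional to parameters times tokens, same hardware and efficiency) is an assumption, not something measured on this rig.

```python
SECONDS_PER_DAY = 86_400


def tokens_per_second(tokens: float, days: float) -> float:
    """Average training throughput implied by a token count and a wall-clock time."""
    return tokens / (days * SECONDS_PER_DAY)


def estimated_days(
    params: float,
    tokens: float,
    ref_params: float = 500e6,  # reference point from the thread: 500M model
    ref_tokens: float = 100e9,  # ... trained on 100B tokens
    ref_days: float = 7.0,      # ... in roughly 7 days
) -> float:
    """Scale the reference run linearly in params and tokens.

    ASSUMPTION: compute grows with params * tokens and the hardware
    utilisation stays the same; real runs can deviate a lot from this.
    """
    return ref_days * (params / ref_params) * (tokens / ref_tokens)


if __name__ == "__main__":
    num_gpus = 4  # from the question; assumed equal share per GPU
    rate = tokens_per_second(100e9, 7)
    print(f"aggregate: {rate:,.0f} tokens/s, per GPU: {rate / num_gpus:,.0f} tokens/s")
    print(f"1B model on 100B tokens: ~{estimated_days(1e9, 100e9):.0f} days (assumed linear scaling)")
```

Under these assumptions the reported run works out to roughly 165,000 tokens/s across the rig, and a 1B model on the same 100B tokens would take about twice as long. Memory limits, batch size, and data loading can all shift that, so treat it as an order-of-magnitude guess.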