Hacker News
by
sabareesh
541 days ago
Here is more about the journey, beyond the rig itself.
https://sabareesh.com/posts/llm-intro/
layer8
541 days ago
How long does training a 1B or 500M model take approximately on the 4-GPU setup? Or does that dramatically depend on the training data? I didn’t see that info on your pages.
sabareesh
541 days ago
It takes roughly 7 days to train a 500M model on 100B tokens.
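For a sense of scale, the figure above can be turned into a rough throughput number. This is a back-of-the-envelope sketch, not a measurement. It assumes the 7 days refers to the 4-GPU setup mentioned in the question, and that the run was continuous wall-clock time. The FLOPs line uses the common 6 × parameters × tokens rule of thumb for dense transformer training, which is an approximation and not something stated in the thread. The function name `training_throughput` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_throughput(tokens, days, num_gpus, params):
    """Estimate average throughput from total tokens and wall-clock days.

    Assumes the run was continuous and that work is split evenly across GPUs.
    """
    seconds = days * SECONDS_PER_DAY
    tokens_per_sec = tokens / seconds
    # Rule-of-thumb training cost: about 6 FLOPs per parameter per token.
    total_flops = 6 * params * tokens
    return {
        "tokens_per_sec": tokens_per_sec,
        "tokens_per_sec_per_gpu": tokens_per_sec / num_gpus,
        "flops_per_sec_per_gpu": total_flops / seconds / num_gpus,
    }


# Numbers from the thread: 500M parameters, 100B tokens, 7 days.
# num_gpus=4 is an assumption taken from the question, not the answer.
stats = training_throughput(tokens=100e9, days=7, num_gpus=4, params=500e6)

for name, value in stats.items():
    print(f"{name}: {value:,.0f}")
```

Under those assumptions this works out to about 165k tokens per second overall, or about 41k tokens per second per GPU. It does not answer the 1B case directly. Scaling the time linearly with parameter count would be a further assumption.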