Hacker News
by
sahil_chaudhary
1188 days ago
All included, it costs under $70 for the 13B model. I'm training 65B now, so I will report what that costs.
2 comments
alex_sf
1188 days ago
For the 65B fine-tune, did you add another A100 node, or just drop the batch size?
Any chance you’re up to sharing the training parameters?
sahil_chaudhary
1188 days ago
Dropping the batch size
sillysaurusx
1188 days ago
Please do! Also please include how you’re calculating the costs.
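The thread doesn't say how the under-$70 figure was computed. A minimal sketch, assuming the cost is simply the number of rented GPUs times wall-clock hours times the hourly price per GPU, plus any fixed extras such as storage. The function name and all numbers below are hypothetical, not the ones behind the figure quoted above.

```python
def estimate_training_cost(
    num_gpus: int,
    wall_clock_hours: float,
    usd_per_gpu_hour: float,
    extra_usd: float = 0.0,
) -> float:
    """Rough rental cost: GPU count x wall-clock hours x hourly rate, plus fixed extras.

    Assumption: billing is per GPU-hour; extra_usd covers anything else
    (e.g. storage) that an "all included" figure might fold in.
    """
    if num_gpus < 1 or wall_clock_hours < 0 or usd_per_gpu_hour < 0 or extra_usd < 0:
        raise ValueError("num_gpus must be >= 1 and other inputs non-negative")
    gpu_hours = num_gpus * wall_clock_hours
    return gpu_hours * usd_per_gpu_hour + extra_usd


# Hypothetical example: one 8-GPU node rented for 5 hours at $1.50 per GPU-hour.
print(estimate_training_cost(8, 5.0, 1.5))  # 60.0
```

Under this model, dropping the batch size to fit a larger model on the same node doesn't change the hourly rate, but it typically lengthens wall-clock time, which is what drives the cost up.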