Hacker News
by sahil_chaudhary 1188 days ago
All included, it costs under $70 for the 13B model. I'm training the 65B now, so I'll report what that costs.
2 comments

For the 65B fine-tune, did you add another A100 node? Or just drop the batch size?

Any chance you’re up to sharing the training parameters?

Dropping the batch size
Please do! Also please include how you’re calculating the costs.