Hacker News
by alex_sf 1188 days ago
For the 65B fine-tune, did you add another A100 node, or just drop the batch size?

Any chance you’re up to sharing the training parameters?

Dropping the batch size