Hacker News
by
alex_sf
1188 days ago
For the 65B fine-tune, did you add another A100 node, or just drop the batch size?
Any chance you’re up to sharing the training parameters?
1 comment
sahil_chaudhary
1188 days ago
Dropping the batch size