HN Mirror



	by alex_sf 1188 days ago
	For the 65B fine-tune, did you add another A100 node? Or just drop the batch size? Any chance you’re up to sharing the training parameters?

1 comment

Dropping the batch size.