

	by newhaus1994 2498 days ago
	I'm the lead researcher on the Middlebury Institute project looking at fine-tuning the bigger models, and I originally ran out of memory (OOM) on both 745M and 1.5B. I had to get an Azure instance with 24GB of VRAM to handle it (using nshepperd's codebase). It works, but it takes a while: about 500 epochs take 12 hours on a 100k-word training dataset.


Ouch! So 11GB is nowhere close to being enough, then. I wonder if even switching to FP16 will be adequate?

It might be possible to get 745M to work on a single GPU. I'm definitely not using all 24GB, so fp16 might bring memory use down enough.
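For a rough back-of-envelope check on the FP16 question, here is a minimal sketch of the arithmetic. It is not taken from nshepperd's codebase. It assumes an Adam-style optimizer that keeps two FP32 state buffers per parameter, counts only weights, gradients, and optimizer state, and ignores activations (which depend on batch size and sequence length and can be large). The function name `training_memory_gib` and its parameters are made up for illustration; the parameter counts are the 745M and 1.5B figures quoted in this thread.

```python
GIB = 1024 ** 3  # bytes per GiB


def training_memory_gib(n_params, bytes_per_weight,
                        optimizer_states=2, bytes_per_state=4):
    """Estimate memory for weights + gradients + optimizer state, in GiB.

    Assumptions (hypothetical, for illustration only):
    - gradients are stored at the same precision as the weights
    - the optimizer keeps `optimizer_states` FP32 buffers per parameter
      (2 for Adam's first and second moments)
    - activation memory is NOT included
    """
    weights = n_params * bytes_per_weight
    grads = n_params * bytes_per_weight
    opt_state = n_params * optimizer_states * bytes_per_state
    return (weights + grads + opt_state) / GIB


# Parameter counts as quoted in the thread.
models = {"745M": 745_000_000, "1.5B": 1_500_000_000}

for name, n in models.items():
    fp32 = training_memory_gib(n, bytes_per_weight=4)
    fp16 = training_memory_gib(n, bytes_per_weight=2)
    print(f"{name}: FP32 ~{fp32:.1f} GiB, FP16 weights/grads ~{fp16:.1f} GiB "
          "(before activations)")
```

Under these assumptions the 745M model needs roughly 11 GiB in FP32 before any activations are counted, which would be consistent with running out of memory on an 11GB card. Storing weights and gradients in FP16 only trims part of that, because the optimizer state stays in FP32 here, and common mixed-precision setups also keep an FP32 master copy of the weights, so the real savings may be smaller than this estimate suggests.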

How would you use fp16 to get it to work on a single GPU? And if you did, what GPU should you use?