by newhaus1994 2498 days ago
I'm the lead researcher on the Middlebury Institute project looking at fine-tuning the bigger models, and I got OOM on 745M and 1.5B originally. I had to get an Azure instance with 24GB VRAM to handle it (using nshepperd's codebase). It works, but takes a while (~500 epochs take 12 hours on a 100k-word training dataset).

Ouch! So 11GB is nowhere close to being enough, then. I wonder if even switching to FP16 will be adequate?
Might be able to get 745M to work on a single GPU. I'm definitely not using all 24GB, so fp16 might bring the memory use down enough.
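For a rough sense of why fp16 could matter here, a back-of-the-envelope estimate follows. It is only a sketch under stated assumptions: the parameter counts are the ones quoted in this thread, the optimizer is assumed to be Adam-style (two extra values per parameter), everything is assumed to be stored at the same precision, and activation memory is ignored entirely even though it can be a large share of real usage. The function name `training_memory_gib` is made up for illustration.

```python
# Back-of-the-envelope training memory: weights + gradients + optimizer state.
# Ignores activations and framework overhead, so real usage will be higher.
GIB = 1024 ** 3


def training_memory_gib(n_params, bytes_per_value, optimizer_slots=2):
    """Estimate memory in GiB for weights, gradients and optimizer state.

    optimizer_slots=2 assumes an Adam-style optimizer (two moment
    estimates per parameter). That is an assumption, not from the thread.
    """
    copies = 1 + 1 + optimizer_slots  # weights, gradients, optimizer state
    return n_params * bytes_per_value * copies / GIB


# Parameter counts as quoted in the thread.
for name, n_params in [("745M", 745e6), ("1.5B", 1.5e9)]:
    fp32 = training_memory_gib(n_params, 4)  # 4 bytes per fp32 value
    fp16 = training_memory_gib(n_params, 2)  # 2 bytes per fp16 value
    print(f"{name}: fp32 ~{fp32:.1f} GiB, fp16 ~{fp16:.1f} GiB")
```

Under these assumptions, storing everything in fp16 halves this part of the footprint. Mixed-precision setups commonly keep an fp32 master copy of the weights, so the real saving is usually smaller than a clean 2x.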
How would you use fp16 to get it to work on a single GPU? And if you did, what GPU should you use?