by newhaus1994
2498 days ago
I'm the lead researcher on the Middlebury Institute project looking at fine-tuning the bigger models, and I originally got OOM errors on both 745M and 1.5B. I had to get an Azure instance with 24GB of VRAM to handle it (using nshepperd's codebase). It works, but it takes a while: ~500 epochs take 12 hours on a 100k-word training dataset.