by sillysaurusx
1161 days ago
Wow. Does doubling the vocab size actually help? It certainly makes training more expensive. One clever trick for saving memory is to freeze the vocab embedding layer when fine-tuning; it makes a noticeable difference in both speed and memory required. I'm surprised they went the larger-vocab route, since LLaMA's vocab is only 32k. I wonder what the reason is... Thanks!
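To put rough numbers on why a bigger vocab costs more and why freezing the embedding helps, here is a back-of-the-envelope sketch. It assumes (hypothetically) a hidden size of 4096, a vocab that doubles from 32k to 64k, and an Adam-style optimizer that keeps an fp32 gradient plus two fp32 moment buffers per trainable weight. It only counts the input embedding matrix; an untied output projection would roughly double these figures. A frozen layer needs no gradient or optimizer state, so that overhead drops to zero.

```python
# Back-of-the-envelope memory estimate for the vocab embedding matrix.
# Assumptions (illustrative only): hidden size 4096, fp32 gradients,
# and two fp32 Adam moment buffers (m and v) per trainable parameter.

BYTES_FP32 = 4


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in a vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


def training_overhead_bytes(n_params: int, trainable: bool) -> int:
    """Extra bytes for gradient + Adam m and v; zero if the layer is frozen."""
    if not trainable:
        return 0
    return n_params * BYTES_FP32 * 3  # grad, m, v


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    hidden = 4096
    for vocab in (32_000, 64_000):
        p = embedding_params(vocab, hidden)
        trainable = gib(training_overhead_bytes(p, trainable=True))
        frozen = gib(training_overhead_bytes(p, trainable=False))
        print(
            f"vocab={vocab:>6}: {p / 1e6:.0f}M params, "
            f"grad+optimizer overhead {trainable:.2f} GiB trainable "
            f"vs {frozen:.2f} GiB frozen"
        )
```

Under these assumptions the 32k embedding alone carries about 1.46 GiB of gradient and optimizer state when trainable, and doubling the vocab doubles that, which is exactly the cost freezing avoids.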