by z4y5f3
744 days ago
In my experience, models under 500M parameters are pretty useful when fine-tuned on traditional NLP tasks such as text classification and sentence- or token-level labeling. A modern LM with a 32K context window could be a nice replacement for BERT, RoBERTa, or BART.