| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by z4y5f3 744 days ago
	My experience is that < 500M models are pretty useful when fine-tuned on traditional NLP tasks, such as text classification and sentence/token level labeling. A modern LM with a 32K context window size could be a nice replacement for BERT, RoBERTa, BART.