HN Mirror



	by microtonal 2064 days ago
	So far, of the models that run on GPUs with 8-16 GiB of VRAM, XLM-RoBERTa has been the best for these specific tasks. It worked better than multilingual BERT and the language-specific BERT models by quite a wide margin.


ericd 2063 days ago

Great, thanks very much for the pointer, especially the VRAM context - I'm looking to fine-tune on 2080 Tis rather than V100s/A100s, so that's really good to know.
