| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ericd 2064 days ago
	Is there one model that you use more frequently than others as a base for these disparate fine tuning tasks? Basically, are there any that are particularly flexible?

2 comments

ma2rten 2064 days ago

In general, BERT would be the most common one. RoBERTa is the same model but trained for longer, which turns out to work better. T5 is a larger model, which works better on many tasks but is more expensive.

link

ericd 2063 days ago

Thanks for the summary! I'm familiar with BERT, but less so the different variants, so that's quite helpful. I'll take a look at how RoBERTa works.

link

microtonal 2064 days ago

So far, of the models that run on GPUs with 8-16GiB VRAM XLM-RoBERTa has been the best for these specific tasks. It worked better than the multi-lingual BERT model and language-specific BERT models by quite a wide margin.

link

ericd 2063 days ago

Great, thanks very much for the pointer, especially the VRAM context - I'm looking to fine-tune on 2080Ti's rather than V100/A100s, so that's really good to know.

link