Is there one model that you use more frequently than others as a base for these disparate fine-tuning tasks? Basically, are there any that are particularly flexible?
In general, BERT would be the most common one. RoBERTa has the same architecture but was trained for longer and on more data, which turns out to work better. T5 is an encoder-decoder model that is typically used at larger sizes, which works better on many tasks but is more expensive.
So far, of the models that run on GPUs with 8-16 GiB of VRAM, XLM-RoBERTa has been the best for these specific tasks. It worked better than the multilingual BERT model and language-specific BERT models by quite a wide margin.
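If it helps to sanity-check what fits in that VRAM range, here is a rough back-of-the-envelope sketch. It assumes full fine-tuning in fp32 with Adam, which costs about 16 bytes per parameter (4 for weights, 4 for gradients, 8 for the two Adam moments). The parameter counts are approximate round figures I'm assuming for illustration, and activation memory (which depends on batch size and sequence length) is deliberately left out, so treat the result as a lower bound rather than a real measurement.

```python
# Rough lower bound on GPU memory for full fine-tuning with Adam in fp32.
# Assumption: 16 bytes per parameter = 4 (weights) + 4 (gradients) + 8 (Adam moments).
# Activations are NOT included; they grow with batch size and sequence length.
BYTES_PER_PARAM_FP32_ADAM = 4 + 4 + 8

# Approximate parameter counts (assumed, rounded; check the model cards for exact values).
APPROX_PARAMS = {
    "bert-base": 110e6,
    "roberta-base": 125e6,
    "xlm-roberta-base": 278e6,
    "xlm-roberta-large": 560e6,
}


def min_training_gib(num_params: float) -> float:
    """Memory in GiB for weights, gradients, and Adam state only."""
    return num_params * BYTES_PER_PARAM_FP32_ADAM / 2**30


def activation_budget_gib(num_params: float, vram_gib: float) -> float:
    """VRAM left over for activations; a negative value means it cannot fit."""
    return vram_gib - min_training_gib(num_params)


if __name__ == "__main__":
    for name, n in APPROX_PARAMS.items():
        fixed = min_training_gib(n)
        print(f"{name}: ~{fixed:.1f} GiB fixed, "
              f"{activation_budget_gib(n, 8):.1f} GiB left on 8 GiB, "
              f"{activation_budget_gib(n, 16):.1f} GiB left on 16 GiB")
```

Under these assumptions the base-sized models leave room for activations on an 8 GiB card, while the large XLM-RoBERTa variant already exceeds 8 GiB before any activations are counted. Mixed precision, gradient checkpointing, or a smaller batch size would change the picture.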
Great, thanks very much for the pointer, especially the VRAM context - I'm looking to fine-tune on 2080 Tis rather than V100s/A100s, so that's really good to know.