by ma2rten 2027 days ago
In general, BERT is the most common choice. RoBERTa uses the same architecture but is trained for longer and on more data, which turns out to work better. T5 is a larger encoder-decoder model, which works better on many tasks but is more expensive to train and run.

Thanks for the summary! I'm familiar with BERT, but less so with its variants, so that's quite helpful. I'll take a look at how RoBERTa works.