Hacker News
by ma2rten, 2027 days ago
In general, BERT is the most common choice. RoBERTa uses the same architecture but is trained for longer and on more data, which turns out to work better. T5 is a larger encoder-decoder model, which works better on many tasks but is more expensive to run.
1 comment
by ericd, 2026 days ago
Thanks for the summary! I'm familiar with BERT, but less so the different variants, so that's quite helpful. I'll take a look at how RoBERTa works.