by bitL 2420 days ago
BERT is already outdated, but it is still useful: you need only one Titan RTX to fine-tune the BERT_large model via transfer learning.

What methods make BERT outdated? Do you have pointers to other options?
XLNet is BERT with a bunch of additional training tricks.
BERT is a Transformer with a bunch of additional training tricks. Transformer is self-attention with a bunch of additional training tricks...