
Hacker News


	by buboard 2420 days ago
	With BERT, Google seemed to make a genuine effort to build a model that is useful rather than record-breaking. But I think it's wrong to consider it the "final" model upon which everything else will be built.


BERT is already outdated, but it is still useful because you need only one Titan RTX to fine-tune the BERT_large model via transfer learning.
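
To make the single-GPU claim a bit more concrete, here is a rough back-of-envelope sketch. It assumes about 340 million parameters for BERT_large, 24 GB of memory on a Titan RTX, and plain fp32 training with Adam (weights, gradients, and two moment buffers). Activation memory depends on batch size and sequence length and is deliberately left out, so treat the result as an estimate of headroom, not a guarantee.

```python
# Back-of-envelope estimate only; activation memory is deliberately excluded.
PARAMS = 340_000_000       # approximate BERT_large parameter count (assumption)
BYTES_PER_FLOAT32 = 4
GPU_MEMORY_GB = 24         # Titan RTX memory (assumption, decimal GB)


def training_state_gb(params, copies=4):
    """Memory for weights + gradients + two Adam moment buffers (4 fp32 copies)."""
    return params * BYTES_PER_FLOAT32 * copies / 1e9


state_gb = training_state_gb(PARAMS)
headroom_gb = GPU_MEMORY_GB - state_gb

print(f"weights + grads + Adam state: {state_gb:.2f} GB")
print(f"left for activations:         {headroom_gb:.2f} GB")
```

Under these assumptions the fixed training state takes roughly a quarter of the card, which is consistent with fine-tuning fitting on one GPU if batch size and sequence length are kept modest.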

What methods make BERT outdated? Do you have pointers to other options?

e.g. XLNet:

XLNet is BERT with a bunch of additional training tricks.

BERT is a Transformer with a bunch of additional training tricks. The Transformer is self-attention with a bunch of additional training tricks...
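
For anyone wondering what sits at the bottom of that stack, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification, not how BERT or XLNet implement it: the query, key, and value projections are assumed to be the identity, there is a single head, and the function name `self_attention` and the toy `tokens` input are made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x is a list of token vectors. Q, K and V are all taken to be x itself
    (identity projections), which is an assumption made to keep this short.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output vector is a weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
for row in attended:
    print([round(v, 3) for v in row])
```

Everything else in a Transformer (learned projections, multiple heads, residual connections, layer normalization, feed-forward layers) is layered on top of this weighted-average step.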