Hacker News
by
buboard
2420 days ago
Google seemed to make a genuine effort with BERT to build a model that is useful rather than record-breaking. But I think it's wrong to consider it the "final" model upon which everything else will be built.
bitL
2420 days ago
BERT is already outdated, but it is still useful, since you need only one Titan RTX to retrain the BERT_large model via transfer learning.
turnersr
2420 days ago
What methods make BERT outdated? Do you have pointers to other options?
bitL
2420 days ago
e.g. XLNet:
https://arxiv.org/abs/1906.08237
phreeza
2420 days ago
XLNet is BERT with a bunch of additional training tricks.
bitL
2420 days ago
BERT is a Transformer with a bunch of additional training tricks. Transformer is self-attention with a bunch of additional training tricks...