| HN Mirror

	by Al-Khwarizmi 2420 days ago
	Large transformer-based models like BERT and its ilk are useful for more than hallucinating text. They have achieved measurable improvements in various (although not all) classic NLP tasks, such as parsing, entailment recognition, and question answering. Google has reportedly used BERT to improve its search algorithm, so it is indeed being used to do "actual useful stuff with text". It pains me to say this, as I'm a researcher at an institution without the huge resources of the big tech companies, so I can't compete in the pretrained-model arms race (and it has also made the field more boring, as creative solutions to problems are outperformed by approaches that just pile on millions more parameters). But it's the truth. I think this will only be a phase, though: at some point performance will plateau, and we will need to put our minds to work again rather than our GPUs.

buboard 2420 days ago

Google seemed to make a genuine effort with BERT to build a model that is useful rather than record-breaking. But I think it's wrong to consider it the "final" model upon which everything else will be built.

bitL 2420 days ago

BERT is already outdated, but it is still useful, as you need only one Titan RTX to retrain the BERT_large model via transfer learning.

turnersr 2420 days ago

What methods make BERT outdated? Do you have pointers to other options?

bitL 2420 days ago

e.g. XLNet:

https://arxiv.org/abs/1906.08237

phreeza 2420 days ago

XLNet is BERT with a bunch of additional training tricks.

bitL 2420 days ago

BERT is a Transformer with a bunch of additional training tricks. Transformer is self-attention with a bunch of additional training tricks...
