Hacker News
by Al-Khwarizmi 2420 days ago
Large transformer-based models like BERT and its ilk are useful for more than hallucinating text. They have achieved measurable improvements on various (although not all) classic NLP tasks, such as parsing, entailment recognition, and question answering. Google has reportedly used BERT to improve its search algorithm, so it is indeed being used to do "actual useful stuff with text".

It pains me to say this, as I'm a researcher at an institution without the huge resources of the big tech companies, so I can't compete in the pretrained-model arms race (and it has also made the field more boring, as creative solutions to problems are outperformed by approaches that just pile on more millions of parameters). But it's the truth. I think this will only be a phase, though: at some point, performance will plateau and we will need to put our minds to work again, rather than our GPUs.


Google seems to have made a genuine effort with BERT to build a model that is useful rather than record-breaking. But I think it's wrong to consider it the "final" model upon which everything else will be built.
BERT is already outdated, but still useful, since you need only one Titan RTX to fine-tune its BERT_large model via transfer learning.
What methods make BERT outdated? Do you have pointers to other options?
XLNet is BERT with a bunch of additional training tricks.
BERT is a Transformer with a bunch of additional training tricks. Transformer is self-attention with a bunch of additional training tricks...
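
For anyone wondering what the "self-attention" at the bottom of that stack amounts to, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is only an illustration, not how BERT or any library implements it: the helper names (`softmax`, `matmul`, `self_attention`), the toy inputs, and the identity projection matrices are all assumptions made for the example, and real models add multiple heads, masking, learned weights, and so on.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain list-of-lists matrix product: (n x k) times (k x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def self_attention(x, w_q, w_k, w_v):
    # Project the same input into queries, keys, and values.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Score every token against every other token, scaled by sqrt(d_k).
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in k]
              for q_row in q]
    # Each row of weights is a probability distribution over tokens.
    weights = [softmax(row) for row in scores]
    # Each output token is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy input: three "tokens" with two features each (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
out, weights = self_attention(x, identity, identity, identity)
```

With identity projections, the first token attends equally to itself and to the third token (both have the same dot product with it) and less to the second, which is the whole mechanism in miniature.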