| HN Mirror


	by buboard 2420 days ago
	The article starts with NLP models and then mentions the successes of increasingly smaller vision models. NLP seems to be an outlier in increasingly becoming a pissing contest. The models are too big and not particularly useful. OpenAI spread FUD about their model, but after its release it's rather underwhelming. Yeah, you can output some text that's readable and paraphrases Reddit, but what about understanding, intention, doing actual useful stuff with text? Hallucinating text in itself isn't interesting. It seems this line of NLP with transformers has hit some kind of dead end and they are trying to brute-force the next breakthrough; doubtful that this will happen, though. And then we have bizarre decisions like Microsoft releasing DialoGPT yesterday without including a generation script because "it might be racist". This whole thing seems more like marketing than research.

3 comments

Al-Khwarizmi 2420 days ago

Large transformer-based models like BERT and its ilk are not only useful for hallucinating text. They have achieved measurable improvements in various (although not all) classic NLP tasks, such as parsing, entailment recognition or question answering. Google has reportedly used BERT to improve their search algorithm, so indeed it's being used to do "actual useful stuff with text".

It pains me to say this, as I'm a researcher from an institution without the huge resources of the big tech companies, so I can't compete in the pretrained model arms race (and also, it has made the field more boring, as creative solutions to problems become outperformed by approaches that just pile up more millions of parameters). But it's the truth. Although I think it will only be a stage of things: at some point, performance will plateau and we will need to put our minds to work again, rather than our GPUs.

buboard 2420 days ago

With BERT, Google seemed to make a genuine effort to build a model that is useful rather than record-breaking. But I think it's wrong to consider it the "final" model upon which everything else will be built.

bitL 2420 days ago

BERT is already outdated, but still useful, as you need only one Titan RTX to retrain its BERT_large model via transfer learning.

turnersr 2420 days ago

What methods make BERT outdated? Do you have pointers to other options?

bitL 2420 days ago

e.g. XLNet:

https://arxiv.org/abs/1906.08237

phreeza 2420 days ago

XLNet is BERT with a bunch of additional training tricks.

ivalm 2420 days ago

As someone who was able to build a production model based on BERT that outperformed all our previous attempts, I have to say transformers really are a game changer. They are not the be-all and end-all, but they are really, really good as the basis for many different classification tasks.

hnaccy 2420 days ago

Any tips on taking a BERT-style model to production?

joshvm 2420 days ago

There is at least one simple reason for obsessing over efficiency for computer vision models. It takes a lot of bandwidth to transmit an image (even a small one) over the air, whereas text is cheap.

A picture may be worth a thousand words, but you can send an entire book in the same amount of space as a single holiday snap at low resolution.
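
As a rough back-of-envelope check of that comparison, here is a small sketch. All the figures in it are assumptions for illustration, not numbers from this thread: a novel of about 100,000 words, about 6 bytes per word as plain ASCII, and an uncompressed 640x480 RGB photo.

```python
# Back-of-envelope sketch: plain-text book vs. raw low-resolution photo.
# Every constant below is an assumed, illustrative figure.

WORDS_PER_BOOK = 100_000   # assumed length of a typical novel
BYTES_PER_WORD = 6         # assumed: ~5 ASCII letters plus one space
IMAGE_WIDTH = 640          # assumed "low resolution" snap (VGA)
IMAGE_HEIGHT = 480
CHANNELS = 3               # RGB, one byte per channel, no compression


def book_size_bytes(words=WORDS_PER_BOOK, bytes_per_word=BYTES_PER_WORD):
    """Size of the book as uncompressed plain text."""
    return words * bytes_per_word


def raw_image_size_bytes(width=IMAGE_WIDTH, height=IMAGE_HEIGHT, channels=CHANNELS):
    """Size of the photo as raw, uncompressed pixel data."""
    return width * height * channels


def words_per_image(bytes_per_word=BYTES_PER_WORD):
    """How many plain-text words fit in the bytes of one raw photo."""
    return raw_image_size_bytes() // bytes_per_word


if __name__ == "__main__":
    print("book bytes: ", book_size_bytes())
    print("image bytes:", raw_image_size_bytes())
    print("words that fit in one raw image:", words_per_image())
```

Under these assumptions the raw photo is larger than the whole book. Compression changes the exact ratio in both directions (JPEG shrinks the photo a lot, and plain text also compresses well), so treat this as an order-of-magnitude illustration only.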
