by d_burfoot 2197 days ago
In NLP there is a very clear and powerful new paradigm: train a HUGE language model using vast amounts of raw text. Then, to solve the problem of interest, either fine-tune the model by training on your specific dataset (usually quite small), or use it zero-shot or one-shot somehow.

The crucial question is: is this paradigm viable for OTHER types of data?

My hypothesis is YES. If you train a HUGE image model using vast quantities of raw images, you will then be able to REUSE that model to work for specific computer vision problems, either by fine-tuning or 0/1-shotting.

I'm especially optimistic that this paradigm will work for image streams from autonomous vehicles. Classic supervised learning has proved difficult, if not impossible, to get working for AV vision, so the new paradigm could be a game-changer.

2 comments

> My hypothesis is YES. If you train a HUGE image model using vast quantities of raw images, you will then be able to REUSE that model to work for specific computer vision problems, either by fine-tuning or 0/1-shotting.

This has been demonstrated for many years; it's not news. Many of the SOTA models, like BiT, require pretraining on JFT-300M, or Instagram, or what have you.

The pretraining approach was used in vision for years before it was successful in NLP.

Not really on unsupervised/self-supervised data though, right?

(nor on the same scale of corpora, as far as I can tell)