by d_burfoot
2197 days ago
In NLP there is a very clear and powerful new paradigm: train a HUGE language model on vast amounts of raw text. Then, to solve the problem of interest, either fine-tune the model on your specific dataset (usually quite small) or use it zero-shot or one-shot. The crucial question is: is this paradigm viable for OTHER types of data? My hypothesis is YES. If you train a HUGE image model on vast quantities of raw images, you will then be able to REUSE that model for specific computer vision problems, either by fine-tuning or by zero-/one-shot use. I'm especially optimistic that this paradigm will work for image streams from autonomous vehicles. Classic supervised learning has proved difficult, if not impossible, to get working for AV vision, so the new paradigm could be a game-changer.
This has been demonstrated for many years; it's not news. Many of the SOTA models, like BiT, require pretraining on JFT-300M, or Instagram images, or what have you.