by labelbias 2158 days ago
The main thing about GPT-3 is that they wanted to demonstrate one-shot fine-tuning and succeeded at it.

So the model can be switched to output part-of-speech tags, dependency grammar trees or the named entities in the input even if training data is sparse. Similarly, you could fine-tune it to produce game lore and then see how it works for that. The model easily switches between different modes of operation and achieves state-of-the-art or close to state-of-the-art performance.
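
As a rough illustration of the sparse-data case, assuming the task is conveyed through a handful of worked examples placed in the input text rather than through weight updates, a named-entity prompt might be assembled like this. The function name `build_few_shot_prompt`, the example sentences, and the "Sentence:/Entities:" layout are all made up for illustration; no model or API call is shown, only the text that would be handed to the model for completion.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a plain-text prompt from a task description, a few
    (input, output) demonstration pairs, and a new input to complete.

    The layout is a hypothetical convention, not an official format.
    """
    lines = [task_description, ""]
    for sentence, entities in examples:
        lines.append(f"Sentence: {sentence}")
        lines.append(f"Entities: {entities}")
        lines.append("")
    # The prompt ends on an open "Entities:" slot for the model to fill in.
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Two demonstrations stand in for the "sparse training data".
examples = [
    ("Ada Lovelace worked in London.",
     "Ada Lovelace (PERSON), London (LOCATION)"),
    ("Grace Hopper joined the Navy.",
     "Grace Hopper (PERSON), Navy (ORGANIZATION)"),
]

prompt = build_few_shot_prompt(
    "Extract the named entities from each sentence.",
    examples,
    "Alan Turing studied at Cambridge.",
)
print(prompt)
```

Swapping the demonstrations (say, sentences paired with part-of-speech tags) changes the "mode" without touching the model, which is the contrast with training a separate fine-tuned model per task.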

It's quite funny how NLP folks tried to solve low-level tasks (POS tagging, NER, named entity relationship extraction, dependency parsing, sentiment classification, etc.) to get to higher-level tasks (good summarization, machine translation, text generation, question answering), and now a single model captures all the low-level stuff for free and does the high-level stuff so well that fine-tuning it to do the low-level stuff is unnecessary.

2 comments

This difference, one-shot fine-tuning versus the regular fine-tuning GPT-2 needed, is one of the major breakthroughs. Since GPT-3 has been so hot over the past few days, people seem to forget or not realize that lots of the GPT-3 examples shown off today were already possible with GPT-2, with the catch that you had to fine-tune your own GPT-2 model to fit your problem domain (game plots, poems, music, bots that chat like certain characters, etc.). GPT-3 makes that fine-tuning process unnecessary (although in practice you probably can't, or can't afford to, fine-tune your own GPT-3 model anyway).
Are you sure that they set out to prove one-shot works? Maybe they found the fine-tuning performance disappointing and decided to publish this instead.