by msamwald 2078 days ago
I think few-shot learning (or priming) is actually the main selling point of GPT-3 for most practical applications (rather than merely entertaining language generation). So if there is a method that achieves the same goal with a model that is simple enough to be used by normal developers and researchers without OpenAI-scale infrastructure, that does seem buzz-worthy.

> So if there is a method that achieves the same goal with a model that is simple enough to be used by normal developers and researchers without OpenAI-scale infrastructure, that does seem buzz-worthy.

That's trivial and has already been done. GPT-3 didn't even get SOTA on SuperGLUE.

These are the sort of misunderstandings that could have been avoided if the title had been better.

In general, papers with "new variation of cloze pre-training task for this specific task" are a new section of the literature that is rapidly becoming sort of mundane and uninteresting because there are so many papers doing small variations of the same basic idea.

> GPT-3 didn't even get SOTA on SuperGLUE.

Of course neither GPT-3 nor the PET paper claims SOTA on SuperGLUE. They used a few-shot learning setup with 32 examples per task. The normal SuperGLUE setup has hundreds or thousands of examples per task [1].

> In general, papers with "new variation of cloze pre-training task for this specific task" are a new section of the literature that is rapidly becoming sort of mundane and uninteresting because there are so many papers doing small variations of the same basic idea.

Could you please link to some of the work you are referring to?

[1] Table 1 in https://w4ngatang.github.io/static/papers/superglue.pdf

I was sloppy in my skimming of the paper - on a closer read it does actually seem quite different from the literature I mentioned (examples: RoBERTa, XLNet). I'll be reading it more carefully, but I can now better understand the comparison to GPT-3.