by msamwald 2078 days ago
> GPT-3 didn't even get SOTA on SuperGLUE.

Of course, neither GPT-3 nor the PET paper claims SOTA on SuperGLUE. They used a few-shot learning setup with 32 examples per task. The normal SuperGLUE setup has hundreds or thousands of examples per task [1].

> In general, paper with "new variation of cloze pre-training task for this specific task" is a new section of the literature that is rapidly becoming sort of mundane and uninteresting because there are so many papers doing small variations of the same basic idea.

Could you please link to some of the work you are referring to?

[1] Table 1 in https://w4ngatang.github.io/static/papers/superglue.pdf

1 comment

I was sloppy in my skimming of the paper - on a closer read it does actually seem quite different from the literature I mentioned (examples: RoBERTa, XLNet). I'll be reading it more carefully, but I can now better understand the comparison to GPT-3.