by whimsicalism 2079 days ago
> So if there is a method that achieves the same goal with a model that is simple enough to be used by normal developers and researchers without OpenAI-scale infrastructure, that does seem buzz-worthy.

That's trivial and has already been done. GPT-3 didn't even get SOTA on SuperGLUE.

These are the sort of misunderstandings that could have been avoided if the title was better.

In general, papers with "new variation of cloze pre-training task for this specific task" are a new section of the literature that is rapidly becoming sort of mundane and uninteresting because there are so many papers doing small variations of the same basic idea.

1 comments

> GPT-3 didn't even get SOTA on SuperGLUE.

Of course, neither GPT-3 nor the PET paper claims SOTA on SuperGLUE. They used a few-shot learning setup with 32 examples per task. The normal SuperGLUE setup has hundreds or thousands of examples per task [1].

> In general, papers with "new variation of cloze pre-training task for this specific task" are a new section of the literature that is rapidly becoming sort of mundane and uninteresting because there are so many papers doing small variations of the same basic idea.

Could you please link to some of the work you are referring to?

[1] Table 1 in https://w4ngatang.github.io/static/papers/superglue.pdf

I was sloppy in my skimming of the paper. On a closer read it does actually seem quite different from the literature I mentioned (examples: RoBERTa, XLNet). I'll be reading it more carefully, but I can now better understand the comparison to GPT-3.