by whimsicalism
2079 days ago
> So if there is a method that achieves the same goal with a model that is simple enough to be used by normal developers and researchers without OpenAI-scale infrastructure, that does seem buzz-worthy.
That's trivial and has already been done. GPT-3 didn't even get SOTA on SuperGLUE. These are the sort of misunderstandings that could have been avoided if the title was better. In general, paper with "new variation of cloze pre-training task for this specific task" is a new section of the literature that is rapidly becoming sort of mundane and uninteresting because there are so many papers doing small variations of the same basic idea.
Of course, neither GPT-3 nor the PET paper claims SOTA on SuperGLUE. They used a few-shot learning setup with 32 examples per task. The normal SuperGLUE setup has hundreds or thousands of examples per task [1].
> In general, paper with "new variation of cloze pre-training task for this specific task" is a new section of the literature that is rapidly becoming sort of mundane and uninteresting because there are so many papers doing small variations of the same basic idea.
Could you please link to some of the work you are referring to?
[1] Table 1 in https://w4ngatang.github.io/static/papers/superglue.pdf