by msamwald
2078 days ago
> GPT-3 didn't even get SOTA on SuperGLUE.

Of course, neither the GPT-3 paper nor the PET paper claims SOTA on SuperGLUE. They used a few-shot learning setup with 32 examples per task, whereas the normal SuperGLUE setup has hundreds or thousands of examples per task [1].

> In general, paper with "new variation of cloze pre-training task for this specific task" is a new section of the literature that is rapidly becoming sort of mundane and uninteresting because there are so many papers doing small variations of the same basic idea.

Could you please link to some of the work you are referring to?

[1] Table 1 in https://w4ngatang.github.io/static/papers/superglue.pdf