by patelajay285
584 days ago
We found the same result a few years ago in our ICLR paper: https://arxiv.org/pdf/2209.14500 We found that Google's T5 models, which were released in 2019, pre-GPT-3, were "secretly" capable of in-context learning with a simple inference technique. Because they use a bidirectional MLM (masked language modeling) objective, it wasn't obvious how to do it, but MLM objectives are known to produce better language representations than causal (next-token prediction) objectives. With far smaller T5 models, we were able to outperform much larger GPT-3 models or get very close to their performance.