by 6gvONxR4sf7o
1705 days ago
The reaction in this thread is really interesting compared with the reaction to OpenAI’s announcements. While open-ended generation is flashier than task fine-tuning, I also wonder if having a prompt box available to all readers is tempering expectations and hype. There are lots of examples of the model failing in the comments, which isn’t possible for OpenAI’s announcements. Having spent a ton of time with GPT-3, I wonder how much of (what I consider) the over-hype it gets is due to its closed nature in comparison to something like this. The reaction to this one seems decidedly more realistic.
Providing a quick way to stress-test the model is definitely a double-edged sword. On one hand, it increases engagement (people can play with it) and facilitates reproducibility and verification of results (which is a good thing from a scientific perspective). On the other hand, it quickly grounds expectations in something more realistic and tones down the hype.
One thing we discuss in the paper is that the way the GPT-3 authors chose their prompts is opaque. Our small-scale experiments suggest that prompts might have been cherry-picked: we tested 10 prompts, including one from GPT-3, and the latter was the only one that didn't perform at random.
Such cases definitely don't help to put results and claims in perspective.