by 6gvONxR4sf7o 1705 days ago
The reaction in this thread is really interesting compared with the reaction to OpenAI’s announcements. While open-ended generation is flashier than task fine-tuning, I also wonder whether having a prompt box available to all readers is tempering expectations and hype. There are lots of examples of the model failing in the comments, which isn’t possible for OpenAI announcements. Having spent a ton of time with GPT-3, I wonder how much of (what I consider) the over-hype it gets is due to its closed nature in comparison to something like this. The reaction to this one seems decidedly more realistic.

(author here) That's an interesting take (which I agree with).

Providing a quick way to stress test the model is definitely a double-edged sword. On one hand, it increases engagement (people can play with it) and facilitates reproducibility and verification of results (which is a good thing from a scientific perspective). On the other hand, it quickly grounds expectations in something more realistic and tones down the hype.

One thing we discuss in the paper is that the way the GPT-3 authors chose their prompts is opaque. Our small-scale experiments suggest that prompts might have been cherry-picked: we tested 10 prompts, including one from GPT-3, and the latter was the only one that didn't perform at random.

Such cases definitely don't help to put results and claims in perspective.

> Providing a quick way to stress test the model is definitely a double-edged sword.

I hope you don’t second-guess or regret the choice to make the announcement so accessible. It’s a really good thing for scientific communication to be accurate and accessible, especially when those two things go together.

As someone who wrote a post on tempering expectations with GPT-3 (https://news.ycombinator.com/item?id=23891226), I agree with this take, although the reason OpenAI kept GPT-3 closed at the start was likely not that it produced incorrect output, but concern about super-offensive output, which commenters in this thread are not testing for.

It's a good example of how Hugging Face now has a better community perception than OpenAI.

Great observation. It's also curious that the posts about the ethical issues are all downvoted to the bottom.
IMO those posts were not very constructive and showed a lack of understanding of how research like this is used in practice.