by jordn 1538 days ago
I have respect for Andrew Gelman, but this is a bad take.

1. This is presented as humans hard coding answers to the prompts. No way is that the full picture. If you try out his prompts the responses are fairly invariant to paraphrases. Hard coded answers don't scale like that.

2. What is actually happening is far more interesting and useful. I believe that OpenAI are using the InstructGPT algo (RL on top of the trained model) to improve the general model based on human preferences.

3. 40 people is a very poor army.

>This is presented as humans hard coding answers to the prompts. No way is that the full picture. If you try out his prompts the responses are fairly invariant to paraphrases. Hard coded answers don't scale like that.

It's presented as humans hard coding answers to some specific prompts.

I feel like this is mostly people reacting to the title instead of the entire post. The author's point is:

>In some sense this is all fine, it’s a sort of meta-learning where the components of the system include testers such as Gary Smith and those 40 contractors they hired through Upwork and ScaleAI. They can fix thousands of queries a day.

>On the other hand, there does seem something funny about [how] GPT-3 presents this shiny surface where you can send it any query and it gives you an answer, but under the hood there are a bunch of freelancers busily checking all the responses and rewriting them to make the computer look smart.

>It’s kinda like if someone were showing off some fancy car engine but the vehicle is actually being powered by some hidden hamster wheels. The organization of the process is itself impressive, but it’s not quite what is advertised.

>To be fair, OpenAI does state that “InstructGPT is then further fine-tuned on a dataset labeled by human labelers.” But this still seems misleading to me. It’s not just that the algorithm is fine-tuned on the dataset. It seems that these freelancers are being hired specifically to rewrite the output.

> If you try out his prompts the responses are fairly invariant to paraphrases. Hard coded answers don't scale like that.

This is discussed:

>> Smith first tried this out:

>> Should I start a campfire with a match or a bat?

>> And here was GPT-3’s response, which is pretty bad if you want an answer but kinda ok if you’re expecting the output of an autoregressive language model:

>> There is no definitive answer to this question, as it depends on the situation.

>> The next day, Smith tried again:

>> Should I start a campfire with a match or a bat?

>> And here’s what GPT-3 did this time:

>> You should start a campfire with a match.

>> Smith continues:

>> GPT-3’s reliance on labelers is confirmed by slight changes in the questions; for example,

>> Gary: Is it better to use a box or a match to start a fire?

>> GPT-3, March 19: There is no definitive answer to this question. It depends on a number of factors, including the type of wood you are trying to burn and the conditions of the environment.

> This is presented as humans hard coding answers to the prompts. No way is that the full picture...

This somewhat misrepresents what is being proposed here, which is essentially what you suggest: "OpenAI are using the InstructGPT algo (RL on top of the trained model) to improve the general model based on human preferences."

One of the things that makes GPT-3 intriguing and impressive is its generality. InstructGPT is the antithesis of that: its purpose is to introduce highly targeted influences on GPT-3's output in specific cases (and sometimes in very similar ones), and its use improves the output in those cases at the cost of diminishing its overall performance. Furthermore, if the output is being polished in cases like those presented here, that would impede a frank assessment of its capabilities.

It depends on which stage you hardcode. It's similar to how you can say "OK Google, what time is it" in any voice and get a different time on every run: the speech recognition isn't hardcoded, and neither is speaking the time, but the action is.

Likewise, they can plug holes here and there by manually tweaking answers. The fact that it's not an exact-prompt-to-exact-result rule doesn't make it any less of a fixed rule.
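To make the "hardcode at the intent stage" point concrete, here is a minimal sketch, assuming a crude bag-of-words intent key in place of a real classifier. `intent_key`, `OVERRIDES`, `answer`, and `stand_in_model` are hypothetical names for illustration, not anything OpenAI is known to use. Rewordings that reduce to the same key get the same hand-written answer, while a question with different content words falls through to the model.

```python
import re
from typing import Callable

# Hypothetical stopword list; a real system would more likely use a
# trained intent classifier than a hand-written word filter.
STOPWORDS = {"a", "an", "the", "i", "should", "or", "with", "to",
             "is", "it", "better", "use"}


def intent_key(prompt: str) -> frozenset[str]:
    """Reduce a prompt to its set of content words so paraphrases share a key."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


# Hypothetical override table keyed by intent rather than by exact prompt text.
OVERRIDES = {
    intent_key("Should I start a campfire with a match or a bat?"):
        "You should start a campfire with a match.",
}


def answer(prompt: str, generate: Callable[[str], str]) -> str:
    """Return the hand-written answer if the intent matches, else defer to the model."""
    fixed = OVERRIDES.get(intent_key(prompt))
    return fixed if fixed is not None else generate(prompt)


def stand_in_model(prompt: str) -> str:
    # Placeholder for the language model; it always hedges.
    return "There is no definitive answer to this question."


if __name__ == "__main__":
    # A reworded version of the same question hits the override...
    print(answer("With a match or a bat, should I start the campfire?", stand_in_model))
    # -> You should start a campfire with a match.
    # ...but a question with different content words falls through to the model.
    print(answer("Is it better to use a box or a match to start a fire?", stand_in_model))
    # -> There is no definitive answer to this question.
```

The key is deliberately crude; the point is only that a fixed rule can be invariant to rewordings without the underlying model having learned anything.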

It makes sense for GPT-3 to thoroughly explore a search space only after repeated and similar questions.

The answers to, "Why did Will Smith slap Chris Rock?" will be much different five seconds after the event compared to five days after. Of course you would expect the Academy Awards to be part of the answer five days later, because practically every news article would mention the venue.

Going even further, a simple (undergrad-level) language model would detect the nominative and accusative, so you might even get a correction as an answer if you ask, "Why did Chris Rock slap Will Smith?"

Seven thousand people might ask this same question, while nobody wonders what the best rugby ball chili recipe is. GPT-3 will never try to organically link those ideas unless people start asking!

I'd even venture that negative follow-up feedback is factored in. If your first reaction to an answer is, "That was WRONG, idiot!" this is useful info!

Then again, if a negative feedback function exists, adding a human to the loop should be simple (and effective).

-----

Is 40 a weak army? It depends on whether they are classifying questions randomly or sequentially, or whether they hammer away at the weakest points, grading Q/A pairs (pass/fail) prioritized by a mix of high question importance and high uncertainty in the answer.
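A minimal sketch of that "weakest points first" strategy, assuming importance is approximated by how often a question is asked and uncertainty by how much several sampled answers disagree with each other. `QAPair`, `uncertainty`, `priority`, and `labeling_queue` are hypothetical names, the example counts are made up, and multiplying the two signals is just one simple way to mix them; none of this describes how OpenAI actually schedules its labelers.

```python
import heapq
from collections import Counter
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    asks: int           # how many users asked this question (proxy for importance)
    samples: list[str]  # several sampled model answers; assumed non-empty


def uncertainty(samples: list[str]) -> float:
    """Fraction of samples that disagree with the most common answer (0.0 = unanimous)."""
    top_count = Counter(samples).most_common(1)[0][1]
    return 1.0 - top_count / len(samples)


def priority(pair: QAPair) -> float:
    """Mix importance and uncertainty; the product is just one simple choice."""
    return pair.asks * uncertainty(pair.samples)


def labeling_queue(pairs: list[QAPair], budget: int) -> list[QAPair]:
    """Return the `budget` highest-priority pairs for graders to mark pass/fail."""
    return heapq.nlargest(budget, pairs, key=priority)


if __name__ == "__main__":
    # Made-up counts; "a", "b", ... are placeholder labels for distinct answers.
    pairs = [
        QAPair("Why did Will Smith slap Chris Rock?", 7000, ["a", "a", "b", "c"]),
        QAPair("Best rugby ball chili recipe?", 1, ["a", "b", "c", "d"]),
        QAPair("Should I start a campfire with a match or a bat?", 500, ["a", "a", "a", "a"]),
    ]
    print([p.question for p in labeling_queue(pairs, budget=2)])
    # -> ['Why did Will Smith slap Chris Rock?', 'Best rugby ball chili recipe?']
```

Under this scoring the campfire question drops to the bottom once its sampled answers agree, even though it is asked often, so a small team's effort goes where popular questions meet inconsistent answers.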

I agree. I suppose as an outsider learning about AI, first thoughts might be “wow look at all the things it can’t do”. But as someone who follows it closely, all I notice is how rapidly the list of things it can’t do is shrinking.