Hacker News
by mannykannot 1540 days ago
> This is presented as humans hard coding answers to the prompts. No way is that the full picture...

This somewhat misrepresents what is being proposed here, which is essentially what you suggest: "OpenAI are using the InstructGPT algo (RL on top of the trained model) to improve the general model based on human preferences."

One of the things that makes GPT-3 intriguing and impressive is its generality. InstructGPT is the antithesis of that: its purpose is to introduce highly targeted influences on GPT-3's output in specific cases, and sometimes in very similar ones, and its use improves the output at the cost of diminishing the performance. Furthermore, if the output is being polished in cases like those presented here, that would impede a frank assessment of its capabilities.