| HN Mirror



	by sandGorgon 1185 days ago
	But isn't that directly contradicted by this particular paper, which finds that ChatGPT outperforms human annotation? So permit me to play devil's advocate to your statement: show that, in the context of this paper, your hypothesis is still correct.

3 comments

redox99 1185 days ago

Here it's outperforming because ChatGPT is already good at these tasks (and the MTurk workers aren't very good; OpenAI labelers are probably better, and a panel of experts much better).

To fix ChatGPT's shortcomings (assuming they stem from alignment and not from a lack of capability in the base model), you need human labels. Feeding it its own outputs would achieve nothing.

However, feeding its outputs to a non-aligned model can make that model aligned (that's what Alpaca did with LLaMA + ChatGPT).

sandGorgon 1185 days ago

Thanks for your answer. That's a reasonable point, but would we be at a tipping point by GPT-5/6 (ChatGPT is GPT-3.5) where human alignment is not needed?

In fact, my question is reinforced by the GPT-4 technical report, which explicitly mentions that RLHF did NOT change performance (and was only used for safety purposes).

redox99 1184 days ago

GPT6 or whatever will always require alignment, as the base model just blindly predicts the next token instead of being a helpful, chat-style assistant.

Right now the best way to align it is with RLHF. The specific technique might change, but in the end there will always be, at some level, some human input that tells it how it should behave. Newer techniques might further leverage LLMs and require less human input.

Could you use GPT4 to align GPT6? Yes. But you should expect GPT6 to inherit the alignment of GPT4, i.e., if RLHF taught GPT4 that it's OK to roast Trump but not Biden, you would expect GPT6 to act the same way.

Having said that, I'm sure there will be interesting ways in which GPTn will help train GPTn+1. Some kind of self-play in which it reasons and further improves itself seems obvious in the long term.

But human input that tells it "this is politically correct, this is not, so don't say that" will always be required, as it's subjective. You can reuse it of course, but I don't see how it would "improve" without further human input.

famouswaffles 1184 days ago

You don't need humans in the loop for alignment. RLAIF is a thing and is used for the Anthropic models (Claude).

sandGorgon 1184 days ago

Is it really being used for the final model? I know they have research papers out on it, but I wasn't sure if the production models used it.

famouswaffles 1184 days ago

Yeah it is.

CGamesPlay 1185 days ago

The researchers in the paper used human-curated results to assess the accuracy of the GPT results. So it still had a human in the loop.

alophawen 1185 days ago

Why would I need to prove anything? The unproven claim is in the paper.
