| HN Mirror


	by marginalia_nu 1215 days ago
	In chess there is a very clear victory state, and a scoring function can be implicitly defined from a large number of games of various skill levels. You really don't throw two sentences into the thunder dome to decide which one "wins". That means it's much more susceptible to being poisoned.

2 comments

sebzim4500 1215 days ago

>You really don't throw two sentences into the thunder dome to decide which one "wins".

That's almost literally what RLHF is though, and that is the last step of training GPT-n. Then when GPT-{n+1} is being trained, its training set will include some outputs from GPT-n, so it will benefit from that finetuning even before it goes through its own round of RLHF. Also, on average, good outputs of GPT-n are more likely to be included in the training set of GPT-{n+1} (because they end up as a BuzzFeed article or a top post on Reddit or something), so there is an additional signal beyond the above.
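
For anyone who hasn't seen the comparison step spelled out, here is a minimal sketch, assuming the Bradley-Terry style pairwise loss that is commonly used to train RLHF reward models. The function names and the scores are made up for illustration; nothing here is taken from any particular GPT training setup. Note that which output counts as "chosen" comes from a human label, not from the outputs themselves.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that output A is preferred over output B."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood of the label 'chosen beats rejected'."""
    return -math.log(preference_probability(score_chosen, score_rejected))


# Hypothetical reward-model scores for two candidate completions.
score_chosen, score_rejected = 1.2, 0.3

# The loss is small when the reward model agrees with the human label...
print(pairwise_loss(score_chosen, score_rejected))
# ...and larger when the reward model ranks the pair the other way round.
print(pairwise_loss(score_rejected, score_chosen))
```

Minimizing this loss over many labeled pairs pushes the reward model to score the human-preferred output higher; the policy is then tuned against that reward model.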

simonh 1215 days ago

I suspect the comment about the thunder dome was a reference to RLHF. On the one hand, RLHF seems far superior to the kind of prompt engineering Microsoft seems to have relied on with Sydney. On the other, it's doubtful that the manual selection in RLHF always selects for quality, rather than pandering, at least to a significant extent, to whatever biases or preferences the humans in the training loop might have.

Y_Y 1215 days ago

That's not what RLHF is. In the thunderdome, as in chess, you don't need human judges or an oracle to know who's won. That makes a significant difference to the training procedure.

geraneum 1215 days ago

That’s correct. I have seen the above argument a lot: Using analogy as a basis for proof!
