by marginalia_nu
1215 days ago
In chess there is a very clear victory state, and a scoring function can be implicitly defined from a large number of games at various skill levels. You really don't throw two sentences into the thunder dome to decide which one "wins". That means it's much more susceptible to being poisoned.

That's almost literally what RLHF (reinforcement learning from human feedback) is, though: human raters compare model outputs against each other, and a reward model is trained on those comparisons. It is the last step in training GPT-n. Then, when GPT-{n+1} is trained, its training data will include some text generated by GPT-n, so it benefits from that fine-tuning even before it goes through its own round of RLHF. Also, good outputs of GPT-n are on average more likely to end up in the training set of GPT-{n+1} (because they become a BuzzFeed article or a top post on Reddit or something), so there is an additional signal beyond the one above.
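To make the "which sentence wins" point concrete: RLHF reward models are commonly fit with a pairwise, Bradley-Terry style objective, where the probability that output A beats output B is modeled as sigmoid(r(A) - r(B)). The sketch below is a toy under that assumption. It fits one scalar score per candidate output instead of a neural reward model, and the name `fit_bradley_terry`, the learning rate, and the small L2 penalty are hypothetical choices for illustration, not anything taken from an actual GPT training pipeline.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_bradley_terry(n_items, comparisons, lr=0.1, epochs=500, reg=0.01):
    """Fit one scalar score per item from (winner, loser) index pairs.

    Gradient ascent on sum(log sigmoid(s[winner] - s[loser])) minus a small
    L2 penalty. The penalty (an assumption of this sketch) keeps scores finite
    for items that never lose or never win.
    """
    scores = [0.0] * n_items
    for _ in range(epochs):
        grads = [0.0] * n_items
        for w, l in comparisons:
            # d/ds_w of log sigmoid(s_w - s_l) is 1 - sigmoid(s_w - s_l);
            # the loser's gradient is the negative of that.
            g = 1.0 - sigmoid(scores[w] - scores[l])
            grads[w] += g
            grads[l] -= g
        for i in range(n_items):
            scores[i] += lr * (grads[i] - reg * scores[i])
    return scores


# Three candidate outputs and some hypothetical human judgments (winner, loser).
judgments = [(0, 1), (0, 2), (1, 2), (0, 1), (1, 0)]
scores = fit_bradley_terry(3, judgments)
print([round(s, 2) for s in scores])
```

No single comparison defines a "victory state" the way checkmate does, yet many noisy head-to-head judgments still yield a consistent ranking, which is the signal RLHF relies on.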