by Jack000
1134 days ago

This type of data is actually better than independent human text, specifically for training the LLM that originally produced the output. GPT-4 is trained with PPO+RLHF, and web text that the LLM produces and that is then fed back in will be closer to the original token distribution. In other words, by selectively publishing LLM output you’re effectively doing the same thing as clicking the thumbs-up/thumbs-down button in the ChatGPT web UI. I agree with OpenAI that this will not be a problem at all, since you would need a process to gauge the quality of the data anyway, even for human text.