It seems unlikely to me that ChatGPT is trained directly on chat data. If it were, we'd expect it to know things from after its knowledge cutoff, and afaik that hasn't happened.
I assume the chat logs are instead training a reward model, which itself is then used as the reward function during RLHF training.
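To make that concrete, here's a rough sketch of what "training a reward model" on preference data can look like. This is a toy, not anything OpenAI has confirmed doing with chat logs: it assumes the logs have somehow been turned into (preferred, rejected) response pairs, and it uses a linear model over made-up numeric features in place of a real neural network over text. The function names and features are all hypothetical. The loss is the standard pairwise (Bradley-Terry style) one, `-log(sigmoid(r_preferred - r_rejected))`.

```python
import math


def reward(weights, features):
    """Score a response. Assumption: a response is a small hand-made feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_loss(weights, preferred, rejected):
    """-log(sigmoid(margin)): small when the preferred response scores higher."""
    margin = reward(weights, preferred) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit the weights with plain gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Gradient of the loss w.r.t. weights is
            # -(1 - sigmoid(margin)) * (preferred - rejected), so step the other way.
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights


# Hypothetical data: each pair is (features of preferred reply, features of rejected reply).
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
weights = train_reward_model(pairs, dim=2)
```

In the RLHF step, a scorer like `reward` would then be what the policy gets optimized against. The point is that the logs shape the scorer rather than going straight into the language model's training text, which would fit with the model not picking up facts from after its cutoff.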
These models have a very long lead time before they’re released to the public. Maybe GPT-5 is being trained on ChatGPT logs. I’m not sure we’d be able to detect if this was happening.