Hacker News
by ftxbro 1171 days ago
The article says "now in the wild after being leaked" but then it says "the data is impossible to retrieve as it is now stored on the servers belonging to OpenAI." So did the source code leak out of OpenAI into the wild, or are they saying that OpenAI itself is "the wild"? As far as I see from the article, it's not accessible to the general public.
2 comments

>So did the source code leak out of OpenAI into the wild, or are they saying that OpenAI itself is "the wild"?

The second one. OpenAI is now in possession of Samsung trade secrets. To Samsung, that's "in the wild". And that's a reasonable viewpoint: OpenAI could easily leak chat logs, overfit future models on this data, etc., and there's nothing Samsung can do about it now.

If ChatGPT is trained on the data and ChatGPT is accessible to the general public, then the data may as well be accessible to the general public.
It seems unlikely to me that ChatGPT is directly trained on chat data. If it were, we should see it know information past its knowledge cutoff. AFAIK that hasn't happened.

I assume the chat logs are instead training a reward model, which itself is then used as the reward function during RLHF training.
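
To make that idea concrete, here is a toy sketch of a reward model fit to pairwise preferences with a Bradley-Terry style loss. This is a generic illustration, not a description of OpenAI's actual pipeline: the linear model, the two-dimensional feature vectors, and the names `reward` and `train_reward_model` are all assumptions invented for the example.

```python
import math

def reward(weights, features):
    # Toy reward model: a linear score over hypothetical response features.
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # Each pair is (chosen_features, rejected_features), i.e. a logged
    # preference for one response over another.
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Loss is -log(sigmoid(margin)); its gradient with respect to the
            # weights is -(1 - sigmoid(margin)) * (chosen - rejected).
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Hypothetical preference data: feature vectors are made up for illustration.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.9, 0.2], [0.1, 0.8]),
]
weights = train_reward_model(pairs, dim=2)
```

In an RLHF setup, a learned scorer like `reward` would then stand in for the reward function while the policy model is optimized. The policy itself never sees the raw logs in this arrangement, only the scores.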

These models have a very long lead time before they’re released to the public. Maybe GPT-5 is being trained on ChatGPT logs. I’m not sure we’d be able to detect if this was happening.