by gtirloni 1206 days ago
The training data could include internal docs that describe whether it honors or ignores the robots.txt file.

If I were involved at OpenAI, I would not include the internal wiki, Slack archives, Dropbox folders, etc. in the training data. While it would be highly entertaining, it would not be a good idea.
I agree: private data should not (and, in the best-case scenario, would not) be included in the training data. But some internal documentation is also published (say, on a public website), and ChatGPT would be expected to know at least that part.