This looks like it is about crawling for training data. Not sure whether ChatGPT's browsing mode uses a different user-agent, but most of the entries in that list look like crawlers.
I had assumed this was related to sites like ChatGPT going out and searching in response to a specific request.
Regardless, my original question is still valid. These companies have already shown a lack of care about the data they train on. So if ethics have already gone out the window, what is to stop them from ignoring this file, if they are not ignoring it already?