by AmericanChopper 1180 days ago
> My training data is sourced from publicly available text on the internet and does not specifically target any individual or collect personal data.

Training on publicly available data doesn’t mean that it doesn’t collect PII. Just ask it “who is <some public figure>?” to demonstrate this for yourself. I asked it about some of my colleagues and it was able to write a brief profile about them, and they’d barely qualify as public figures at all.

GDPR supposedly allows you to process public data without consent, but I’m not an expert on that specific use case, and it seems to have plenty of grey areas. The right to be forgotten still applies though, and LLMs seem as though they would struggle with that. To me it looks like it’s probably one of the areas where GDPR is just manifestly impractical to comply with, and the European courts have a habit of saying “too bad” in those situations.

1 comment

Yeah I mean, I wouldn't trust the accuracy of its answer. But I find it funny that it agrees that OpenAI should comply.
It’s clearly been programmed to give canned answers to these questions. If you interrogate it a bit further on the details of its GDPR compliance, it gives the same script every time, and will make outrageous claims about what PII is, as well as other things, like that the right to be forgotten doesn’t apply to ChatGPT.