I work at Microsoft Research in the UK. A few weeks ago we had a lecture from a lawyer on exactly this subject. Her main point was that GDPR gives people the right to request their data be deleted but it gives companies the right to refuse if it would cause unreasonable damage to their business. Until a case makes its way through all levels of the court system, nobody knows how this collision of rights will be interpreted.
I suspect someone would have to show that the model trained on their data revealed something about them in a practically harmful way.
> A few weeks ago we had a lecture from a lawyer on exactly this subject. Her main point was that GDPR gives people the right to request their data be deleted but it gives companies the right to refuse if it would cause unreasonable damage to their business.
I guess it still needs to be litigated, but the question on my mind is: does that right of refusal apply only to the model, or also to the data that trained it? If it applies to the data, the regulation is pretty useless, since anyone could avoid the deletion requirements by training a model on the data. If it doesn't, I think the use in the model takes care of itself: at some point they'll need to retrain, and then your data won't be there.
I feel, though, that the point of the GDPR was to protect our personal data held by companies, not to prevent companies from using our personal data to make money.
So if a company uses your personal data to train a model (let's assume for the time being that you willingly gave your informed consent), and then they delete your data after they have trained their model, does that model contain your personally identifiable information? I'd argue that it does not - the model is just some weights, right? So 0.6, 34.291, 0.0016 - is that you, mum?
... but having just said that, I do wonder what happens if you run the model in reverse, like the DeepDream stuff did (1). Could it regenerate PII (or rather generate "nearly-PII") purely from those weights?
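For what it's worth, here's a toy sketch of how weights alone can leak a training record. Everything in it is made up for illustration: the `record` values, the `train` and `invert` helpers, and the assumptions that the model is linear, was trained on exactly one record from zero-initialised weights, and that whoever holds the weights also knows the label. Real models trained on many records won't invert this cleanly, so treat it as an intuition pump, not as an attack on any real system.

```python
# Toy illustration only: a single hypothetical "personal record" as numbers.
record = [34.0, 1.0, 0.0, 7.0]  # made-up features, not real data
label = 1.0                     # the value the model is trained to predict


def predict(weights, features):
    """Linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train(features, target, steps=200, lr=0.0005):
    """Plain gradient descent on squared error, starting from zero weights."""
    weights = [0.0] * len(features)
    for _ in range(steps):
        error = predict(weights, features) - target
        # Every update is a multiple of `features`, so the weights always
        # stay proportional to the single training record.
        weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights


def invert(weights, target):
    """Rebuild the training record from the weights and the known label.

    Assumes training converged, so that weights = target * x / ||x||^2,
    which rearranges to x = target * weights / ||weights||^2.
    """
    norm_sq = sum(w * w for w in weights)
    return [target * w / norm_sq for w in weights]


weights = train(record, label)
recovered = invert(weights, label)

print("weights:  ", weights)
print("recovered:", recovered)
```

In this degenerate case the weights are just a rescaled copy of the record, so `recovered` matches `record` up to floating-point error. With more records and a bigger model the picture gets much murkier, which is sort of the whole question.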
Can you prove it is? Many, many things can be personally identifiable given enough resources and associated data, so it's unclear who bears the burden of proof, especially considering the sibling comment mentioning a GDPR exception to the deletion requirement when deleting would cause sizable damage to a business.