by yreg 626 days ago
I have a feeling that it's also quite a difficult problem past some scale of infrastructure.

If I ask Google to delete my data (EU citizen), I have trouble believing that they actually go through all of their cold storage backups where it was stored and make sure it's erased. At best I could believe that the process is designed in such a way that my soft-deleted data is unlikely to be recovered (intentionally or not) and maybe hard to link back to my account.

What they should do (I have no idea what they do) is to encrypt every record belonging to a user with an individual key. Live records, backups, everything. If a user wishes to be deleted, that live key is simply obliterated, making any data the user owns unrecoverable.

Since the key is not used for end to end encryption, and backends still have access to the data (as long as the key lives), it has different requirements on how it needs to be protected. The biggest challenge is backing up the key itself, as losing it means losing access to all the user’s data by design. But backing up and obliterating a single key is much, much easier than doing so for a whole set of loosely associated data across many databases.
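
To make the idea concrete, here is a minimal sketch of per-user key deletion (often called crypto-shredding). Everything in it is an assumption for illustration: `UserKeyStore`, `encrypt_record` and `decrypt_record` are hypothetical names, and the cipher is a toy SHA-256 keystream built from the standard library so the example runs anywhere. A real system would presumably use a vetted authenticated cipher and a hardened key service instead.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not for production."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class UserKeyStore:
    """Hypothetical central key service holding one random key per user."""

    def __init__(self):
        self._keys = {}  # user_id -> 32-byte key

    def create(self, user_id: str) -> None:
        self._keys[user_id] = secrets.token_bytes(32)

    def get(self, user_id: str) -> bytes:
        # Raises KeyError once the key has been destroyed.
        return self._keys[user_id]

    def destroy(self, user_id: str) -> None:
        # The only thing that has to be erased when a user asks for deletion.
        self._keys.pop(user_id, None)


def encrypt_record(keys: UserKeyStore, user_id: str, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce per record
    stream = _keystream(keys.get(user_id), nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt_record(keys: UserKeyStore, user_id: str, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(keys.get(user_id), nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))
```

Live tables and backups would only ever hold the output of `encrypt_record`. After `keys.destroy(user_id)`, every copy of that user's ciphertext, wherever it lives, can no longer be decrypted, which is why the key store and its own backups become the one place where deletion has to be done properly.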

Practically speaking, it also makes using and querying that data and doing any kind of analytics much, much more expensive. It is done that way in some cases, but in the absence of a technical requirement to do so, there are cheaper approaches.
Those are solvable problems. I could also argue how address space separation and more generally MMU protections make things so, so much more complex (they do!), yet we don’t question that one very much.

There is no end to end encryption involved here, so you don’t need to resort to such voodoo as homomorphic encryption.

Yes, I also expect that this is the way, but I think it makes the problem only partially smaller, since you still need to sync and back up the keys.

Also, is an encrypted piece of data with a lost key truly deleted? What if the encryption gets cracked?

I would say it is more deleted than toggling a `deleted` flag in the db and less deleted than burning the tapes in fire.

> the problem only partially smaller, since you still need to sync and back up the keys.

I mentioned that: It makes the problem much smaller, as you only have one single, small piece of data to back up and erase, instead of an ever-changing many-faceted blob of distributed data.

> Also, is an encrypted piece of data with a lost key truly deleted? What if the encryption gets cracked?

Oh boy. If simple symmetric encryption gets “cracked”, then you have much larger problems.

> I would say it is more deleted than toggling a `deleted` flag in the db and less deleted than burning the tapes in fire.

For all practical purposes symmetrically encrypted data that lost its keys is considered “random” data. If you “erase” data on a device before you sell it, most often it will just throw away the key to the disk contents nowadays.

They already do this (the encryption-at-rest part), and the keys are never seen outside of the centralized encryption service. Deleting the data is still a hard requirement.
Encrypt with an individual key for each user. Throwing away the key is indistinguishable from deletion.
Before you make a deletion request, make a subject data request and see what they have on you; then request deletion; then make a subject data request again.
The fact that they cannot access the data during a subject data request does not mean it has been deleted.
Google-scale companies have very capable people employed, both on the technical and legal side, who do nothing but look for these kinds of oversights, and are empowered to make sure they get fixed.
Large companies fail in spectacular ways all the time. Google is super successful because they tapped into the biggest cash cow of all time. Not because the employees are somehow very capable and above any oversight.
That's why they get fined all the time?
I can't speak for any other companies, but you don't need to speculate. You can search the internet and find several articles outlining that the correct strategy for businesses here is to delete the data from production systems, and then maintain a record of references to those deleted records such that a restored backup can ensure that deleted records are not put back into production.

There is generally an expectation that data may be retained in backups for a specified retention period, but will not be used or restored. Beyond that, it is up to the regulator to determine whether this meets the standard, but it's worth noting that there are notions baked into the text and the interpretations of the text of GDPR that account for reasonable costs and efforts.

Auditors can and do test and monitor for this, both by using audit processes and demanding evidence, and by performing manual testing and experimentation.
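
The "record of references to deleted records" approach described above could look roughly like the sketch below. The names `DeletionLedger`, `erase` and `restore_from_backup` are hypothetical, and records are modeled as plain dicts keyed by an opaque ID; it assumes the ledger is stored separately from the backups it filters and holds only IDs, not personal data.

```python
from dataclasses import dataclass, field


@dataclass
class DeletionLedger:
    """Hypothetical log of record IDs whose erasure was requested."""
    erased_ids: set = field(default_factory=set)

    def record_erasure(self, record_id: str) -> None:
        self.erased_ids.add(record_id)


def erase(production: dict, ledger: DeletionLedger, record_id: str) -> None:
    # Delete from the live system and remember that it must stay deleted.
    production.pop(record_id, None)
    ledger.record_erasure(record_id)


def restore_from_backup(backup: dict, ledger: DeletionLedger) -> dict:
    # Replay the ledger during restore so erased records never return to production.
    return {rid: rec for rid, rec in backup.items() if rid not in ledger.erased_ids}


# Usage: a backup taken before the erasure still contains r1,
# but a restore filtered through the ledger does not bring it back.
production = {"r1": {"owner": "alice"}, "r2": {"owner": "bob"}}
backup = dict(production)
ledger = DeletionLedger()
erase(production, ledger, "r1")
restored = restore_from_backup(backup, ledger)
```

The backup itself is left untouched until its retention period expires; the guarantee comes from every restore path going through the ledger, which is presumably the part auditors would want evidence for.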