by goalonetwo 626 days ago
Good luck proving that your data was not deleted.

GDPR, CCPA, etc. made it easy to send a deletion request, which will most probably be handled as a frontend gimmick. How much effort are they really going to put into going back through their backups and deleting all your entries? I'm pretty sure it must be among the lowest priorities on the roadmap.


The financial penalties are pretty nasty.

And it's amazing how financial liability has a way of getting things on a VP's feature radar that common sense doesn't.

The reason it was handled haphazardly before was that there was no liability. Who cared? (Legally speaking.)

From working inside a top-25 American retail company, I can say that we went top to bottom and rearchitected for traceability and hard deletes as a result of the CCPA.

I have a feeling that it's also quite a difficult problem past some scale of infrastructure.

If I ask Google to delete my data (EU citizen), I have trouble believing that they actually go through all of their cold storage backups where it was stored and make sure it's erased. At best I could believe that the process is designed in such a way that my soft-deleted data is unlikely to be recovered (intentionally or not) and maybe unlikely to be possible to link to my account.

What they should do (I have no idea what they do) is to encrypt every record belonging to a user with an individual key. Live records, backups, everything. If a user wishes to be deleted, that live key is simply obliterated, making any data the user owns unrecoverable.
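A minimal sketch of the per-user-key idea (often called crypto-shredding), using only the Python standard library. Everything here is a hypothetical illustration: `PerUserVault` stands in for a real key-management service, keys live in an in-memory dict, and the HMAC-SHA256 counter-mode keystream is a toy placeholder for a real authenticated cipher such as AES-GCM. It provides no integrity protection and is not meant for production.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, concatenated.
    Placeholder for a real cipher such as AES-GCM; NOT for production use."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


class PerUserVault:
    """Hypothetical key store: one random 256-bit key per user."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def encrypt(self, user_id: str, plaintext: bytes) -> bytes:
        key = self._keys.get(user_id)
        if key is None:
            key = self._keys[user_id] = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)  # fresh nonce per record
        stream = _keystream(key, nonce, len(plaintext))
        return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

    def decrypt(self, user_id: str, blob: bytes) -> bytes:
        key = self._keys[user_id]  # raises KeyError once the user is shredded
        nonce, body = blob[:16], blob[16:]
        stream = _keystream(key, nonce, len(body))
        return bytes(c ^ s for c, s in zip(body, stream))

    def shred(self, user_id: str) -> None:
        """Delete the user's key; every copy of their ciphertext, live or
        in backups, becomes unreadable."""
        self._keys.pop(user_id, None)


if __name__ == "__main__":
    vault = PerUserVault()
    row = vault.encrypt("alice", b"alice@example.com")  # same bytes also land in backups
    assert vault.decrypt("alice", row) == b"alice@example.com"
    vault.shred("alice")  # from now on, decrypt("alice", row) raises KeyError
```

Deleting a user then reduces to one call to `shred`: a copy of the ciphertext restored from an old backup can no longer be decrypted, because the only key for it is gone.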

Since the key is not used for end to end encryption, and backends still have access to the data (as long as the key lives), it has different requirements on how it needs to be protected. The biggest challenge is backing up the key itself, as losing it means losing access to all the user’s data by design. But backing up and obliterating a single key is much, much easier than doing so for a whole set of loosely associated data across many databases.

Practically speaking, it also makes using and querying that data and doing any kind of analytics much, much more expensive. It is done that way in some cases, but in the absence of a technical requirement to do so, there are cheaper approaches.

Those are solvable problems. I could also argue that address space separation, and MMU protections more generally, make things so, so much more complex (they do!), yet we don't question that one very much.

There is no end to end encryption involved here, so you don’t need to resort to such voodoo as homomorphic encryption.

Yes, I also expect that this is the way, but I think it makes the problem only partially smaller, since you still need to sync and back up the keys.

Also, is an encrypted piece of data with a lost key truly deleted? What if the encryption gets cracked?

I would say it is more deleted than toggling a `deleted` flag in the db and less deleted than burning the tapes in fire.

> the problem only partially smaller, since you still need to sync and back up the keys.

I mentioned that: it makes the problem much smaller, as you only have one single, small piece of data to back up and erase, instead of an ever-changing, many-faceted blob of distributed data.

> Also, is an encrypted piece of data with a lost key truly deleted? What if the encryption gets cracked?

Oh boy. If simple symmetric encryption gets “cracked”, then you have much larger problems.

> I would say it is more deleted than toggling a `deleted` flag in the db and less deleted than burning the tapes in fire.

For all practical purposes, symmetrically encrypted data whose key has been lost is considered “random” data. Nowadays, when you “erase” a device before selling it, it will most often just throw away the key to the disk contents.

They already do this (the encryption-at-rest part). Deleting the data is still a hard requirement. Also, the keys are never seen outside of the centralized encryption service. Deletion is still a must.

Encrypt with an individual key for each user. Throwing away the key is indistinguishable from deletion.

Before you make a deletion request, make a subject data request and see what they have on you; then request deletion; then make a subject data request again.
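
The comparison step can be scripted. A minimal sketch, assuming (hypothetically) that both subject data exports arrive as JSON and have been loaded with `json.load`; `flatten` and `survivors` are illustrative names, not part of any service's API. It lists only fields that appear unchanged in both exports, so an empty result shows the data is no longer exported, not that it is gone from backups. Paths use list indices, so a record that changed position within a list will not be matched.

```python
from typing import Any

_MISSING = object()  # sentinel: path absent from the newer export


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON (dicts/lists) into {"orders.0.id": value} leaf paths."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat: dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def survivors(before: Any, after: Any) -> dict[str, Any]:
    """Leaf values from the pre-deletion export still present, unchanged, after deletion."""
    old, new = flatten(before), flatten(after)
    return {path: value for path, value in old.items() if new.get(path, _MISSING) == value}
```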

The fact they cannot access the data during a subject data request does not mean it has been deleted.

Google-scale companies have very capable people employed, both on the technical and legal side, who do nothing else than look for these kinds of oversights, and are empowered to make sure they get fixed.

Large companies fail in spectacular ways all the time. Google is super successful because they tapped into the biggest cash cow of all time, not because the employees are somehow very capable and above any oversight.

That's why they get fined all the time?

I can't speak for any other companies, but you don't need to speculate. You can search the internet and find several articles outlining that the correct strategy for businesses here is to delete the data from production systems, and then maintain a record of references to those deleted records such that a restored backup can ensure that deleted records are not put back into production.
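
A minimal sketch of that strategy: hard-delete from production, keep a log of the deleted record IDs, and re-apply it whenever a backup is restored. The `Database` class is hypothetical and in-memory. It assumes record IDs are opaque and carry no personal data themselves, and that the deletion log is stored outside the backups it filters (otherwise restoring an old backup would also roll back the log).

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical production store: record id -> record."""

    records: dict[str, dict] = field(default_factory=dict)
    # IDs deleted at a subject's request. Kept separately from backups so a
    # restore cannot resurrect them; holds only opaque IDs, no personal data.
    deleted_ids: set[str] = field(default_factory=set)

    def hard_delete(self, record_id: str) -> None:
        self.records.pop(record_id, None)
        self.deleted_ids.add(record_id)

    def snapshot(self) -> dict[str, dict]:
        """Stand-in for taking a backup."""
        return {k: dict(v) for k, v in self.records.items()}

    def restore(self, backup: dict[str, dict]) -> None:
        """Restore from a backup, re-applying every recorded deletion."""
        self.records = {
            k: dict(v) for k, v in backup.items() if k not in self.deleted_ids
        }
```

The backup itself still contains the deleted record until it ages out of its retention period; the log only guarantees that the record never makes it back into production.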

There is generally an expectation that data may be retained in backups for a specified retention period, but will not be used or restored. Beyond that, it is up to the regulator to determine whether this meets the standard, but it's worth noting that the text of the GDPR, and the interpretations of it, include notions that account for reasonable costs and efforts.

Auditors can and do test and monitor for this, both using audit processes and demanding evidence, and by performing manual testing and experimentation.

Fines for non-compliance with GDPR regarding data of European citizens can amount to 4% of annual revenue:

  83(5) GDPR, the fine framework can be up to 20 million euros, or in the case of an undertaking, up to 4 % of their total global turnover of the preceding fiscal year, whichever is higher.
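
The "whichever is higher" rule is easy to misread, so here is the arithmetic as a small Python function. It computes only the statutory ceiling described in the quote; the fine actually imposed is set case by case by the regulator. The function name and the use of whole euros are assumptions for illustration.

```python
def art_83_5_fine_ceiling(global_turnover_eur: int, is_undertaking: bool = True) -> int:
    """Maximum fine under Art. 83(5) GDPR, in whole euros.

    EUR 20 million, or for an undertaking 4% of total worldwide annual
    turnover of the preceding financial year, whichever is higher.
    """
    flat_cap = 20_000_000
    if not is_undertaking:
        return flat_cap
    # Integer arithmetic avoids float rounding; floors to whole euros.
    turnover_cap = global_turnover_eur * 4 // 100
    return max(flat_cap, turnover_cap)


for turnover in (100_000_000, 500_000_000, 1_000_000_000):
    print(f"{turnover:>13,} -> {art_83_5_fine_ceiling(turnover):,}")
```

For a turnover of EUR 100 million, 4% is only EUR 4 million, so the EUR 20 million figure is the ceiling; the two break even at EUR 500 million, and at EUR 1 billion the ceiling becomes EUR 40 million.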

I have built systems for a lot of EU companies, and they all took GDPR compliance very seriously.

Maybe some mom-and-pop shop would bodge it, but any serious business has legal counsel and wisely listens to them.