by throwaway98237 3495 days ago
If you watch an old Google tech talk on disaster recovery, the Googler explains that a user's info is never fully deleted, because doing so is too expensive: backups are duplicated multiple times, in multiple locations, and sometimes across multiple technologies. In other words, cancel your Google account today and your data may be "deleted", but it's really, as in actually really, still on magnetic backup in several locations. It's just so hard to get to and put back together that it is considered "gone". Unless you're a really big organization with the means to go to such trouble, like maybe the government or Google.
7 comments

It would be easy to set up a backup system where all backups are encrypted with a set of randomly generated keys (one key for each user/service pair, or something like that). The keys are going to be (relatively) tiny, so they could be kept on non-archival storage.

If a user's data needs to be deleted for whatever reason, simply discard the user's corresponding encryption keys. That way you can effectively wipe the user's archive without needing to touch the tapes themselves.
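
For anyone who wants to see the shape of this, here is a minimal sketch of the idea, sometimes called crypto-shredding. Everything in it is an assumption made for illustration: the `KeyStore` and `Archive` names are hypothetical, the "tape" is just an append-only list, and `_toy_cipher` is a stdlib-only stand-in for a real authenticated cipher such as AES-GCM. It is not how Google or anyone else actually does it.

```python
import hashlib
import secrets
from typing import Dict, List, Optional, Tuple

KeyId = Tuple[str, str]  # (user, service)


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream.

    Illustration only, NOT real cryptography. A real system would use an
    authenticated cipher. The same call both encrypts and decrypts.
    """
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyStore:
    """Small, mutable, non-archival store of per-(user, service) keys."""

    def __init__(self) -> None:
        self._keys: Dict[KeyId, bytes] = {}

    def get_or_create(self, user: str, service: str) -> bytes:
        key_id = (user, service)
        if key_id not in self._keys:
            self._keys[key_id] = secrets.token_bytes(32)
        return self._keys[key_id]

    def get(self, user: str, service: str) -> Optional[bytes]:
        return self._keys.get((user, service))

    def shred_user(self, user: str) -> int:
        """Discard every key belonging to `user`; return how many were dropped."""
        doomed = [key_id for key_id in self._keys if key_id[0] == user]
        for key_id in doomed:
            del self._keys[key_id]
        return len(doomed)


class Archive:
    """Append-only backup log, standing in for tapes that are never rewritten."""

    def __init__(self, keys: KeyStore) -> None:
        self._keys = keys
        self.tape: List[Tuple[str, str, bytes, bytes]] = []

    def backup(self, user: str, service: str, plaintext: bytes) -> None:
        key = self._keys.get_or_create(user, service)
        nonce = secrets.token_bytes(16)  # fresh nonce for every record
        self.tape.append((user, service, nonce, _toy_cipher(key, nonce, plaintext)))

    def restore(self, user: str, service: str) -> List[bytes]:
        key = self._keys.get(user, service)
        if key is None:
            raise KeyError("key was shredded or never existed")
        return [
            _toy_cipher(key, nonce, ciphertext)
            for (u, s, nonce, ciphertext) in self.tape
            if (u, s) == (user, service)
        ]
```

Calling `shred_user("alice")` makes `restore("alice", ...)` fail while the tape itself is untouched and other users' records still restore. The catch is that this only works if the keys themselves never end up on the archival medium.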

How is that compatible with legal requirements for deletion? I think that, at least in Germany, you can demand that a company delete all data associated with you, and the company has to comply with it.

(Whether German data protection law applies in Google's datacenters is a wholly different story, though.)

It's not required in Germany either. The data protection law allows you to "lock" data instead of really deleting it from all devices. And IMO that's the only sane solution to deletion requests.

Otherwise, any kind of backups would be unlawful for a company.

Though this requirement is moot even in Germany, as it is impossible for you to verify such a deletion.

Can you point me to that tech talk? I think this info is outdated; at least, it conflicts with what I believe I know.

I agree, AIUI this is no longer the case.

<Blah blah disclaimer IANAL & speak for myself>

I think this may have changed since that talk. I think Google still doesn't technically guarantee full deletion (who knows if someone's GC process messed up or has a bug), but in practice it happens, at least AFAIK. It is expensive. And it does take time: I wouldn't expect all my stuff to be purged for at least 180 days (the 90 days they are supposed to delete after + ~90d for the delete to fully propagate).
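
To make that arithmetic concrete, here is a tiny sketch. Both 90-day figures are the rough estimates from this comment, not documented guarantees, and the constant and function names are made up for the example.

```python
from datetime import date, timedelta

# Assumed figures taken from the estimate above, not from any official policy.
DELETE_AFTER = timedelta(days=90)        # window in which deletion is supposed to happen
PROPAGATION_SLACK = timedelta(days=90)   # rough time for the delete to reach every copy


def earliest_full_purge(requested: date) -> date:
    """Earliest date by which one might expect every copy to be gone."""
    return requested + DELETE_AFTER + PROPAGATION_SLACK
```

So a deletion requested on 2017-01-01 would not be expected to be fully purged before 2017-06-30 under these assumptions.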

It also drives every engineer nuts when they're asked "is your service wipeout compliant?" and they realize "omg, I can't store this data longer than x days?!" shit shit shit

EU-US Privacy Shield compliance is yet another bag of legal worms every service has to deal with. More deleting, encryption at rest, etc.

Huh. I now get the focus on pure-HDD storage as opposed to using tapes.

Things change.

From watching a tech talk a few years ago by a Googler on backup, my understanding is that they just delete the encryption key (I believe everyone's data is encrypted with a per-user key).

Your data may still live on some server, but it is effectively unrecoverable.

I also remember that talk. I haven't verified by watching again, but if anyone's interested I suspect this is the one: https://youtu.be/eNliOm9NtCM