Hacker News
by boredatwork 3029 days ago
Sure I delete files that I don't like, but I don't typically rewrite all my old backups to purge them from there too.
3 comments

This is my biggest question about HIPAA and GDPR: how do you delete specific user records and data?

How are others planning on deleting data from all backups? It seems like any automatic process that modifies all existing backups has the potential to accidentally corrupt all of them in the process.

Is there any safe way to delete a record from my prior database snapshots, or is there a reason I don't actually need to do this?

Encrypt the data element using a nonce, then encrypt the nonce using a public key whose private key will be purged from your HSM/SCD/key management system on a scheduled basis. You will need to retain metadata about the key ID too, so you know which key each record was encrypted under.

To avoid leaking private keys, you should generally go through a decryption service whenever you need access to a data record. That's handy for proving access, too!

That works and survives fairly intense audits, at least in my experience.
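A rough sketch of the scheme described above, in case it helps. Everything here is an assumption for illustration: `KeyService` is a hypothetical in-memory stand-in for an HSM or key management system, the per-record `data_key` plays the role of the "nonce", and `toy_cipher` is a symmetric XOR keystream swapped in for the public-key step so the example runs on the standard library alone. It is not secure and should be replaced with a vetted crypto library in anything real.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, salt: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream. Illustration only, NOT secure.

    XOR is its own inverse, so the same call encrypts and decrypts.
    """
    stream = hashlib.shake_256(key + salt).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyService:
    """Hypothetical stand-in for an HSM / key management system."""

    def __init__(self):
        self._keys = {}  # key_id -> master key bytes, never handed out

    def create_key(self, key_id: str) -> None:
        self._keys[key_id] = secrets.token_bytes(32)

    def wrap(self, key_id: str, data_key: bytes, salt: bytes) -> bytes:
        return toy_cipher(self._keys[key_id], salt, data_key)

    def unwrap(self, key_id: str, wrapped: bytes, salt: bytes) -> bytes:
        # Raises KeyError once the master key has been purged.
        return toy_cipher(self._keys[key_id], salt, wrapped)

    def purge(self, key_id: str) -> None:
        """Scheduled purge: every record wrapped under key_id becomes unreadable."""
        self._keys.pop(key_id, None)


def encrypt_record(svc: KeyService, key_id: str, plaintext: bytes) -> dict:
    data_key = secrets.token_bytes(32)  # fresh random key per record
    salt = secrets.token_bytes(16)
    return {
        "key_id": key_id,  # metadata that must be kept alongside the record
        "salt": salt,
        "wrapped_key": svc.wrap(key_id, data_key, salt),
        "ciphertext": toy_cipher(data_key, salt, plaintext),
    }


def decrypt_record(svc: KeyService, record: dict) -> bytes:
    data_key = svc.unwrap(record["key_id"], record["wrapped_key"], record["salt"])
    return toy_cipher(data_key, record["salt"], record["ciphertext"])
```

The point of the design is that backups only ever contain the `encrypt_record` output. Once `purge("2018-05")` has run, every backed-up record tagged with that key ID is unrecoverable without touching the backups themselves.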

Do you maintain your database backups indefinitely? If they rotate out after a month or so you will likely be inside the realm of what GDPR considers reasonable compliance. The live data is removed ASAP and the data will rotate out from the backups in a reasonable time frame. At least from the legal advice we've had.

We have no plans to retroactively fix our backups. But we will have to make damn sure that if we need to use a database backup we do not reintroduce user data that we've purged. For that purpose we will have to maintain a list of which users have been purged until the backups rotate out. According to the advice we've had, this is acceptable.
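For what it's worth, the purge-list idea can be sketched roughly like this. The row shape, the `RETENTION` value, and both function names are assumptions made up for the example, not anyone's actual restore tooling.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # assumed backup rotation window


def restore_rows(backup_rows, purged):
    """Return the backup's rows minus any user on the purge list.

    `purged` maps user_id -> datetime of the purge.
    """
    return [row for row in backup_rows if row["user_id"] not in purged]


def prune_purge_list(purged, now):
    """Drop purge entries once every backup that could hold the user has rotated out."""
    return {uid: ts for uid, ts in purged.items() if now - ts <= RETENTION}
```

Usage would look something like:

```python
purged = {42: datetime(2018, 5, 1)}
backup = [{"user_id": 41, "email": "a@example.com"},
          {"user_id": 42, "email": "b@example.com"}]
restore_rows(backup, purged)  # only user 41 comes back
```

Note that the purge list should hold only an opaque ID and a timestamp, since it is itself a record about the deleted user.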

> But we will have to make damn sure that if we need to use a database backup we do not reintroduce user data that we've purged. For that purpose we will have to maintain a list of which users have been purged until the backups rotate out.

This is the approach we've generally taken as well.

This has been the insurmountable issue for us, thus far.
What if you're using an append-only log like Kafka as your data backbone?
Two options: a) use a reasonably short retention period, e.g. 6 weeks; b) if you key your entries in Kafka and the topic uses log compaction, a new entry with the same key will eventually replace the old one. That way you could overwrite PII content with an empty message.
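To make option b) concrete, here is a toy model of what key-based compaction does to a log. This is a simulation in plain Python, not Kafka client code, and `compact` is a made-up name. In real Kafka this behaviour only applies to topics configured with `cleanup.policy=compact`, the cleaner runs asynchronously (so old values can linger until it gets to them), and the deletion marker is a record with a null value (a tombstone).

```python
def compact(log):
    """Keep only the latest entry per key; drop keys whose latest value is None.

    `log` is a list of (key, value) pairs in append order.
    """
    # Index of the last occurrence of each key in the log.
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [
        (key, value)
        for i, (key, value) in enumerate(log)
        if last_index[key] == i and value is not None
    ]
```

For example, appending `("user:1", None)` after `("user:1", "alice")` means that after compaction no entry for `user:1` remains, while other keys are untouched.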
You've got a big problem.
Once you have a way to back up data per user, just encrypt it with a random key, and once the account is deleted, delete the random key. We have done it this way, keeping multiple live copies of the backup key table in multiple locations, backed up daily with the previous backup purged. The hard part was grouping the data in a way that let us encrypt it with each user's random key.

I hope I was helpful :)
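A minimal sketch of the per-user key idea, under some loud assumptions: `user_keys` is a plain dict standing in for the key table, the function names are invented for the example, and `toy_cipher` is an insecure XOR keystream used only so the sketch runs without third-party packages. A real system would use an authenticated cipher from a vetted library.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with a SHAKE-256 keystream. Illustration only, NOT secure."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


# Live key table: user_id -> random key. It is stored apart from the data
# backups, and its own backups are short-lived.
user_keys = {}


def encrypt_for_user(user_id, plaintext: bytes) -> bytes:
    key = user_keys.setdefault(user_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)
    return nonce + toy_cipher(key, nonce, plaintext)  # this blob is what gets backed up


def decrypt_for_user(user_id, blob: bytes):
    key = user_keys.get(user_id)
    if key is None:
        return None  # key deleted: the backed-up blob is unreadable
    return toy_cipher(key, blob[:16], blob[16:])


def delete_user(user_id) -> None:
    """Deleting the key effectively deletes the user's data in every backup."""
    user_keys.pop(user_id, None)
```

This only works if old copies of the key table really do disappear, which is presumably why the previous key-table backup is purged each day.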

How do you manage/keep the keys?
A file does not really exist unless it is backed up, and it's not really deleted unless it is no longer backed up.