Hacker News
by ncallaway 3029 days ago
This is my biggest question about HIPAA and GDPR: how to delete specific user records and data.

How are others planning to delete data from all backups? It seems like any automatic process that modifies every existing backup has the potential to accidentally corrupt all of them.

Is there a safe way to delete a record from my prior database snapshots, or is there a reason I don't actually need to do this?

4 comments

Encrypt the data element using a nonce (in effect a random per-record key), then encrypt the nonce using a public key whose private key will be purged from your HSM/SCD/key management system on a scheduled basis. You will need to retain metadata about the key ID too.

Don’t leak private keys: you should generally go through a decryption service when you need access to a data record. That's handy for proving access too!

That works and survives fairly intense audits at least in my experience.
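
A minimal sketch of that scheme, for illustration only. It assumes the third-party Python `cryptography` package; `KeyStore`, `encrypt_record`, `decrypt_record`, and the key ID format are hypothetical names, and the in-memory dict merely stands in for a real HSM or key management system. The per-record Fernet key plays the role of the "nonce" above.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# RSA-OAEP padding used to wrap the small per-record key.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


class KeyStore:
    """Hypothetical stand-in for an HSM / key management system."""

    def __init__(self):
        self._private_keys = {}  # key_id -> RSA private key

    def create_key(self, key_id):
        private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
        self._private_keys[key_id] = private_key
        return private_key.public_key()

    def purge(self, key_id):
        # Scheduled job: once this runs, nothing wrapped under key_id
        # can be decrypted again, including copies sitting in backups.
        self._private_keys.pop(key_id, None)

    def unwrap(self, key_id, wrapped_key):
        # Plays the "decryption service": the private key never leaves the store.
        private_key = self._private_keys.get(key_id)
        if private_key is None:
            raise KeyError(f"key {key_id!r} has been purged")
        return private_key.decrypt(wrapped_key, OAEP)


def encrypt_record(plaintext, key_id, public_key):
    data_key = Fernet.generate_key()  # random per-record key
    return {
        "key_id": key_id,  # metadata needed to find the right private key
        "wrapped_key": public_key.encrypt(data_key, OAEP),
        "ciphertext": Fernet(data_key).encrypt(plaintext),
    }


def decrypt_record(record, store):
    data_key = store.unwrap(record["key_id"], record["wrapped_key"])
    return Fernet(data_key).decrypt(record["ciphertext"])
```

With this shape, a record encrypted under key ID `"2018-05"` decrypts normally until `store.purge("2018-05")` runs; after that `decrypt_record` raises `KeyError`, and the ciphertext left in old backups is unreadable without anyone having to touch the backups.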

Do you maintain your database backups indefinitely? If they rotate out after a month or so you will likely be inside the realm of what GDPR considers reasonable compliance. The live data is removed ASAP and the data will rotate out from the backups in a reasonable time frame. At least from the legal advice we've had.

We have no plans to retroactively fix our backups. But we will have to make damn sure that if we need to use a database backup we do not reintroduce user data that we've purged. For that purpose we will have to maintain a list of which users have been purged until the backups rotate out. According to the advice we've had, this is acceptable.
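
A rough sketch of what that purge list could look like, assuming backups rotate out after 30 days and restored rows carry a `user_id` field. `PurgeList`, `BACKUP_RETENTION`, and the row shape are made up for illustration, not taken from any real system.

```python
from datetime import datetime, timedelta, timezone

BACKUP_RETENTION = timedelta(days=30)  # assumed backup rotation window


class PurgeList:
    """Remembers which users were erased until no backup can still contain them."""

    def __init__(self):
        self._purged_at = {}  # user_id -> time the live data was erased

    def record_purge(self, user_id, when):
        self._purged_at[user_id] = when

    def expire(self, now):
        # A backup holding the user's data was taken before the purge,
        # so it has rotated out once the purge is older than the window.
        cutoff = now - BACKUP_RETENTION
        self._purged_at = {
            user_id: when
            for user_id, when in self._purged_at.items()
            if when >= cutoff
        }

    def filter_restored(self, rows):
        # Apply to every row coming out of a restored backup
        # before it is written back into the live database.
        return [row for row in rows if row["user_id"] not in self._purged_at]
```

The point of `expire` is that the list itself does not have to live forever: an entry only needs to outlast the oldest backup that could still hold that user's data.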

> But we will have to make damn sure that if we need to use a database backup we do not reintroduce user data that we've purged. For that purpose we will have to maintain a list of which users have been purged until the backups rotate out.

This is the approach we've generally taken as well.

This has been the insurmountable issue for us, thus far.
What if you're using an append-only log, like Kafka, as your data backbone?
Two options: a) use a reasonably short retention period, e.g. 6 weeks; b) if you key your entries in Kafka and the topic uses log compaction, a new entry with the same key will eventually replace the old one. That way you could overwrite PII content with an empty message (or a null-value tombstone, which removes the key altogether).
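
To make option b) concrete, here is a toy model of what a fully compacted keyed log retains. It is plain Python, not the Kafka client API; `compact` is a hypothetical helper. On a real cluster this only applies to topics with `cleanup.policy=compact`, and compaction runs in the background, so old values linger until the log cleaner gets to them.

```python
def compact(log):
    """Return the records a fully compacted log would keep.

    `log` is a list of (key, value) pairs in offset order. The newest
    record per key wins; a key whose newest value is None (a tombstone)
    is dropped entirely.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets replace earlier ones

    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]
```

Writing an empty value for `user-1` leaves a record with that key and no content, while writing `None` removes the key from the compacted log. Either way the earlier PII-bearing value is no longer retained once compaction has run.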
You've got a big problem.