by bmistree 2461 days ago
I just wanted to piggy-back onto the parent’s comment with a concrete example.

I’ve always been told that it’s good practice to take periodic backups. In the absolute worst cases, you can simply restore directly from these.

If a customer requests that their data be deleted, does that mean I have to remove it from my backups as well as from my production instance? If so, I’m not sure of the best way to do this. I don’t know whether many managed services will allow me to mutate backups. And even if I were managing my database and backups directly, it seems painful to load each backed-up database, remove the data, and rewrite the backup.

Note: I’m not saying that any of this is impossible. However, it does require a lot of ancillary engineering work, which is difficult for a small company that’s just trying to get to product-market fit.

2 comments

Not sure about the legal framework in the US but over here across the pond, it's enough if you remove the data when restoring the backups (reasonably easy to do; took me about a day to implement that on an old codebase that I wrote more than ten years ago, and I haven't touched either PHP or that codebase since then...).

IANAL but the guy who told us how it's done was, and in addition to all the legal stuff, of which I have absolutely no recollection because I don't really understand it, he pointed us to this as a useful resource for people who are also not lawyers: https://ico.org.uk/for-organisations/guide-to-data-protectio... .

Turns out it's acceptable for data to remain backed up for a while (as long as you inform your users), as long as you have systems in place that guarantee it's not used anymore.

Just sayin, it's not rocket science. Reading Internet forums you'd think the GDPR was like Apocalypse Lite, but in my experience, it took very little effort to implement it for companies that weren't engaging in shady practices.

>Not sure about the legal framework in the US but over here across the pond, it's enough if you remove the data when restoring the backups

Implementation-wise, is the best approach to do this to store some token for "user XX requested YY data be deleted" and check those tokens whenever you restore a backup?

I feel like that'd run afoul of a true solution because, in the event of a leak, it could be used to tie the information in the backup to the user who requested their data be deleted. Or am I misunderstanding, and that'd actually be acceptable under GDPR?

Is there a better way to do it?

That's pretty close to what I did, except I didn't reference the user who requested removal -- it's just a token that says "data will be removed due to GDPR request". I think there's a requirement to log removal requests, so there are still dots that can be connected, though.
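
For what it's worth, here is a minimal Python sketch of that kind of purge-on-restore step. It is not the PHP code described above: the names (`erasure_token`, `purge_after_restore`), the in-memory dict of rows, and the idea of storing a keyed hash of the record ID instead of the ID itself are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import hmac


def erasure_token(record_id: str, secret: bytes) -> str:
    """Keyed hash of a record ID (hypothetical scheme).

    The erasure log stores this token rather than the ID in clear text.
    """
    return hmac.new(secret, record_id.encode("utf-8"), hashlib.sha256).hexdigest()


def purge_after_restore(
    rows: dict[str, dict], erasure_log: set[str], secret: bytes
) -> dict[str, dict]:
    """Return the restored rows minus any record whose token is in the log.

    `rows` stands in for a freshly restored table keyed by record ID;
    the input dict is left unmodified.
    """
    return {
        record_id: row
        for record_id, row in rows.items()
        if erasure_token(record_id, secret) not in erasure_log
    }


if __name__ == "__main__":
    secret = b"stored-separately-from-backups"  # assumption: kept outside the backup
    restored = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
    log = {erasure_token("u1", secret)}  # u1 asked for erasure after the backup was taken
    print(sorted(purge_after_restore(restored, log, secret)))
```

The keyed hash only helps if the secret is stored separately from the backups; anyone holding both can still link a token back to a record.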

Also, I don't know if it's the best technical approach -- I did it this way just because it's code that I wrote a very long time ago, for a friend who was just starting their business. I took care of it because we're still friends and he asked me if I could take a look at it, but it's the first time I've done backend/web development in more than 12 years now.

I think this is sufficient, even considering things like the potential for data breaches. It complies with both the explicit requirements and the general spirit of the GDPR. IANAL and all but I think that, since data leaks aren't a form of data processing by the company who collected the information, they are outside the scope of Art. 17. There are already requirements in place about the secure storage and administration of personal data.

Plus, if you think of it, the framework of this whole construction provides sufficient assurance. If you have live, online backups which can be restored immediately, with a single click, then it's clearly not a problem to erase data from them immediately. If you have offline backups, you're required to have a retention policy for them anyway, and you can't process data from them anyway -- not until they're restored and you've had a chance to purge them. It's certainly possible that someone might break into your storage unit and run away with your archive tapes or hard drives or whatever, but at that point there's a lot of legislation that you have to worry about having broken before you even get to the damn GDPR :).

For GDPR it's sufficient to inform about the backups and when they are expected to be deleted. If you restore them you must have procedures in place to delete the requested data. I don't think this is any different.
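
The "when they are expected to be deleted" part is simple date arithmetic. A rough Python sketch, assuming a fixed retention period and that the worst case is a backup taken just before the erasure (the function name and the 30-day figure are hypothetical):

```python
from datetime import date, timedelta


def backups_clear_by(erased_on: date, retention_days: int) -> date:
    """Latest date on which a backup could still contain the erased data.

    Assumes every backup is destroyed `retention_days` after it is taken,
    and that the newest backup holding the data was taken on `erased_on`.
    """
    return erased_on + timedelta(days=retention_days)


if __name__ == "__main__":
    # Example: erasure on 2018-05-25 with a 30-day backup retention policy.
    print(backups_clear_by(date(2018, 5, 25), 30))
```

That date is what you would tell the user; until then, the purge-on-restore procedure is what keeps the data from coming back into use.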