Hacker News
by cplli 1420 days ago
IANAL, but shouldn't a data deletion request also apply to the data inside backups, even when no recovery is planned?

Edit: I am also skeptical about the logs part. I don't think logs can be a magical excuse to log everything that comes in; you should still only log what "legitimate" use-cases require.


The logs part made no sense, at least as I've always seen GDPR interpreted. It depends on what goes in the logs.

If logs were exempt, it'd be really easy to just ignore GDPR by sticking everything in logs.

There is no magical GDPR fairy that prevents you from needing to comply with deletion requests because you've made your data formats awkward and hard to track/trace.

There are nice articles about how to anonymize log files so they don't need to contain identifiable information. For example, what is generally okay is storing part of an IP. If I just store the odd digits of the IP:

1) I'm probably okay for not being able to identify individuals.

2) I can do most analytics without issues. Unless I have bazillions of visitors, the identifiers are unique enough to tell visitors apart.

For nitpickers: Odd digits is a dumb hash for illustrative purposes. In practice, I'd run the IP through SHA, and store just the first few bytes -- enough that visitors are unique most of the time in my log files, but not enough to be able to meaningfully map back to a person.
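A minimal sketch of that truncated-hash idea, assuming SHA-256 and 3 kept bytes (the comment only says "SHA" and "the first few bytes", so both choices are guesses); `pseudonymize_ip` is a made-up name for illustration:

```python
import hashlib


def pseudonymize_ip(ip: str, keep_bytes: int = 3) -> str:
    """Return the first `keep_bytes` bytes of SHA-256(ip) as a hex string.

    keep_bytes=3 (24 bits) is an assumed value, not something the
    original comment specifies.
    """
    digest = hashlib.sha256(ip.encode("ascii")).digest()
    return digest[:keep_bytes].hex()


# 203.0.113.7 is a documentation-only address (RFC 5737).
token = pseudonymize_ip("203.0.113.7")
print(token)  # 6 hex characters; log this instead of the raw IP
```

The same IP always yields the same token, so per-visitor analytics still work on the log file.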

SHAs of the entire IPv4 space can be easily precalculated. Include a nonce that is rotated periodically to solve this.

It's a good idea, but the hash doesn't need to be unique or secure.

The IPv4 space is 2^32. The trick is to keep e.g. 24 bits of the hash. 2^24 gives 16M possibilities -- unless your web site is _VERY_ big, that means it's a unique ID for most visitors. If you come across a specific IP (e.g. a scammer's), you can also hash it and look up its entries in the logs.

On the other hand, mapping back, each stored value corresponds to roughly 2^8 = 256 possible IPs, so you can't tie it back to a unique user.
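The arithmetic can be checked with a few lines of Python. This is only a rough sketch: it assumes the hash spreads IPs uniformly over the 24-bit space, and the visitor count of 10,000 is an invented number for illustration.

```python
IPV4_SPACE = 2**32  # number of possible IPv4 addresses
KEPT_BITS = 24      # bits of the hash that are stored

buckets = 2**KEPT_BITS                         # distinct stored values
candidates_per_bucket = IPV4_SPACE // buckets  # IPs sharing one value, on average


def expected_colliding_pairs(visitors: int, n_buckets: int) -> float:
    """Birthday-style estimate of how many visitor pairs share a stored value."""
    return visitors * (visitors - 1) / (2 * n_buckets)


print(buckets)                # 16777216, i.e. about 16M
print(candidates_per_bucket)  # 256, i.e. 2^8
# Hypothetical site with 10,000 distinct visitors:
print(expected_colliding_pairs(10_000, buckets))  # roughly 3 colliding pairs
```

So at that scale almost every visitor gets a distinct ID, while any single stored value still fans out to about 256 candidate addresses.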

A nonce is a good idea, but it's not part of the security perimeter here.
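For anyone who does want the rotating nonce, here is one possible sketch. It swaps the plain hash for a keyed one (HMAC-SHA256 from the standard library) with a random secret that gets replaced on rotation. The class name, the 32-byte key size, and the 3 kept bytes are all assumptions; nobody in the thread specified them.

```python
import hashlib
import hmac
import secrets


class RotatingIpHasher:
    """Keyed, truncated IP hashing with a replaceable secret (hypothetical helper)."""

    def __init__(self, keep_bytes: int = 3) -> None:
        self.keep_bytes = keep_bytes
        self.rotate()

    def rotate(self) -> None:
        # Call this on whatever schedule you choose (e.g. daily).
        # Old tokens can no longer be matched against new ones afterwards.
        self._key = secrets.token_bytes(32)

    def token(self, ip: str) -> str:
        mac = hmac.new(self._key, ip.encode("ascii"), hashlib.sha256).digest()
        return mac[: self.keep_bytes].hex()


hasher = RotatingIpHasher()
before = hasher.token("203.0.113.7")
hasher.rotate()
after = hasher.token("203.0.113.7")  # almost certainly differs from `before`
```

Without the key, a precalculated table of hashes over the IPv4 space is useless. The tradeoff is that visitors can't be correlated across rotation periods.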