by blagie 1420 days ago
The logs part made no sense, at least as I've always seen GDPR interpreted. It depends on what goes in the logs.

If logs were exempt, it'd be really easy to just ignore GDPR by sticking everything in logs.

There is no magical GDPR fairy that prevents you from needing to comply with deletion requests because you've made your data formats awkward and hard to track/trace.

There are nice articles about how to anonymize log files so they don't need to contain identifiable information. For example, what is generally okay is storing part of an IP. If I just store the odd digits of the IP:

1) I'm probably okay for not being able to identify individuals.

2) I can do most analytics without issues. Unless I have bazillions of visitors, the identifiers are unique enough to tell visitors apart.

For nitpickers: Odd digits is a dumb hash for illustrative purposes. In practice, I'd run the IP through SHA, and store just the first few bytes -- enough that visitors are unique most of the time in my log files, but not enough to be able to meaningfully map back to a person.
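A minimal sketch of the truncated-hash idea, for illustration only. The comment just says "SHA", so SHA-256 is an assumption here, as are the function name `truncated_ip_id` and the 3-byte default for "the first few bytes".

```python
import hashlib
import ipaddress


def truncated_ip_id(ip: str, keep_bytes: int = 3) -> str:
    """Return a short, non-reversible-by-design log identifier for an IP.

    Hypothetical helper: hashes the packed address with SHA-256 (an assumed
    choice of SHA variant) and keeps only the first `keep_bytes` bytes.
    """
    # Normalize the textual address to its canonical binary form so that
    # equivalent spellings of the same address hash identically.
    packed = ipaddress.ip_address(ip).packed
    digest = hashlib.sha256(packed).digest()
    # Keep only a prefix of the digest; the rest is discarded, not stored.
    return digest[:keep_bytes].hex()


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-range address (RFC 5737).
    print(truncated_ip_id("203.0.113.7"))
```

The log line would then carry the returned hex string in place of the raw IP. How many bytes to keep is the tuning knob: more bytes means fewer collisions between visitors, but also fewer candidate IPs per stored value.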


SHA hashes of the entire IPv4 space can easily be precomputed. Include a nonce that is rotated periodically to solve this.

It's a good idea, but the hash doesn't need to be unique or secure.

The IPv4 space is 2^32 addresses. The trick is to keep only, e.g., 24 bits of the hash. 2^24 gives about 16.8M possible values -- unless your web site is _VERY_ big, that means it's a unique ID for most visitors. If you come across an IP of interest (e.g. a scammer's), you can also hash it the same way and backtrack through the logs.

On the other hand, mapping back from a stored value, you get about 2^8 = 256 candidate IPs, so you can't tie it back to a unique user.
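The arithmetic above can be checked with a few lines. This is a rough sketch that assumes hash outputs are uniformly distributed and that every visitor has a distinct IP; the function name `shared_id_probability` is made up for illustration.

```python
IPV4_BITS = 32   # size of the IPv4 address space, in bits
KEPT_BITS = 24   # bits of the hash that are stored in the log

# Number of distinct identifiers that can appear in the log.
buckets = 2 ** KEPT_BITS

# Average number of IPv4 addresses that map to any one identifier.
candidates_per_id = 2 ** (IPV4_BITS - KEPT_BITS)


def shared_id_probability(visitors: int, n_buckets: int = buckets) -> float:
    """Chance that one given visitor shares an ID with at least one other.

    Assumes uniform hashing: each of the other (visitors - 1) visitors
    independently misses this visitor's bucket with probability
    1 - 1/n_buckets.
    """
    return 1.0 - (1.0 - 1.0 / n_buckets) ** (visitors - 1)


if __name__ == "__main__":
    print(buckets, candidates_per_id)
    for n in (10_000, 1_000_000):
        print(n, shared_id_probability(n))
```

With ten thousand distinct visitors the chance that any particular one shares an ID stays well under 0.1%; at a million visitors it climbs to a few percent, which is where "unless your web site is very big" starts to bite.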

A nonce is a good idea, but it's not part of the security perimeter here.
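For anyone who does want the rotating nonce, here is one way it could look. This is only a sketch: using HMAC-SHA256 to mix the nonce in, the class name `RotatingIpHasher`, and the 32-byte nonce size are all assumptions, and when to call `rotate()` (daily, per log file, etc.) is left to the operator.

```python
import hashlib
import hmac
import ipaddress
import secrets


class RotatingIpHasher:
    """Hypothetical helper: truncated keyed hash of an IP with a rotatable nonce."""

    def __init__(self, keep_bytes: int = 3) -> None:
        self.keep_bytes = keep_bytes
        self._nonce = b""
        self.rotate()

    def rotate(self) -> None:
        # Replace the nonce with fresh random bytes. Identifiers produced
        # before and after this call are no longer linkable to each other.
        self._nonce = secrets.token_bytes(32)

    def ident(self, ip: str) -> str:
        packed = ipaddress.ip_address(ip).packed
        # Keyed hash: without the nonce, a precomputed table of plain SHA
        # values over the IPv4 space does not apply.
        mac = hmac.new(self._nonce, packed, hashlib.sha256).digest()
        return mac[: self.keep_bytes].hex()


if __name__ == "__main__":
    hasher = RotatingIpHasher()
    print(hasher.ident("198.51.100.4"))
    hasher.rotate()
    print(hasher.ident("198.51.100.4"))
```

The trade-off is that looking up a known IP in old logs only works if the nonce for that period was kept; throwing old nonces away makes the old identifiers unlinkable, which may or may not be what you want.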