by confounded 3037 days ago
Of all the wonderful things that we're capable of as technologists, I think we can figure out a way to strip raw IP addresses from log files once we don't need them any more.
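One way to do what this comment describes, sketched in Python: once a log line ages past a retention window, every IP address in it is truncated to its network prefix. The log format (an ISO-8601 timestamp with a UTC offset first, then space-separated fields), the 30-day `RETENTION` window and the /24 and /48 prefixes are all hypothetical assumptions. Truncation is only one option; replacing the address with a fixed placeholder, or deleting the line outright, removes more.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

# Hypothetical retention window; set it to whatever your own policy requires.
RETENTION = timedelta(days=30)


def truncate_ip(token: str) -> str:
    """Zero the host bits of `token` if it parses as an IP address; else return it unchanged."""
    try:
        addr = ipaddress.ip_address(token)
    except ValueError:
        return token  # not an IP address, leave it alone
    # Keep only a network prefix: /24 for IPv4, /48 for IPv6 (an assumed policy choice).
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False).network_address)


def scrub_line(line: str, now: datetime) -> str:
    """Truncate every IP in `line` if its leading timestamp is older than RETENTION.

    Assumes a hypothetical log format: an ISO-8601 timestamp with a UTC offset,
    followed by space-separated fields.
    """
    body = line.rstrip("\n")
    newline = line[len(body):]  # preserve a trailing newline, if any
    timestamp, _, rest = body.partition(" ")
    if now - datetime.fromisoformat(timestamp) < RETENTION:
        return line  # still inside the retention window: keep the raw address
    fields = [truncate_ip(field) for field in rest.split(" ")] if rest else []
    return " ".join([timestamp, *fields]) + newline


if __name__ == "__main__":
    now = datetime(2020, 3, 1, tzinfo=timezone.utc)
    print(scrub_line("2020-01-01T12:00:00+00:00 203.0.113.57 GET /index.html 200", now))
    # → 2020-01-01T12:00:00+00:00 203.0.113.0 GET /index.html 200
```

Run over a log file on a schedule, this leaves recent lines intact for debugging while older ones keep only coarse network information.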

I'll need to figure out how to handle this for the data I'm responsible for at the moment. It's boring and it doesn't help the product, but it's not supposed to. In idlewords' terms, I feel like I'm finally purging toxic waste: http://idlewords.com/talks/haunted_by_data.htm


It's left ambiguous, but it's likely that any aggregate computed from personal data may also be considered personal data (e.g. how many unique IPs you've seen).
If you are looking to derive aggregated insights from data, then you need to be clear on your anonymisation processes and understand whether any derived dataset is capable of identifying individuals, either in isolation or through reasonable means. To me, a tally of the number of unique IPs alone would never be sufficient to identify a person, but maybe I don't have the full context?
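To make "a tally of unique IPs alone" concrete, here is a minimal Python sketch, assuming log records have already been parsed into hypothetical `(day, ip)` pairs: the raw addresses are held only while counting, and only the per-day totals are returned. Whether even such totals count as personal data is the open question in this thread; the sketch only shows what the tally itself contains.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple


def daily_unique_ip_counts(records: Iterable[Tuple[date, str]]) -> Dict[date, int]:
    """Reduce (day, ip) records to {day: number of distinct IPs seen that day}.

    The per-day sets of raw addresses live only inside this function;
    the caller gets back integer tallies and nothing else.
    """
    seen: Dict[date, Set[str]] = defaultdict(set)
    for day, ip in records:
        seen[day].add(ip)
    return {day: len(ips) for day, ips in seen.items()}


if __name__ == "__main__":
    records = [
        (date(2018, 5, 1), "198.51.100.1"),
        (date(2018, 5, 1), "198.51.100.1"),  # repeat visit, counted once
        (date(2018, 5, 1), "198.51.100.2"),
        (date(2018, 5, 2), "198.51.100.1"),
    ]
    print(daily_unique_ip_counts(records))
    # → {datetime.date(2018, 5, 1): 2, datetime.date(2018, 5, 2): 1}
```

A daily total over a whole site is hard to tie to anyone; the same count sliced very finely (per page, per hour) can shrink to 1 and start pointing at a single visitor.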
What you're saying makes sense. Any data derived from PII should be considered PII itself if it can be used to identify users. Even if it cannot, it needs to be cleared often enough that you don't end up holding data derived from information for which you received a deletion request, for instance.

In practice, you can achieve this by simply refreshing your derived data frequently (every ~30-60 days), and for aggregated data, k-anonymity is a good way to enforce this privacy constraint.

https://en.wikipedia.org/wiki/K-anonymity
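A minimal Python sketch of applying the k-anonymity idea to aggregate counts through small-group suppression, assuming the derived data is a list of rows keyed by hypothetical quasi-identifier columns (`country`, `age_band`): any group with fewer than `k` rows is withheld from the output. This is one common enforcement technique, not a complete anonymisation pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def k_anonymous_counts(
    rows: Iterable[Mapping[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> Dict[Tuple[str, ...], int]:
    """Count rows per combination of quasi-identifier values, releasing only groups of size >= k.

    Groups with fewer than k rows are suppressed (dropped), so every released
    count describes at least k records.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {group: n for group, n in counts.items() if n >= k}


if __name__ == "__main__":
    # Hypothetical derived dataset: one row per user, IPs already dropped.
    rows = [
        {"country": "DE", "age_band": "30-39"},
        {"country": "DE", "age_band": "30-39"},
        {"country": "DE", "age_band": "30-39"},
        {"country": "FR", "age_band": "30-39"},  # alone in its group: suppressed
    ]
    print(k_anonymous_counts(rows, ["country", "age_band"], k=2))
    # → {('DE', '30-39'): 3}
```

Rebuilding this output from the current source data on a schedule (the comment suggests every 30 to 60 days) means rows removed upstream after a deletion request also drop out of the aggregates. Suppression is the bluntest tool; generalising values into coarser bands (for example, wider age ranges) is the other standard way to reach k.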

You need the IP records for jurisdictions that require long term retention for law enforcement requests including copyright infringement.

So you must delete them and also keep them.

Do you know which jurisdictions and laws that includes?

This sounds like the Investigatory Powers Act in the UK, though I haven't heard of similar laws in other liberal democracies.