by dalben 1128 days ago
I am a SWE (IANAL) with a post-grad degree in GDPR/DPO, and while I only had time for a cursory read, I must say it hits a lot of nails on the head! A breath of fresh air in times of so much GDPR misinformation.

From what I remember, the ePrivacy-GDPR cookie mismatch (consent as the only allowed legal basis for cookies) is due to ePrivacy being older than the GDPR and not intentional.

Article 5 (Principles) is always a good mention - just having a legal basis is not enough, you always need to respect these principles (such as lawfulness, fairness and transparency).

The dig at pseudonymization not being enough is great. It's a personal pet peeve of mine. Pseudonymized data is still personal data!

The GDPR does not prescribe how to anonymize data. It just says "as long as someone can identify a person, then it's personal data." For example, you might think that aggregating based on city is enough to anonymize, but my nephew was at one point the sole person living in a village - that would have directly identified him. Likewise, stripping the last octet of IP addresses might not be enough if I personally own a /24. It's all about context.
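To make the /24 point concrete, here is a minimal sketch using Python's standard `ipaddress` module. The address block and the idea that a single person controls it are assumptions for illustration (203.0.113.0/24 is a documentation range), not a real case.

```python
import ipaddress


def truncate_ipv4(addr: str) -> str:
    """Zero the last octet of an IPv4 address, keeping only the /24 prefix."""
    network = ipaddress.ip_network(f"{addr}/24", strict=False)
    return str(network.network_address)


# Hypothetical: one person controls this entire /24 block.
owned_block = ipaddress.ip_network("203.0.113.0/24")

truncated = truncate_ipv4("203.0.113.77")
# The truncated address still falls inside the block that one person owns,
# so it still points at that person.
still_identifying = ipaddress.ip_address(truncated) in owned_block

print(truncated, still_identifying)  # 203.0.113.0 True
```

The truncation behaves exactly as intended, yet the result still singles out one person, which is why context matters more than the technique.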

The biggest thing I personally learned was that any solution claiming to be "GDPR proof" probably is not compliant.

3 comments

Author of the article here! (I tried to submit it but HN rejected it)

I started researching this last weekend, reading through the GDPR, the ePrivacy Directive, and tons of related court rulings (with the help of Google Translate). 2002/58/EC and (EU) 2016/679 are ingrained in my brain now. I was so nervous releasing it to the public, but I breathed a sigh of relief after reading your comment.

This is one of my pet peeves with the GDPR! Your nephew and IP octet cases are very extreme edge cases that we shouldn't build policy around if there are major drawbacks to including them. It's bad that there is ostensibly no compliant way to count anonymized unique users in Europe under the current framework.
I don't think there is any way to reliably count unique users without collecting an inappropriate level of personal data. Even tracking unique devices requires significantly undermining privacy.

This simply isn't data companies should be allowed to collect without meaningful consent.

A half-baked idea I had while reading the article was to use bloom filters:

User visits the site. On the backend, check if their IP+UA is in the bloom filter or not. If not, increase the unique visitor counter and add them to the filter.

Perhaps the filter would need to be preseeded with dummy data to protect the privacy of the first few visitors.
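A rough sketch of that idea in Python, standard library only. Everything here is an assumption for illustration: the class names, the filter size, the number of hash functions, and the use of SHA-256 with double hashing are my choices, not something from the article. It only shows the counting mechanics; it does not settle whether processing the IP+UA this way is lawful.

```python
import hashlib
import secrets


class BloomFilter:
    """Fixed-size bit array; membership answers can be false positives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 7):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size_bits

    def add(self, item: str) -> bool:
        """Insert item; return True if it was definitely not present before."""
        was_new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                was_new = True
                self.bits[byte] |= 1 << bit
        return was_new


class UniqueVisitorCounter:
    """Counts visitors whose IP+UA was not already in the filter."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def preseed(self, n: int) -> None:
        # Add random dummy entries so early visitors are not alone in the filter.
        for _ in range(n):
            self.seen.add(secrets.token_hex(16))

    def record_visit(self, ip: str, user_agent: str) -> None:
        # The raw IP+UA is hashed into bit positions and never stored itself.
        if self.seen.add(f"{ip}|{user_agent}"):
            self.count += 1


counter = UniqueVisitorCounter()
counter.preseed(100)
counter.record_visit("198.51.100.7", "ExampleBrowser/1.0")
counter.record_visit("198.51.100.7", "ExampleBrowser/1.0")  # repeat visit
counter.record_visit("198.51.100.8", "ExampleBrowser/1.0")
print(counter.count)
```

Two caveats: false positives mean the count can only undercount, and pre-seeding makes that slightly worse. Also, anyone holding the filter can still test whether a specific IP+UA was probably seen, so the structure is not obviously anonymous.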

Not a bad idea, but IP addresses are personal data too.
I think this is a great idea
This is effectively what the “GDPR compliant” providers mentioned in the article are already doing, namely, a one-way hash of the IP+UA. One of the points of the article is that this is non-compliant, since you need to transmit the IP+UA to do this calculation to begin with.
But do they store individual IP+UA hashes, or do they mush them together in a bloom filter or a HyperLogLog data structure?

In the first case, it could be argued they still store personally identifiable information (for a limited time, but still). In the second case I think it would be harder to argue that the probabilistic data structure with lots of hashes mushed together still constitutes personally identifiable information.
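For the second case, a minimal HyperLogLog sketch in Python shows what "mushed together" means in practice. The register count (2^12), the SHA-256 hash, and the simplified small-range correction are my assumptions, and this is a teaching sketch rather than what any analytics provider actually runs.

```python
import hashlib
import math


class HyperLogLog:
    """Estimates the number of distinct items from 2**p small registers."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)

    def add(self, item: str) -> None:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")  # 64-bit hash value
        idx = x >> (64 - self.p)  # first p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Only the maximum rank per register is kept; the hash is discarded.
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(1000):
    hll.add(f"visitor-{i}")  # stand-in for an IP+UA string
print(round(hll.estimate()))
```

Each register holds only a small rank value, and many visitors map to the same register, so no per-visitor hash survives. Whether that is enough to take the structure outside the definition of personal data is the legal question, and the code cannot answer it.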

> One of the points of the article is that this is non-compliant, since you need to transmit the IP+UA to do this calculation to begin with.

IP + UA gets transmitted to the first-party server already. They already have it. The question becomes – is it OK to anonymize this PII we already received for one purpose (serving the web page), to use it for another purpose also (counting unique visitors).

> IP + UA gets transmitted to the first-party server already. They already have it. The question becomes – is it OK to anonymize this PII we already received for one purpose (serving the web page), to use it for another purpose also (counting unique visitors).

Maybe I'm missing your point, but in the situation we're talking about (so-called "GDPR compliant" analytics), if I set up one of these services on my website, the user's IP+UA are transmitted to a 3rd party, for the sole purpose of analytics including counting unique visitors. My understanding is that this is quite different in the eyes of the GDPR from the question you posed, and is almost always not going to be compliant.

The most interesting fact is that there's now a postgrad degree in GDPR.