by mcao 2127 days ago
I am not a lawyer so I cannot say for sure what constitutes PII and what breaches GDPR. I am using the same techniques as Fathom Analytics, Plausible.io and other products. Everything is hashed into a unique session id and none of the actual data like user agent or IP address is actually stored. It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.

As for the localStorage, it's just for performance so I don't have to recompute the session hash. The product will work the same without it. But seeing as it is a cause of contention, I am probably going to remove it.

8 comments

Both Fathom and Plausible generate a unique salt every day. By getting rid of the old salts, they've anonymized any data older than a day. From [0]:

> We do not attempt to generate a device-persistent identifier because they are considered personal data under GDPR.

> Instead, we generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt.

[0] https://plausible.io/data-policy

I will probably implement the daily salt and remove the localStorage code as well just to be safe.

But again, I'm not a lawyer here, where do you draw the line? Why not hourly salts? 5 minute salts? What is considered a reasonable effort? At some point you're storing data that can identify a user for the purpose of analytics. Still, I'm going to try to lean to the safer side as best I can.

There are two paths to compliance with GDPR.

Option 1: Accept that you're collecting Personal Data, and satisfy the obligations GDPR places on that. This means disclosing the use of analytics in your privacy policy (what data's being collected & why), listing retention periods, and figuring out how to satisfy requests like Access or Deletion (which may include "we can't identify you in the data we previously collected").

Option 2 is to "comply" with GDPR by finding a loophole that it technically doesn't count.

The Option 2 approach is more common when dealing with American data privacy laws. It doesn't work out so well with GDPR. It's very difficult to not be processing personal data at some point. Even if you fully anonymize your data before doing any non-trivial processing, the anonymization itself is still covered by GDPR. Which means you need to include it in your privacy policy and provide opt-out.

It's also high-risk. If a court decides that you didn't quite thread the needle through the loophole in their country and GDPR therefore applies in full, then you haven't done any of the compliance groundwork.

For GDPR compliance, I would be much more inclined to trust a tool that describes how to opt users out of tracking than one that claims they're immune from obligations to opt-out.

As another commenter mentions, the ePrivacy Directive is a whole different kettle of fish. Strong consent needed to read or write any data not strictly necessary to provide the services requested by the user. That law should get updated with more sanity soon... it's been that way for a few years now.

GDPR gives you 30 days to comply with deletion requests; that’s a good starting point to ensure you don’t keep PII past the regulated cutoff.
Doesn’t using the website id in the hash mean the key is no longer PII since it can’t follow you between websites? Or is being identifiable within a single site enough to meet the threshold?
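
To make the question concrete, here is a sketch that assumes the website id is simply mixed into the hash input. The function name, parameters, and placeholder salt are hypothetical, not taken from the product's code.

```python
import hashlib


def site_scoped_id(website_id: str, ip: str, user_agent: str, salt: bytes) -> str:
    """Hypothetical per-site identifier: the website id is part of the hash input."""
    material = "\x00".join([website_id, ip, user_agent]).encode()
    return hashlib.sha256(salt + material).hexdigest()


salt = b"example-salt"  # placeholder; a real deployment would generate this randomly
on_site_a = site_scoped_id("site-a", "203.0.113.7", "Mozilla/5.0", salt)
on_site_b = site_scoped_id("site-b", "203.0.113.7", "Mozilla/5.0", salt)

print(on_site_a == on_site_b)  # False: the IDs cannot be joined across sites
print(on_site_a == site_scoped_id("site-a", "203.0.113.7", "Mozilla/5.0", salt))  # True
```

The same visitor gets unrelated IDs on different sites but a stable ID within one site, which is exactly the case the second question asks about.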
> I am not a lawyer so I cannot say for sure what constitutes PII and what breaches GDPR

If you don't feel fit to judge whether something breaches GDPR, then maybe you shouldn't say "so it is GDPR and CCPA compliant".

Fair point. I was simply following the "common practice" from other products making these claims, which is to not store personal user data and only generate anonymous IDs.

Maybe that's not fully compliant, I don't know, so I went ahead and removed any mention of GDPR from the website. It's not really my goal anyways. I'm just trying to release free software while they are charging money and making these claims.

Thank you for removing GDPR mentions, but mostly for building this in the first place!

It looks really nice.

The IDs that you generate aren't anonymous like Plausible.io. You simply need to address that issue and you should be mostly there for GDPR compliance.
Fair point also! Great job on the product, and congrats on shipping. I immediately spotted Inter :)
An IP address is considered personally identifiable information in at least Germany. If you're storing that you'll already have to think about the GDPR.

This is just another misguided attempt to adhere to the letter of the law while going against its spirit. It is misguided because it's based on a wrong understanding of what the letter of the law actually is. You see this a lot with adtech and analytics companies who try to skirt regulations through elaborate mechanisms but ultimately in vain.

>This is just another misguided attempt to adhere to the letter of the law while going against its spirit.

It's easy to say this and hard to draw a line between PII and what I can store without consent. "yesterday I sold 5 products on my website" is not PII (I hope). If I store the timestamps for each purchase I'm already in the grey area. One could combine the timestamps with other data to identify my customers.

So, effectively, you're saying you aren't allowed to have a server that logs requests?
It's considered PII in the United States as well. PII is a very easy standard to meet.
I've listened to a podcast interview with a lawyer specializing in EU privacy laws and he said that it does not matter if the personal data is hashed or encrypted. It's still personal data. This was about data stored in a database though, but browser local storage is a database.

This was mentioned when the guest spoke about right to be forgotten. The law is really weird, because you need to delete user's data from your database, but it's OK to keep backups.

> It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.

It can exist as long as the user agrees to be tracked. There is a category of "metrics" cookies the user needs to agree to before you can track them for metrics. That's the whole point of the law. You need the user's permission.

> it does not matter if the personal data is hashed or encrypted

That sounds odd. If there is no way to go back from the hash to the data it is no different from a random string of letters and numbers.

It’s different because it allows reidentification. It prevents you from coming up with an IP or what have you out of thin air, but you or another party you give it to can effectively use it as a perfect proxy of whatever you hashed.
Let’s take a hashed IP address. There are 4.3B IPv4 addresses. So a few minutes on an old laptop to generate a rainbow table. With decent hardware it would be seconds. The rainbow table could then be used to identify all the IPs you store. If they are salted, then each IP would need to be brute forced, but that is still only seconds on good hardware.
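
A small sketch of the enumeration argument, assuming SHA-256 over the dotted-quad string (an assumption; nothing here says which hash is in use). To keep it quick it searches a single /16 block of 65,536 addresses rather than the full 2^32 space; the full search is the same loop over a larger range.

```python
import hashlib
import ipaddress


def hash_ip(ip: str, salt: bytes = b"") -> str:
    """Hash an IP address string, optionally prefixed with a salt."""
    return hashlib.sha256(salt + ip.encode()).hexdigest()


def recover_ip(target_hash: str, network: str, salt: bytes = b""):
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate, salt) == target_hash:
            return candidate
    return None  # the address is not in the searched range


stored = hash_ip("198.51.100.23")
print(recover_ip(stored, "198.51.0.0/16"))  # 198.51.100.23
```

If the salt is known to whoever holds the hashes, passing it as `salt` makes the search work the same way, so a salt only helps while it stays secret or after it has been discarded.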
That would still take corroborating data from another dataset outside this product. Compliance would be up to whoever hosted this, and the holder of the corroborating data set would have to comply with the request anyway.
Do you remember when an old data set from AOL was released where the user id had been pseudonymised by some hashing?

The users could be re-identified just by their behavior.

Without correlating data it really isn't "personal" though. You could delete the user account and related records without touching this product and you've complied, because this data could then never be correlated. Also, if nothing in the activities leaks the user's own identity, then again it wouldn't really be personal.

IANAL

If you don't want to get dragged into a lawsuit when a user gets sued on a GDPR claim, you probably shouldn't make any statements about your product's GDPR compliance. Stick to the facts about how your product works, and leave the legal speculation to the lawyers.
"In the strictest interpretation of GDPR, I don't think any analytics product can exist." That's the point. Unless you aggregate the data.

Besides, it's not only GDPR you should consider, but also the latest cookie verdict by the CJEU. You need consent if you drop cookies, session storage or any other tracking technology, no matter if you process personal data or not.

Maybe this might help you. It is roughly 2 hours long, but as far as I am concerned it is the best explanation of GDPR I have ever seen, done in mostly non-legal speech. It is actually fun to watch (the part about borrowing a car is hilarious):

https://www.youtube.com/watch?v=-stjktAu-7k

It doesn't matter if the UA or IP is stored, even using them to fingerprint a user requires GDPR consent.
Consent is only one potential basis for processing under GDPR. There are others such as "legitimate interest" which the controller and/or processor may rely on.
Since this is about cookies and IP addresses, GDPR is not the most relevant EU law. Instead, we have to look at the old ePrivacy Directive.

For cookies or any other access to information stored on the user's device, that access must either be strictly necessary for performing the service explicitly requested by the user, or consent is required (ePD Art 5.3). This is where those annoying cookie banners come from. LocalStorage isn't any different and would require the same consent as cookies.

For traffic data such as IP addresses, processing is allowed if it's technically necessary for the “transmission”, if the data has been anonymized, if it's required for billing purposes, or if the user has consented (ePD Art 6). There is an argument that security logs might be necessary, other uses like analytics are more dubious. The good news is that Umami seems to properly anonymize the IP address, so this part seems fine.

In cases where ePD mandates using consent, we cannot fall back to another GDPR legal basis such as legitimate interest. Of course this discrepancy between ePD and GDPR is a huge problem, and the promised ePD update has yet to materialize.

That's true but not relevant for a random user visiting a website.
Users have the right to object to Legitimate Interest too. A vendor just declaring LI as a Legal Basis for processing isn't enough (legally).
Is there any legal precedent for whether analytics constitute a legitimate interest?