by shanebellone 1396 days ago
I'm happy to answer any questions about what data is acquired and how it's acquired. I can't answer every question about how my system processes that data, because I built a novel algorithm that represents my economic leverage.

Each hit is stored without personal data, but with a salted hash of the IP address. Users are not tracked and are not assigned any kind of individual identifier.
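For illustration, storing hits this way could look roughly like the minimal Python sketch below. It assumes the "salted hash" is an HMAC-SHA256 keyed with a server-side secret; the names `Hit`, `hash_ip`, `record_hit`, `unique_visitors` and `SALT` are hypothetical and are not the author's code.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical server-side secret ("salt"); kept out of the hits table.
SALT = secrets.token_bytes(32)


@dataclass(frozen=True)
class Hit:
    path: str
    timestamp: datetime
    ip_hash: str  # keyed hash of the IP; the raw IP is never stored


def hash_ip(ip: str, salt: bytes = SALT) -> str:
    """Return an HMAC-SHA256 hex digest of the IP, keyed with the salt."""
    return hmac.new(salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def record_hit(path: str, ip: str, store: list) -> Hit:
    """Append a hit carrying only the path, a UTC timestamp and the IP hash."""
    hit = Hit(path=path, timestamp=datetime.now(timezone.utc), ip_hash=hash_ip(ip))
    store.append(hit)
    return hit


def unique_visitors(store: list) -> int:
    """Count distinct IP hashes among the stored hits."""
    return len({hit.ip_hash for hit in store})


hits: list = []
record_hit("/", "203.0.113.7", hits)
record_hit("/pricing", "203.0.113.7", hits)
record_hit("/", "198.51.100.23", hits)
print(unique_visitors(hits))  # 2
```

Because the same IP with the same salt always produces the same digest, the hash still acts as a stable per-visitor key; that is exactly what makes unique counting possible.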


OK, that's what I was looking for. Storing hits with any kind of identifier is considered processing of personal data under GDPR. It doesn't matter whether you assigned the identifier or whether you use one the user already has (IP, device ID, etc.). Hashing or salting the identifier doesn't change that if it's still unique.

The way to make it compliant is to ask permission to use the data, or to do your analysis without any user identifiers, but the latter doesn't give you many useful insights.
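A minimal Python sketch of those two options, assuming a hypothetical `Analytics` class: aggregate page counts are recorded for everyone with no identifier, while per-visitor journeys are kept only when the visitor has consented. How consent is collected and stored is left out as an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Analytics:
    # Aggregate page counts: no identifier of any kind is kept.
    page_views: Counter = field(default_factory=Counter)
    # Per-visitor page sequences, stored only for visitors who opted in.
    journeys: dict = field(default_factory=dict)

    def track(self, path: str, visitor_id: Optional[str], consented: bool) -> None:
        self.page_views[path] += 1
        if consented and visitor_id is not None:
            self.journeys.setdefault(visitor_id, []).append(path)


a = Analytics()
a.track("/", None, consented=False)
a.track("/", "visitor-123", consented=True)        # hypothetical ID
a.track("/pricing", "visitor-123", consented=True)
print(a.page_views["/"], len(a.journeys))  # 2 1
```

Without consent, the only output is `page_views`, which cannot tell whether two views came from the same person; that is the loss of insight described above.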

To be clear, I need to count unique visits without using a salted IP address, even when it can't be attributed to an individual or used to track them across multiple websites?

I do have an idea that might work for this scenario. If I can calculate unique visits differently, I can drop the salted hash from the database too. I'm guessing that should be sufficient to satisfy most privacy-conscious users.

I think it's fairly fundamental: to see that a visitor has accessed one page and later another, you need to track that visitor somehow, so you need some kind of identifier. Using an IP address (hashed or not) or assigning a random ID both fall under GDPR. Alternative 'tricks' like link decoration might work, but you'd have to rewrite all URLs, which is very error-prone. Creating a CNAME for every customer, known as CNAME cloaking, is another option, but it has other drawbacks and is probably not GDPR compliant either. It would definitely be interesting if there were a solution to this problem, as I agree there are very valid use cases for attribution, but it's very hard because (very limited) tracking is almost mandatory for it. You could work 'around' the legislation by checking where a visitor comes from, tracking only in regions where it's allowed, and asking for permission elsewhere. You will miss some visitors, but you could potentially extrapolate the counters to compensate.
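The extrapolation idea amounts to scaling the tracked sample by its share of total traffic. A minimal Python sketch, assuming page views are counted for everyone without identifiers and that tracked visitors behave like untracked ones; `extrapolate_uniques` is a hypothetical helper.

```python
def extrapolate_uniques(tracked_uniques: int, tracked_views: int, total_views: int) -> float:
    """Scale the unique count seen in the tracked subset up to all traffic.

    total_views can come from an identifier-free page-view counter.
    Assumes untracked visitors view about as many pages per visitor as
    tracked ones, which may not hold in practice.
    """
    if tracked_views <= 0 or tracked_views > total_views:
        raise ValueError("need 0 < tracked_views <= total_views")
    return tracked_uniques * total_views / tracked_views


# 300 uniques over 1,200 tracked views; 2,000 views in total.
print(extrapolate_uniques(300, 1200, 2000))  # 500.0
```

The estimate is only as good as the representativeness assumption: if visitors in consent-required regions browse differently, the scaled count will be biased.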
I can count unique sessions without using or storing the IP address or an identifier. Maybe that represents a fair middle ground?

Edit: I implemented this approach. It's less accurate but removes the need for any representation of the IP address.
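The method isn't described here. One known identifier-free heuristic, shown only as an illustration and not as the author's approach, treats a page view as the start of a new session when its referrer is empty or comes from another host, and as a continuation when the referrer is on the same site. A minimal Python sketch, assuming the tracker receives each view's page URL and referrer; `starts_new_session` and `count_sessions` are hypothetical names.

```python
from collections.abc import Iterable
from urllib.parse import urlsplit


def starts_new_session(page_url: str, referrer: str) -> bool:
    """Guess whether a page view begins a new session.

    An empty referrer, or one from a different host, is treated as an
    arrival from outside the site; a same-host referrer is treated as a
    continuation of an existing session.
    """
    if not referrer:
        return True
    return urlsplit(referrer).hostname != urlsplit(page_url).hostname


def count_sessions(views: Iterable[tuple[str, str]]) -> int:
    """Count sessions from (page_url, referrer) pairs; no IP or ID is used."""
    return sum(starts_new_session(url, ref) for url, ref in views)


views = [
    ("https://example.com/", ""),                                   # direct visit: new
    ("https://example.com/pricing", "https://example.com/"),        # internal: continues
    ("https://example.com/blog", "https://news.ycombinator.com/"),  # external: new
]
print(count_sessions(views))  # 2
```

Any view whose referrer is stripped (for example under a `no-referrer` policy) is counted as a new session, which is one reason a count like this is less accurate than one based on an identifier.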

Cool! I'm not sure how you did that, of course. If you want to grow this, you should probably get an external code audit or something similar. But great job, and it's impressive how fast you ship changes!

Thanks! The core code (capture and post-processing) is around 250 lines. It's short enough to keep in my head, which makes iterating quick and fairly painless.

Considering I'm about two months in, I'm happy with the progress and the general direction.