Hacker News
by thih9 968 days ago
Can you explain why or link a source? I’d like to learn the details.
2 comments

Likely because the hash of an IP can easily be reversed as there are only ~2^32 IPv4 addresses.
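To make the reversal concrete, here is a minimal sketch. It assumes the analytics tool stores an unsalted SHA-256 of the dotted-quad string; `hash_ip` and `reverse_hash` are hypothetical names, not anything from a real product. The demo only scans a /24 so it finishes instantly, but the same loop over `0.0.0.0/0` is just ~4.3 billion candidates.

```python
import hashlib
import ipaddress

def hash_ip(ip: str) -> str:
    """Hypothetical analytics hash: unsalted SHA-256 of the dotted-quad string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def reverse_hash(target: str, network: str = "0.0.0.0/0"):
    """Try every address in `network` until one hashes to `target`."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate) == target:
            return candidate
    return None  # the address is not in the searched range

# Demo restricted to a /24 (documentation range) so it runs instantly.
leaked = hash_ip("203.0.113.77")
print(reverse_hash(leaked, "203.0.113.0/24"))  # prints 203.0.113.77
```

A secret salt changes the picture only for someone who does not know the salt; whoever holds it can run the same loop.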
It is not just that. If you know a user's IP and the hashing approach, you can re-identify their past sessions.
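A rough sketch of what that re-identification looks like, no brute force needed. The log layout, the field names and the unsalted SHA-256 are all assumptions for illustration.

```python
import hashlib

def hash_ip(ip: str) -> str:
    """Hypothetical analytics hash: unsalted SHA-256 of the dotted-quad string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

# Hypothetical stored log: only hashes are kept, never raw IPs.
stored_sessions = [
    {"visitor": hash_ip("198.51.100.7"), "page": "/pricing"},
    {"visitor": hash_ip("198.51.100.9"), "page": "/blog"},
    {"visitor": hash_ip("198.51.100.7"), "page": "/signup"},
]

def sessions_for(ip: str, sessions: list) -> list:
    """Recompute the hash of a known IP and pull every matching past session."""
    key = hash_ip(ip)
    return [s for s in sessions if s["visitor"] == key]

# Anyone who later learns the IP can list that visitor's history.
pages = [s["page"] for s in sessions_for("198.51.100.7", stored_sessions)]
print(pages)  # prints ['/pricing', '/signup']
```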
What if my hashing function has high likelihood of collisions?
Then you cannot trust the analytics.
You can estimate the actual numbers based on the collision rate.
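One way to do that estimate, as a sketch: if the hash spreads visitors uniformly over `buckets` possible values (an assumption; a skewed hash would need a different correction), then n visitors are expected to occupy buckets * (1 - (1 - 1/buckets)^n) distinct values, and you can invert that. The function name and the simulated numbers are mine.

```python
import math
import random

def estimate_visitors(distinct_hashes: int, buckets: int) -> float:
    """Invert the expected-occupancy formula for a uniform hash.

    E[distinct] = buckets * (1 - (1 - 1/buckets) ** n), solved for n.
    """
    if distinct_hashes >= buckets:
        raise ValueError("every bucket is occupied; the estimate is unbounded")
    return math.log(1 - distinct_hashes / buckets) / math.log(1 - 1 / buckets)

# Simulation: 5000 real visitors hashed into only 4096 values.
random.seed(0)
true_visitors, buckets = 5000, 4096
observed = len({random.randrange(buckets) for _ in range(true_visitors)})
estimate = estimate_visitors(observed, buckets)
print(observed, round(estimate))  # raw distinct count vs. corrected estimate
```

The raw distinct count undercounts badly here, while the corrected figure should land close to the true 5000. The correction gets noisier as the buckets fill up.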

Analytics is not about absolute accuracy, it's about measuring differences; things like which pages are most popular, did traffic grow when you ran a PR campaign etc.

Do you trust analytics that doesn’t use JS? Or relies on mobile users to scroll the page before counting a hit?

It’s all a heuristic and even with high collision hashing, analytics would provide some additional insight.

https://gdpr-info.eu/art-4-gdpr/ paragraph 1:

> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

This does not reference hashing, which can be an irreversible and destructive operation. As such, it can remove the “relating” part - i.e. you’ll no longer be able to use the information to relate it to an identifiable natural person.

In this context, if I define a hashing function that e.g. sums all IP address octets, what then?

A hash (whether MD5 or some SHA) of an IPv4 address is easily reversed.

Summing octets is not reversible, so it seems like a good 'hash' to me (but note: you'll get a lot of collisions). And of course, IANAL.
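To put numbers on "a lot of collisions", here is a small sketch that counts how many IPv4 addresses share each octet sum. The function names are made up for illustration. The sum can only take 1021 values (0 to 1020), and they are far from evenly used.

```python
def octet_sum(ip: str) -> int:
    """The 'hash' from the comment above: add up the four octets."""
    return sum(int(part) for part in ip.split("."))

def preimage_counts() -> list:
    """counts[s] = number of IPv4 addresses whose octets sum to s."""
    counts = [1]  # with zero octets there is exactly one way to reach sum 0
    for _ in range(4):
        nxt = [0] * (len(counts) + 255)
        for s, ways in enumerate(counts):
            for octet in range(256):
                nxt[s + octet] += ways
        counts = nxt
    return counts

counts = preimage_counts()
print(octet_sum("192.168.1.10"))   # prints 371
print(len(counts))                 # prints 1021
print(counts[0], counts[510])      # rarest vs. most common sum
```

Sums near 510 are shared by millions of addresses, while the extreme values are nearly unique: a sum of 0 can only be 0.0.0.0 and 1020 can only be 255.255.255.255. So the amount of ambiguity depends on which value you land on.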

I was answering your request for a source.

The linked article talks about identification numbers that can be used to link a person. I am not a lawyer but the article specifically refers to one person.

By that logic, if the hash you generate cannot be linked to exactly one, specific person/request - you’re in the clear. I think ;)