by seri4l 1135 days ago
>hash the IP-address

How would that work? I can't think of any approach where getting the original IP back from the hash isn't trivial.

2 comments

You don't need to get the original IP back; you just need to know how many unique IPs there are, so sha(ip) is good enough.
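For illustration, a minimal sketch of that counting idea, assuming `sha` means SHA-256 and that visits arrive as plain IP strings. The `hash_ip` and `count_unique_visitors` names and the sample addresses are made up for the example.

```python
import hashlib

def hash_ip(ip: str) -> str:
    """Return the hex SHA-256 digest of an IP address string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def count_unique_visitors(ips) -> int:
    """Count distinct addresses while keeping only their digests."""
    return len({hash_ip(ip) for ip in ips})

# Hypothetical request log: three requests from two distinct addresses.
requests = ["203.0.113.7", "198.51.100.23", "203.0.113.7"]
print(count_unique_visitors(requests))  # 2
```

Equal inputs always give equal digests, which is what makes the count work and also what the replies below object to.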
There are only a little over 4 billion IPv4 addresses.

From a stackoverflow post from 12 years ago:

> I know I do 622 million SHA-256's per sec on a Radeon HD5830.

At that rate, brute-forcing the full 32-bit address space would take around 7 seconds.
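The arithmetic behind that estimate, as a quick sketch. It assumes one SHA-256 per address and takes the quoted 622 million hashes per second at face value; the constant and function names are just for the example.

```python
ADDRESS_SPACE = 2 ** 32          # every possible IPv4 address
HASHES_PER_SECOND = 622_000_000  # rate quoted above for a Radeon HD5830

def seconds_to_exhaust(space: int, rate: float) -> float:
    """Time needed to hash every value in the space once."""
    return space / rate

print(f"{seconds_to_exhaust(ADDRESS_SPACE, HASHES_PER_SECOND):.1f} s")  # 6.9 s
```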

You can just salt it with some stable random value.
In which case anyone who knows the salt can take the same few seconds to generate all the new hashes and build a new lookup table.
You can further add bucketing, and eventually move closer to FLoC.
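One possible reading of "bucketing" here, sketched under the assumption that it means coarsening each address to its enclosing network before hashing. The /24 prefix length, the `bucket_ip` and `hash_bucket` names, and the salt parameter are all illustrative choices, not something the comment specifies.

```python
import hashlib
import ipaddress

def bucket_ip(ip: str, prefix_len: int = 24) -> str:
    """Map an IPv4 address to its enclosing network, e.g. a /24."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)

def hash_bucket(ip: str, salt: bytes, prefix_len: int = 24) -> str:
    """Hash the bucket rather than the exact address."""
    bucket = bucket_ip(ip, prefix_len).encode("ascii")
    return hashlib.sha256(salt + bucket).hexdigest()

print(bucket_ip("203.0.113.7"))  # 203.0.113.0/24
```

Bucketing does not make the hash any harder to reverse (there are fewer inputs to try, not more). What it changes is that a reversed value only identifies a group of addresses rather than a single one.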

But this is beside the point, as the spirit of the law only allows "processing for legitimate interests". Which technology is used, whether cookies or something server-side, is irrelevant. If the thread OP has evaluated their collection[0] as legitimate, they can use whatever technology they like within the guidelines. Otherwise, even cookieless data collection would require consent.

[0]: https://ico.org.uk/for-organisations/guide-to-data-protectio...

I apologize for the double negative. What I meant is that hashing doesn't improve privacy: if you know the hash and the hashing function, it's easy to build a hashmap of all possible IPv4 addresses (2^32, about 4.3B in total, fewer once reserved ranges are excluded). An expensive key derivation function would make that harder, but that doesn't scale.
You could simply salt the hash, though you'd need to treat the salt as a secret.

Alternatively, you could use a new salt every day, which would only allow you to track an individual for a 24-hour period (likely enough for many).
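A rough sketch of the secret, daily-rotating salt idea. It assumes the salt is used as an HMAC-SHA256 key (a swapped-in but standard way to do keyed hashing), that the key lives only in memory, and that days only move forward. The `DailySaltedHasher` class is hypothetical.

```python
import hashlib
import hmac
import secrets
from datetime import date

class DailySaltedHasher:
    """Pseudonymise IPs with a secret key that is replaced once per day."""

    def __init__(self):
        self._day = None
        self._key = None

    def _key_for(self, day: date) -> bytes:
        # On the first call of a new day, generate a fresh random key and
        # overwrite the old one, so earlier digests can't be recomputed here.
        if day != self._day:
            self._day = day
            self._key = secrets.token_bytes(32)
        return self._key

    def pseudonym(self, ip: str, day: date) -> str:
        key = self._key_for(day)
        return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()

hasher = DailySaltedHasher()
today = date(2024, 1, 1)
# Same address, same day: same pseudonym, so unique counts still work.
print(hasher.pseudonym("203.0.113.7", today) == hasher.pseudonym("203.0.113.7", today))  # True
```

Without the key, precomputing a table over all of IPv4 no longer helps. The protection rests entirely on the key staying secret and actually being discarded after rotation.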

?? sha256 the string and you are not going to be able to get back to the original from that output.

Edit: The small number of IP addresses makes it easy to brute force through all of them.

Hashing doesn’t matter when IPv4 has such a limited address space. IPv4 has a little under 4.3B addresses, and a relatively cheap GPU such as the 1080 Ti has a hash rate of around 4300 MH/s, so it gets through all of them in a few seconds at most.

From there, you have a direct mapping between each IP and its resulting hash, meaning you can easily see what the original input was.

You don’t need to break a hash to know what the original input was.
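To make the lookup-table point concrete, here is a small sketch that assumes unsalted SHA-256 over the dotted-quad string. It enumerates a single /24 documentation range so it runs instantly; the same loop over all of IPv4 is just a bigger table. The function names and the "leaked" digest are made up for the example.

```python
import hashlib
import ipaddress

def hash_ip(ip: str) -> str:
    """Return the hex SHA-256 digest of an IP address string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def build_lookup(cidr: str) -> dict:
    """Precompute digest -> address for every address in a network."""
    return {hash_ip(str(addr)): str(addr) for addr in ipaddress.ip_network(cidr)}

table = build_lookup("203.0.113.0/24")
leaked_digest = hash_ip("203.0.113.7")  # stands in for a digest found in a log
print(table[leaked_digest])  # 203.0.113.7
```

Nothing about SHA-256 is broken here; the input space is simply small enough to enumerate.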

The IPv4 space is very limited, and you can easily compute all the hashes. Salting, combined with rotating salts, could work, but no one can guarantee that you’re not storing the old salts.