Hacker News
by tuna-piano 2016 days ago
Looked for a few minutes and couldn't find the full answer. How does Plausible calculate unique users if it can't store some type of identifier on the page?

I see this... "We do not generate any persistent identifiers either. We generate a random string of letters and numbers that is used to calculate unique visitors on a website and we reset this string once per day."

But where is that ID stored?

4 comments

Probably the same way we do it for pirsch.io, by calculating a hashed fingerprint and discarding the individual page hits once per day: https://github.com/pirsch-analytics/pirsch
What's the privacy benefit over storing a tracking cookie that expires after a day? If anything, a random cookie seems better for privacy. In your case, anyone who really wants to can recover the IP, as long as the user agent isn't rare, by searching over all IPs (about 4 billion IPv4 addresses), User-Agents (around 100 for popular browsers), the date (just one, since the date is stored separately), and the salt (known to the server). That is easily within reach of anyone.
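A back-of-the-envelope version of that search space, assuming about 100 common User-Agent strings, a date and salt that are already known, and a hypothetical rate of 10^9 hashes per second (the rate is an assumption, not a measurement):

```python
# Size of the space an attacker would enumerate to reverse one fingerprint,
# assuming the date and salt are already known.
IPV4_ADDRESSES = 2 ** 32      # every possible IPv4 address
COMMON_USER_AGENTS = 100      # assumed number of popular UA strings
DATES = 1                     # the date is stored alongside the hit

candidates = IPV4_ADDRESSES * COMMON_USER_AGENTS * DATES

# Hypothetical throughput; real rates depend on the hash function and hardware.
HASHES_PER_SECOND = 10 ** 9

seconds = candidates / HASHES_PER_SECOND
print(f"{candidates:,} candidates, ~{seconds / 60:.1f} minutes at 1e9 hashes/s")
```

Under these assumptions the whole space is a matter of minutes, which is the commenter's point: the salt's secrecy, not the hash, carries the protection.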
It doesn't use cookies. Fingerprints are calculated on each page hit.

The salt must be treated like a password so it can't easily be brute-forced, and of course no one should get access to your database ;) It's not the strongest anonymization, but it's good enough considering that the hits are deleted once a day by batch processing.
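A minimal sketch of that kind of fingerprint, assuming SHA-256 over a secret salt, the IP, the User-Agent and the date. The field choice and hash function are assumptions for illustration, not necessarily what pirsch does, and `fingerprint` is a hypothetical helper:

```python
import hashlib
import secrets
from datetime import date


def fingerprint(salt: bytes, ip: str, user_agent: str, day: date) -> str:
    """Hypothetical helper: hash visitor attributes with a secret salt and the day.

    pirsch's real implementation may combine different fields or use a
    different hash function.
    """
    h = hashlib.sha256()
    h.update(salt)
    # Prefix each field with a NUL byte so "ab"+"c" and "a"+"bc" hash differently.
    for part in (ip, user_agent, day.isoformat()):
        h.update(b"\x00" + part.encode("utf-8"))
    return h.hexdigest()


salt = secrets.token_bytes(32)  # keep this secret, like a password
ua = "Mozilla/5.0 (X11; Linux x86_64)"
a = fingerprint(salt, "203.0.113.7", ua, date(2020, 3, 1))
b = fingerprint(salt, "203.0.113.7", ua, date(2020, 3, 1))
c = fingerprint(salt, "203.0.113.7", "Mozilla/5.0 (Macintosh)", date(2020, 3, 1))
d = fingerprint(salt, "203.0.113.7", ua, date(2020, 3, 2))
print(a == b, a == c, a == d)
```

The same visitor on the same day maps to the same hash, while a different User-Agent or a new day produces a different one, so nothing is stored in the browser.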

Seems like a good method, and actually more accurate than theirs... it seems they just hash the IP.
Hmm, I think I've read elsewhere that they also use more parameters than just the IP. Not sure.
> How can Plausible Analytics count unique visitors without cookies?

> So if you don’t use cookies how do you count the number of website visitors and report on metrics such as the number of unique users?

> Instead of tagging users with cookies, we count the number of unique IP addresses that accessed your website. Counting IP addresses is an old-school method that was used before the modern age of JavaScript snippets and tracking cookies.

> Since IP addresses are considered personal data under GDPR, we anonymize them using a one-way cryptographic hash function. This generates a random string of letters and numbers that is used to calculate unique visitor numbers for the day. Old salts are deleted to avoid the possibility of linking visitor information from one day to the next. We never store IP addresses in our database or logs.
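A minimal Python sketch of the scheme as the quote describes it, assuming SHA-256 and a 32-byte random salt that is replaced whenever the day changes. `DailyUniqueCounter` is a hypothetical name and this is not Plausible's actual code:

```python
import hashlib
import secrets
from datetime import date
from typing import Optional, Set


class DailyUniqueCounter:
    """Hypothetical sketch: count unique visitors per day from salted IP hashes."""

    def __init__(self) -> None:
        self.day: Optional[date] = None
        self.salt = b""
        self.seen: Set[str] = set()

    def _rotate(self, day: date) -> None:
        # Discarding the old salt makes yesterday's hashes unlinkable to today's.
        self.day = day
        self.salt = secrets.token_bytes(32)
        self.seen = set()

    def record(self, ip: str, day: date) -> None:
        if day != self.day:
            self._rotate(day)
        digest = hashlib.sha256(self.salt + ip.encode("utf-8")).hexdigest()
        self.seen.add(digest)  # only the hash is kept, never the raw IP

    def unique_visitors(self) -> int:
        return len(self.seen)


counter = DailyUniqueCounter()
for ip in ("198.51.100.1", "198.51.100.2", "198.51.100.1"):
    counter.record(ip, date(2020, 3, 1))
first_day = counter.unique_visitors()    # two distinct IPs
counter.record("198.51.100.1", date(2020, 3, 2))
second_day = counter.unique_visitors()   # new day, new salt, fresh count
print(first_day, second_day)
```

A real system would persist each day's count before rotating; this sketch only keeps the current day in memory.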

...

> In our testing, using IP addresses to count visitors is remarkably accurate when compared to using a cookie. Total unique visitor counts were within 10% error range with IP-based counting usually showing lower numbers.

From here: https://plausible.io/blog/google-analytics-cookies#can-you-g...

A one-way hash of an IPv4 address is no more private than the address itself. If you know the hash algorithm, you can build a rainbow table of all the hashes in under a second. Even with a random salt it doesn't take long to build a rainbow table with all possible salts.
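To illustrate why an unsalted IP hash is reversible, here is the plain enumeration such a table is built from, assuming SHA-256 (the thread doesn't say which hash is used). The demo scans a single /24 so it finishes instantly; the full IPv4 space is the same loop over 2^32 addresses:

```python
import hashlib
import ipaddress
from typing import Optional


def sha256_ip(ip: str) -> str:
    # Assumed scheme for the demo: unsalted SHA-256 of the dotted-quad string.
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_unsalted(target: str, network: str) -> Optional[str]:
    # Try every address in the range and compare its hash to the target.
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if sha256_ip(candidate) == target:
            return candidate
    return None


leaked = sha256_ip("198.51.100.42")
print(reverse_unsalted(leaked, "198.51.100.0/24"))
```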
Doesn't that depend on the size of the salt?
To an extent, but there are easy ways to cut the search space. For example, you could send a unique request containing a garbage marker from a known IP every day. Then all you have to do is build a rainbow table for that one IP to find the salt for each day, after which you can fully reconstruct the logs.
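A sketch of that known-plaintext idea, assuming a hypothetical scheme of SHA-256 over an 8-byte salt followed by the IP, and a deliberately tiny 16-bit salt so the demo runs instantly. The attacker only needs to test each possible salt against the one IP they control:

```python
import hashlib
import secrets
from typing import Optional

SALT_BITS = 16  # deliberately tiny for the demo; real salts should be far larger


def salted_hash(salt: int, ip: str) -> str:
    # Hypothetical scheme: SHA-256 over a fixed-width salt followed by the IP.
    return hashlib.sha256(salt.to_bytes(8, "big") + ip.encode("ascii")).hexdigest()


def recover_salt(known_ip: str, observed: str) -> Optional[int]:
    # The attacker sent a marked request from known_ip and spotted its hash in
    # the log; trying every salt against that single IP reveals the day's salt.
    for salt in range(2 ** SALT_BITS):
        if salted_hash(salt, known_ip) == observed:
            return salt
    return None


secret_salt = secrets.randbelow(2 ** SALT_BITS)
marker_hash = salted_hash(secret_salt, "192.0.2.10")
print(recover_salt("192.0.2.10", marker_hash) == secret_salt)
```

Once the salt is known, each remaining hash in that day's log can be reversed by enumerating IP addresses, which is the reconstruction step described above.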
If the salt is a random 64-bit number (for example), then "finding out" the salt is not trivial.
And unless I'm missing something, it seems easy to add plenty of bits to the salt until it's no longer practical to reverse.
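Rough numbers for how salt size affects the known-IP attack, assuming one hash per salt guess and a hypothetical rate of 10^9 hashes per second (an assumption, not a benchmark):

```python
# Hypothetical attacker throughput; real rates vary by hash and hardware.
HASHES_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_years(salt_bits: int) -> float:
    """Average years to brute-force a uniformly random salt of salt_bits bits."""
    # On average the attacker searches half the space before hitting the salt.
    return 2 ** (salt_bits - 1) / HASHES_PER_SECOND / SECONDS_PER_YEAR


for bits in (16, 32, 64, 128):
    print(f"{bits:>3}-bit salt: ~{expected_years(bits):.3g} years")
```

At this assumed rate a 32-bit salt falls in seconds, while a 64-bit salt takes centuries, which supports adding bits until reversal is impractical.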
@mattlondon: The salt is known to Plausible; that is the only way anyone can compute the hash.
This would be woefully inaccurate for websites with a large amount of mobile traffic (because of CGNAT), university traffic, and so on.
Don't universities have a huge number of IPs because they were the first to use the internet?

Mine gives one public IPv4 address to each device on the network that accesses the internet (with some exceptions). Strategies vary, but if you have a lot of addresses, why not use them?

That might be true for some US universities, but it's definitely not true for the rest of the world.
According to Google, IPv6 traffic is up to 30% these days.
You can see the exact method in our data policy: https://plausible.io/data-policy
I’m guessing a cookie with an expiration of 24 hours, but I could be wrong.