by jsheard 706 days ago
Obligatory GoatCounter plug: https://www.goatcounter.com

It's also cookieless, the hosted version is free to use within reason, and it's extremely lightweight if you choose to self-host it. It doesn't even need a separate database, it can run self-contained with SQLite (or Postgres if you prefer). A good fit for small sites where the big industrial-grade solutions are overkill.


This service claims to not track personal data, yet their docs admit to storing hash(siteID + User-Agent + IP) + seen_paths on their backend for session tracking.[1]

Sites can track sessions without tracking personal data.

1. https://www.goatcounter.com/help/sessions

Right below that, the docs also say that this hash is not persisted, only cached in memory and mapped to a UUIDv4. The UUIDv4 is what persists between sessions.

> The IP address and User-Agent are never stored to the database or disk, and there is no conceivable way to trace the random UUID back to this.
>
> It’s only stored in memory, which is needed anyway for basic networking to work.

I can't say whether that is GDPR compliant, but it's definitely not storing the hash.
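
As a rough illustration of the scheme the docs describe (hash kept only in memory, random UUID persisted), here is a minimal Node.js sketch. It is not GoatCounter's actual code: the function name `sessionFor`, the use of SHA-256, and the `Map` cache are all assumptions made for the example.

```javascript
const { createHash, randomUUID } = require('node:crypto');

// Assumed in-memory cache: hash -> random UUID. It is never written to
// disk, so it is lost whenever the process restarts.
const sessions = new Map();

// Hypothetical helper: returns the random session ID for a visitor.
function sessionFor(siteId, userAgent, ip) {
  // SHA-256 is an assumption; the thread only says hash(siteID + User-Agent + IP).
  const key = createHash('sha256')
    .update(`${siteId}\n${userAgent}\n${ip}`)
    .digest('hex');
  let id = sessions.get(key);
  if (!id) {
    id = randomUUID(); // UUIDv4, unrelated to the IP or User-Agent
    sessions.set(key, id);
  }
  return id; // only this value would be stored alongside page views
}
```

Because the UUID is generated randomly rather than derived from the hash, a stored UUID on its own says nothing about the IP address or User-Agent behind it.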

> Sites can track sessions without tracking personal data.

Could you detail how that would work?

Fetch an empty resource that is privately cacheable, set to max-age=0, and has an ETag containing the current timestamp and a random session id. The browser will consider its cached copy always stale.

When you next fetch that resource, because it is stale, the browser will revalidate it by passing an If-None-Match header containing the ETag. Update the ETag to include the original timestamp and the current timestamp.

So on every page load (or whichever other event you want to measure), you will be told when that session started, the session id and when that visitor was last seen.

To set the maximum session duration, reset the ETag if the last seen timestamp passed to you in If-None-Match is too long ago.

This can even work without JavaScript by using an img element.

The only data tracked with this is the session start time, last seen time, and a random session id. Since the session id isn’t related to any of your business logic, it cannot be used to identify an individual.

To further isolate this data, locate the tracking resource on a different hostname. The browser’s SOP will prevent any cookies from being sent with the request, so your analytics backend can’t record identifying information even if it wanted to. This will also prevent you from tracking which page is being visited, though you can override that with the no-referrer-when-downgrade referrer policy.
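
The steps above could be sketched in Node.js roughly as follows. This is only an illustration under stated assumptions: the ETag layout `"<sessionId>.<startMs>.<lastSeenMs>"`, the names `parseETag` and `trackSession`, and the 30-minute `MAX_IDLE_MS` limit are invented for the example, and weak validators (`W/`) or multi-value `If-None-Match` headers are not handled.

```javascript
const { randomUUID } = require('node:crypto');

// Assumed limit: a session idle for longer than this is restarted.
const MAX_IDLE_MS = 30 * 60 * 1000;

// Parse an ETag of the assumed form "<sessionId>.<startMs>.<lastSeenMs>".
// Returns null for a missing or unrecognised header.
function parseETag(header) {
  const match = /^"([0-9a-f-]{36})\.(\d+)\.(\d+)"$/.exec((header || '').trim());
  if (!match) return null;
  return { id: match[1], start: Number(match[2]), lastSeen: Number(match[3]) };
}

// Takes the If-None-Match request header (undefined on a first visit) and
// returns the session to record plus the response headers to send back.
function trackSession(ifNoneMatch, now = Date.now()) {
  const prev = parseETag(ifNoneMatch);
  const expired = !prev || now - prev.lastSeen > MAX_IDLE_MS;
  const session = expired
    ? { id: randomUUID(), start: now, previousSeen: null }
    : { id: prev.id, start: prev.start, previousSeen: prev.lastSeen };
  return {
    session,
    headers: {
      // private + max-age=0: cached by the browser only, and always stale.
      'Cache-Control': 'private, max-age=0',
      ETag: `"${session.id}.${session.start}.${now}"`,
    },
  };
}
```

In a plain `http` server this would be called as `trackSession(req.headers['if-none-match'])`, replying with status 200, the returned headers, and an empty body on every request so the browser always receives the updated ETag.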

That's just a cookie. And then you're back to the annoying consent banners.
You just reinvented analytics cookies. You’d be surprised, but they don’t store PII either. It’s usually just a randomized session ID and timestamps, like you’re suggesting.
Why do all this when you can set a cookie with a random session ID?
In browsers, it's as simple as:

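    // sessionStorage is scoped to the tab and cleared when it closes,
    // so each browsing session is reported once.
    if (!sessionStorage.sessionReported) {
      reportSession();
      sessionStorage.sessionReported = 1;
    }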
> In comparison, in the context of the European GDPR, the Article 29 Working Party[6] considered hashing to be a technique for pseudonymization that “reduces the linkability of a dataset with the original identity of a data subject” and thus “is a useful security measure,” but is “not a method of anonymisation.”[7] In other words, from the perspective of the Article 29 Working Party, while hashing might be a useful security technique, it is not sufficient to convert personal data into deidentified data.

https://www.gtlaw-dataprivacydish.com/2021/03/what-is-hashin...

I am a DPO. The claims Plausible makes won't hold up to scrutiny.

It's a simple trick: declaring all collected data to be technical data, when in fact it is linkable to a data subject.

Thus collection of the data requires consent, because a subject is identified at least for the session.

If you can identify unique visitors you are clearly identifying individuals.

Indeed, you are correct. Plausible it is not. They should put their cookie consent back up, and they need to inform their users how they are in fact processing the personal data they collect.

  hash(daily_salt + website_domain + ip_address + user_agent)
That's what they do. Within 24 hours the daily salt is gone, and the data is anonymous.

https://plausible.io/data-policy#how-we-count-unique-users-w...
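
For readers who want to see what a daily-salted hash looks like in practice, here is a minimal Node.js sketch. It follows the formula quoted above but is not Plausible's code: SHA-256, the 16-byte salt, the UTC day boundary, and the names `dailySalt` and `visitorId` are all assumptions.

```javascript
const { createHash, randomBytes } = require('node:crypto');

// Assumed in-memory salt, replaced when the UTC date changes. Once the
// old salt is discarded, yesterday's hashes can no longer be recomputed.
let salt = { day: null, value: null };

function dailySalt(now = new Date()) {
  const day = now.toISOString().slice(0, 10); // e.g. "2024-01-01"
  if (salt.day !== day) {
    salt = { day, value: randomBytes(16).toString('hex') };
  }
  return salt.value;
}

// hash(daily_salt + website_domain + ip_address + user_agent)
function visitorId(domain, ip, userAgent, now = new Date()) {
  return createHash('sha256')
    .update([dailySalt(now), domain, ip, userAgent].join('\n'))
    .digest('hex');
}
```

The same visitor yields the same ID for the rest of the day, which is what allows unique visitors to be counted, and a different ID once the salt rotates.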

The problem is that this is what they say they do; there are too many examples of companies not complying with their own policies and regulations. They should explain the above-mentioned algorithm in their published data privacy declaration. Also, even a hash can be considered personal data unless it has been sufficiently protected, so you need to inform your users anyway.
Good approach. IP addresses are personal data, so the data and the hash are subject to GDPR.

You still need consent to collect it - well or some other kind of legal shenanigans. The intent is to track a person, it is not technically necessary. You might have a legitimate interest - but in the end you still have to consider the GDPR to use this tool.

https://europa.eu/youreurope/business/dealing-with-customers...

Turns out that many officials believe this is fine. Companies using Plausible, Matomo and similar services have been under scrutiny.

An IP address is required for a site to function: your server can't not collect it. Plausible also only processes it for uniqueness and doesn't save it as is. Interestingly, most web servers and firewalls have to keep track of IP addresses, so they end up saved in access logs and caches, which makes them more problematic than Plausible. Yet it's most likely fine, because the intent is not to track individual users but to improve the service and keep it running. Plausible's intent is also not to track individual users but to collect visitor counts, which are used for improving the service too.

I think you might be prematurely spreading fear.

That's a bit simplistic. IP addresses are not unequivocally personal data. Let's rewind back a bit, GDPR Art. 4:

> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

IP addresses only allow identifying a natural person when combined with other data, such as ISP data or a profile built over dozens of websites. This is not the same kind of personal data as a name + address, Breyer notwithstanding (note the bit about the ISP in the judgment).

GDPR is not about identifying an abstract entity, it's about identifying a natural person. Doing the former for long enough/with enough data allows the latter, but especially with time-limited in-memory hashes that's a non-existent window of opportunity.

In practice this'd probably need to be resolved in court, and I'm sure not a single SME using Plausible or similar will even get a stern letter, much less fined.

What’s your thought on the approach adjust.com takes? They say you can claim legitimate interest
What are your thoughts on aggregated data? You can still identify unique visitors, but it's aggregated data, so you can't link it back to the individual.

I doubt that just identifying unique visitors would also identify individuals. Their current approach of creating a random ID that is unique for 24 hours should not violate GDPR, or would it?

You begin at a point where you have data to aggregate. This data is linked to individuals.

Anonymisation of data is itself data processing, and some argue that it is subject to a privacy impact assessment, because if done poorly it has serious negative consequences for individuals who can be deanonymized.

The duration itself does not change the outcome.

That said, the approach Plausible takes is much better than any cookie.

I think you can argue if this holds up: you cannot retrieve the ip from the hash (and residential IPs are usually dynamic). The short lifetime together with never storing the hash makes it so you cannot de-anonymise the user.

No one will get fined for not asking consent for this. Our DPO just said ‘don’t be silly’ when I asked him. But we will see if it gets tested (my bet: it won’t).

I like Umami: https://umami.is/
Currently using Umami, but I've considered switching to Plausible due to Umami's less-than-stellar development performance (e.g. breaking the site details page for a few days recently).
I switched to Umami for now because the Plausible developers were totally disinterested in fixing bugs that silently dropped data.

These are still JavaScript solutions, so if their JS code is broken then you just don't get the data. You end up with unknown unknowns.

The only truly reliable data you can get is from your server logs, and obviously you are limited by whatever the browser gives you in the request.
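
As a small example of the server-log route, the sketch below counts successful page views per path. It assumes access logs in the common/combined log format used by default in nginx and Apache; the function name `countPageViews` and the choice to count only `GET` requests with status 200 are assumptions for illustration.

```javascript
// Count successful GET requests per path from access-log text.
// Assumes lines like:
//   203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
function countPageViews(logText) {
  const counts = new Map();
  const requestLine = /"GET (\S+) HTTP\/[\d.]+" (\d{3})/;
  for (const line of logText.split('\n')) {
    const match = requestLine.exec(line);
    if (!match || match[2] !== '200') continue; // skip errors, redirects, other methods
    const path = match[1].split('?')[0]; // ignore query strings
    counts.set(path, (counts.get(path) || 0) + 1);
  }
  return counts;
}
```

This counts requests, not visitors: bots, prefetches, and cached pages all distort the numbers, which is the limitation mentioned above.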

Check out Usermaven.com.
Also happily using hosted GoatCounter. Last year I noticed some occasional operational hiccups, like brief service downtime, but this year it's been completely stable as far as I can tell.