by markosaric 1929 days ago
We're hoping to start a conversation with browsers such as Brave and Firefox and blocklist maintainers about this.

One way to incentivize even more sites to move from GA et al would be to create some kind of privacy criteria and whitelist those analytics that fulfill it (open source, minimal data, no personal data, no cookies/persistent identifiers, no cross-site/device tracking, no connection to adtech etc).

Site owners want analytics. We offer a self-hosted option, but most sites don't want to deal with managing an analytics server, as it is not an easy job. So by blocking every analytics tool (good or bad), the incentive for site owners is more about trying to avoid being blocked than about moving to something more privacy-friendly.

(I'm the Plausible co-founder)

5 comments

I doubt that would end well with users (and Plausible is already blocked in blocklists). It is like the whole "VPN for privacy" debacle all over again. There's no way that a tracking company can prove to me that the tracking is not logging things it shouldn't (today or in five years), no matter how open source it is. As long as you can't prove it, it isn't trustworthy, just like VPNs, which have proven to be a privacy nightmare where lots of companies claim not to log but in reality often do.

IMO all this will do is end up with yet more lists for adblockers. Not only do we already have a huge mess with those, we are also seeing them being strangled by API changes like in Chromium. Personally I'd much rather visit a site that uses GA (because I know I can block it) than go in and "hope for the best" like it is with ad blocklists. Whitelists would either have to be bulletproof (i.e. back to the proven privacy problem) or they would be like cookie pop-ups where most have no idea which to use and trust. I most definitely do not trust someone who builds a Chromium derivative to decide what to whitelist. Whitelists belong in the users' hands where they already are, not with some remote company that is bleeding money. We have seen how that works out with a certain adblocker already.

I'm a site owner for a small business with zero tracking scripts and zero external connections from the site, so I know for a fact that tracking is unnecessary even in areas with lots of online competition. Sure, I could do a lot of tracking to make more money for the business, but that is the rub, isn't it? Tracking is about greed. Webservers already tell us enough otherwise.

Edit: I'll also just add that anyone who is in the tracking business and uses CNAME fiddling is by definition not trustworthy.

> Tracking is about greed. Webservers already tell us enough otherwise.

There are a few aspects that can't be tracked from server logs, for example screen size. I think this can be fairly important for UX reasons.

There's some other tracking that can be useful as well; for example if you're considering removing a button or feature then it's useful to know how many people are using that. If this is a JS-only feature (like, say, sorting a table in JS) then you need some JS tracking on this.

In short, I feel lumping all "tracking" into one category is a mistake. It's all about how you use it and what you do with it. This applies to most technology really.

I do agree that trust is a big concern; I don't really have a clear comprehensive solution to this.

Everything you say is good and true. Unfortunately, the trust has been broken, and now everyone loses. The onus is on the people doing the “good” tracking to prove that they’re deserving. “This is why we can’t have nice things.”

Or, more realistically, the tracking will move to the browser, and since the dominant force in online advertising is also the dominant force in the browser market, they’ll continue to dig their moat and track us all.

> The onus is on the people doing the “good” tracking to prove that they’re deserving.

That's exactly what your GGP is proposing ("create some kind of privacy criteria and whitelist those analytics that fulfill it")

Client Hints is a draft standard (currently supported in Chromium browsers) that allows servers to request some details like viewport-width.

https://developers.google.com/web/fundamentals/performance/o...

* I work at Google but not on Chrome
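To make the Client Hints idea above concrete, here is a minimal sketch of a server that asks for the viewport width and reads it back, using only the Python standard library. The header names are assumptions on my part: "Viewport-Width" is the legacy name and "Sec-CH-Viewport-Width" the newer one, and which of them a given browser sends (if any) depends on the browser version. Browsers generally only send hints over secure connections, so treat this as an illustration rather than a working deployment.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hints we ask the browser for. Both names are assumptions about what the
# client supports: the legacy name first, then the newer "Sec-CH-" form.
REQUESTED_HINTS = ["Viewport-Width", "Sec-CH-Viewport-Width"]


def read_viewport_width(headers):
    """Return the viewport width in CSS pixels, or None if absent/invalid."""
    for name in REQUESTED_HINTS:
        value = headers.get(name)
        if value is not None:
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


class HintHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The hint usually only shows up on requests made after the browser
        # has seen Accept-CH, so the first request typically yields None.
        width = read_viewport_width(self.headers)
        body = f"viewport width: {width}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Accept-CH", ", ".join(REQUESTED_HINTS))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server on localhost (hypothetical port)."""
    HTTPServer(("127.0.0.1", port), HintHandler).serve_forever()
```

Calling serve() starts the demo. No JavaScript is involved: the value arrives as an ordinary request header, so it could be written to the server log like anything else.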

BrendanEich has some ideas on the "trust but verify" aspects of this. Plausible is 100% open source with no proprietary parts, but we'd love to work with Brave (and Firefox/EasyList/uBlock Origin) to provide proof to get verified and unblocked by them. It would be a very effective way to get many more sites/businesses to remove GA.

The thing is, say that you were exempted by the blockers.

The way they work is not by downloading and checksumming scripts to see if they are allowed or not. They just outright refuse to download what is blocked.

So someone could use your special whitelist status to get their creepy tracking into visitor web browsers.

It does not make sense for blockers to allow that.

Hence, you will continue to be blocked.

Great effort, though. I wish this were the future of analytics.
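The point about blockers deciding on the URL alone can be sketched roughly like this. It is a simplified model, not how any particular blocker is implemented; the host names and the exception list are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical list entries, for illustration only.
BLOCKED_HOSTS = {"tracker.example", "stats.example"}
EXCEPTION_HOSTS = {"stats.example"}  # the "whitelisted" analytics host


def host_matches(host, entries):
    """True if host equals an entry or is a subdomain of one."""
    return any(host == e or host.endswith("." + e) for e in entries)


def should_block(url):
    """Decide from the URL alone; the response body is never inspected."""
    host = (urlsplit(url).hostname or "").lower()
    if host_matches(host, EXCEPTION_HOSTS):
        return False  # an exception overrides any block rule
    return host_matches(host, BLOCKED_HOSTS)
```

Because should_block only ever sees the URL, anything served from the excepted host gets through, whether or not it is the script that was audited.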

But being "open source" doesn't really guarantee anything. How do I know that anything I send to mysite.plausible.io gets processed the way you say it does? How do I know that the code running on Plausible.io is the same code that's on your GitHub? Hell, even if I can verify your code then how do I know your proxy doesn't syphon it off to a second "secret" service?

Don't get me wrong, I have no reason to doubt your claims and do trust you specifically, but basing the entire system on "I decide to trust Marko from Plausible" doesn't really scale.

I am in the same boat as you as I run GoatCounter; I know I do everything like I say I do, but I also know that there's nothing preventing me from doing any of the above and actually collecting much more than what I say I do. It's not hard to set up and no one will ever find out. Theoretically there are legal limits on this. In practice this is a very weak guarantee. This is a big reason why self-hosting was always a first-class supported use case for this.

Theoretically there are some technical things you can do to improve matters; for example a per-domain device ID generated by the browser (or JS, doesn't really matter actually). But then you run into legal limits due to the way the GDPR is phrased, even though it's more privacy-friendly and not really in the spirit of what the GDPR is about :-/ We talked a bit about this over email last year IIRC.
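A rough sketch of what such a per-domain ID could look like, assuming the browser keeps one random secret locally and derives a separate ID for each site with HMAC-SHA256. The function names and the truncation length are arbitrary choices for illustration, not an existing browser API.

```python
import hashlib
import hmac
import secrets


def new_browser_secret():
    """Random secret that would stay on the device (hypothetical storage)."""
    return secrets.token_bytes(32)


def per_domain_id(browser_secret, domain):
    """Derive a stable ID for one domain; IDs for other domains are unrelated."""
    normalized = domain.strip().lower().encode("utf-8")
    digest = hmac.new(browser_secret, normalized, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; the length is arbitrary
```

The same visitor gets the same ID on every visit to one site, so returning visitors can be counted, but two sites cannot match their IDs against each other without knowing the secret.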

The real crux is finding something that's practical, usable, and will actually be implemented/used. We can all think of some idealized system, but if it's not realistic that it'll be implemented then it's a pretty academic exercise. In practice this means that any browser solution will need buy-in from at least the Chrome and Safari teams to really be useful, and I don't rate the chances of that happening any time soon as very high.

This isn't even because I subscribe to some "Big Evil Google and Their Nefarious Dark Plans" view, but just because they have little incentive to do any of this and it's quite a lot of work to do it well. It's easier to just block the lot and, arguably, this is perhaps better than doing nothing. If GoatCounter is impacted by this then so be it. At the end of the day site owners are not the customers of Safari and Chrome: people using those browsers are.

Someone from Chrome has proposed something similar in the form of a "privacy budget". Each fingerprintable surface gets a score and each origin has a budget. Once you go over, something(?) happens.

https://github.com/bslassey/privacy-budget
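As a toy model of the idea: each surface has a cost, each origin has a limit, and a request that would push the origin over the limit is refused. The costs, the limit and the "refuse" behaviour are all assumptions made up for this sketch; the proposal itself leaves open what happens once the budget is exceeded.

```python
# Hypothetical costs per fingerprintable surface (made-up numbers).
SURFACE_COSTS = {
    "user_agent": 2.0,
    "screen_size": 3.0,
    "fonts": 6.0,
    "canvas": 8.0,
}


class PrivacyBudget:
    def __init__(self, limit_per_origin):
        self.limit = limit_per_origin
        self.used = {}  # origin -> set of surfaces already charged

    def spent(self, origin):
        """Total cost of the surfaces this origin has already read."""
        return sum(SURFACE_COSTS[s] for s in self.used.get(origin, set()))

    def request(self, origin, surface):
        """Return True if the origin may read the surface, False if refused."""
        surfaces = self.used.setdefault(origin, set())
        if surface in surfaces:
            return True  # already paid for; re-reading reveals nothing new
        if self.spent(origin) + SURFACE_COSTS[surface] > self.limit:
            return False  # assumption: over budget means the read is denied
        surfaces.add(surface)
        return True
```

With a limit of 10, an origin could read the screen size and the font list, but a later canvas read would be refused, while a different origin starts with a fresh budget.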

The fact that it comes from the Chrome team is why you know it should be discarded. The Chrome team's sole job is to protect Google's targeted advertising business.

I think that’s a great idea and I’m firmly on your side.

However, we’re still on that slippery slope as described I think.

At some point Firefox, Chrome and Safari are going to start blocking almost everything by default - or at best severely restrict them.

The question is, how can we move to some kind of embedded analytics? It’s already kind of there in most of the larger platforms.

>whitelist those analytics that fulfill it (open source, minimal data, no personal data, no cookies/persistent identifiers, no cross-site/device tracking, no connection to adtech etc)

Unpopular Opinion on HN.

This is hard, and it sort of forces everyone into the same bracket of privacy. As a contrarian, I actually want to know about returning visitors (it doesn't even need to be 100% accurate). Right now, tech uses privacy as a word for anonymity. In a real-world analogy, the current definition of privacy means I would be intruding on my customer's privacy if I recognised the same customer coming into my coffee shop every day at roughly the same time, ordering the same latte with oat milk.

I don't need to know who they are, and I shouldn't be able to buy whatever set of data to match their profile, or be able to sell my data for others to match him/her. That is what I think is wrong with current tracking and adtech. But knowing my customers should not be it.

The thing that stops many people moving away from GA is AdWords. If I want to advertise via AdWords do I have any choice but to use GA?

You can advertise using AdWords without having GA.

You can. Poorly.