by StefanoC 2670 days ago
That's interesting, and good to know! I wonder if the heuristic can be bypassed by changing the code (e.g. adding a semicolon) or changing the URL further.

Also you asked in your article whether this can be used for ads.

Yes, there is an Israeli company that offers to configure nginx as a reverse proxy for publishers ( https://vip.wordpress.com/plugins/yavli/ ), and they serve the ads as small chunks of images (so they don't match the usual 300x250 or 468x60 sizes).

It made Easylist quite angry at the time: https://easylist.to/2015/08/19/issues-with-yavli-advertising...

Of course it can be bypassed, and it's not very difficult. It's just that the filtering has to work differently (many browsers and extensions are just EasyList/Disconnect clones).

To go further on the proxy idea, I think that the best strategy could be to actually do server-side calls to GA: https://ga-dev-tools.appspot.com/hit-builder/ (yes there is an API for server-side hits).
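To make the server-side idea concrete, here is a minimal sketch of building a pageview hit for the Measurement Protocol that the hit builder targets. The tracking ID, client ID, host, and path are placeholders, and `build_pageview_hit` is a hypothetical helper name, not something from the article. The sketch only builds the request body; actually sending it is left as a comment.

```python
from urllib.parse import urlencode

# Collection endpoint for Measurement Protocol hits.
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id: str, client_id: str, host: str, path: str) -> str:
    """Return the form-encoded body of a pageview hit (hypothetical helper)."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder below)
        "cid": client_id,    # anonymous client ID chosen by the server
        "t": "pageview",     # hit type
        "dh": host,          # document host
        "dp": path,          # document path
    }
    return urlencode(params)


body = build_pageview_hit("UA-XXXXX-Y", "555", "example.com", "/home")
print(body)

# To send it from the server (not executed here):
#   import urllib.request
#   urllib.request.urlopen(GA_COLLECT_URL, data=body.encode("ascii"))
```

Since the request leaves from your own server, the browser never contacts a Google domain, so there is nothing for a client-side blocklist to match.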

The downside of the proxy idea is that, since you don't have access to *.doubleclick.net (which should be blacklisted by any decent tracker/ad blocker), you don't get demographics info back into GA.

But after all, like other comments said, aren't you simply a first-party tracker? GA is just a more evolved storage point than, say, running goaccess on raw logs.

> To go further on the proxy idea, I think that the best strategy could be to actually do server-side calls to GA: https://ga-dev-tools.appspot.com/hit-builder/ (yes there is an API for server-side hits).

Yes, big players would probably like to use server-side analytics! But that's a bit too involved for small websites.

> The downside of the proxy idea is that, since you don't have access to *.doubleclick.net (which should be blacklisted by any decent tracker/ad blocker), you don't get demographics info back into GA.

When I pull down Google Analytics I also change its content to make it point to the reverse proxy itself. I didn't find any call to that domain being blocked, so I didn't do it for that particular case.

I think that the data collection is done via https://www.google-analytics.com/r/collect, which I do proxy. Notice, however, that sometimes an EasyList filter kicks in and blocks it just because it happens to match "r/collect". I think there is a race condition somewhere that makes it fail only some of the time, because I couldn't replicate it consistently. Anyway, it would be as simple as changing that URL specifically to something else. I tried doing so, but Netlify's redirects were playing up (possibly because I'm on the free tier), so I gave up. The concept of masking the domain/URL still applies.
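A rough sketch of the masking idea, assuming the proxy rewrites the script text before serving it and maps the masked path back when forwarding. The proxy host, the `/x/event` alias, the sample script line, and all function names are made up for illustration; this is not my actual Netlify setup, and the substring check only stands in for a filter that happens to match "r/collect".

```python
GA_HOST = "www.google-analytics.com"

# Hypothetical alias: real upstream path -> masked path exposed by the proxy.
PATH_ALIASES = {"/r/collect": "/x/event"}


def rewrite_script(js_source: str, proxy_host: str) -> str:
    """Point the script at the proxy and swap known paths for their aliases."""
    out = js_source.replace(GA_HOST, proxy_host)
    for real_path, alias in PATH_ALIASES.items():
        out = out.replace(real_path, alias)
    return out


def upstream_url(masked_path: str) -> str:
    """Map a path received by the proxy back to the real collection URL."""
    reverse = {alias: real for real, alias in PATH_ALIASES.items()}
    return "https://" + GA_HOST + reverse.get(masked_path, masked_path)


# Made-up stand-in for a line of the downloaded script.
snippet = 'var u="https://www.google-analytics.com/r/collect";'
rewritten = rewrite_script(snippet, "proxy.example.com")
print(rewritten)

# A naive substring rule no longer matches the rewritten URL.
print("r/collect" in rewritten)
print(upstream_url("/x/event"))
```

The browser only ever sees the proxy host and the alias, while the proxy still forwards to the real endpoint.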