by anxrn 1576 days ago
Wouldn't it be possible for a potential client-side blocker for this to intercept the gtag() method invoked on the client side ("Tag Manager web container"), even if that function is provided by a script hosted on the website owner's domain, as Google recommends[1]?

[1] https://developers.google.com/tag-platform/tag-manager/serve...
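
A minimal TypeScript sketch of the interception idea, assuming the function is still reachable under a known global name. `installInterceptor`, `fakeWindow`, and `blockedCalls` are hypothetical names used only for illustration, and a plain object stands in for `window` so the snippet runs outside a browser.

```typescript
// Hypothetical helper, not a real blocker API: traps a property on `scope`
// so that whatever the page later assigns to it is replaced by a stub.
type Fn = (...args: unknown[]) => unknown;

function installInterceptor(
  scope: Record<string, unknown>,
  propertyName: string,
  blocked: unknown[][],
): void {
  // The stub records the call instead of forwarding it anywhere.
  const stub: Fn = (...args) => {
    blocked.push(args);
    return undefined;
  };
  Object.defineProperty(scope, propertyName, {
    configurable: true,
    get: () => stub,
    // Ignore the real implementation when the page's script assigns it.
    set: (_value: unknown) => {},
  });
}

// Simulated page: the interceptor is installed first, then a first-party
// script defines the function and calls it.
const fakeWindow: Record<string, unknown> = {};
const blockedCalls: unknown[][] = [];
installInterceptor(fakeWindow, "gtag", blockedCalls);

let networkRequests = 0;
fakeWindow["gtag"] = (..._args: unknown[]) => {
  networkRequests += 1; // stands in for the real tracking request
};
(fakeWindow["gtag"] as Fn)("event", "page_view");
```

After this runs, the call has been recorded in `blockedCalls` and the page's own implementation has never executed. The approach depends entirely on knowing the property name in advance.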


Highly doubtful the method would continue to be called "gtag"; any JS bundling or minification would replace that name with a randomly generated string, and it's just as easy to randomize the server-side API endpoint URL. That makes this virtually impossible to block by name or address. Pattern analysis on the transmitted data might work, but the payload can also be encrypted beyond recognition with varying algorithms and keys.
Yes, it can surely be obfuscated, but ultimately there will be a client-side function with near-identical functionality prevalent all over the web. It's harder, but seems possible to build an extension to identify this function.
This is literally the same game virus scanners played against mutation engines. Ultimately, the halting problem won.

There are two places this can end:

* Redesign the runtime environment so it doesn’t matter if you download trackers. The execution environment doesn’t offer the I/O facilities that it requires to actually produce harm. This is what Apple Private Relay and Tor Browser try to give you. By analogy, this is why Web Apps became so popular in the first place — web publishers who do not intentionally collude are protected from each other by the SOP, so opening a web page should be less risky than running an EXE. It’s “just”[1] extending the existing sandbox to prevent differing origins from being able to collude.

* Instead of blocking bad scripts, allow only known-good ones. To match the convenience of current-day ad blocking, it needs to be a collaboratively-produced list. In other words, a gatekeeper. By analogy, this is why installing “unrecognized” applications on Windows and macOS is behind a scare screen, and why doing it on iOS is prevented entirely.

The former seems less dystopian, but much more difficult.

[1]: this is actually very difficult
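
The second option (allow only known-good scripts) could look roughly like the TypeScript sketch below: a script runs only if the hash of its source appears on a shared list. Everything here is an assumption for illustration: `hashScript`, `decide`, and the sample scripts are hypothetical, and the 32-bit FNV-1a hash is used only to keep the example dependency-free. A real list would need a cryptographic hash such as SHA-256.

```typescript
// FNV-1a, 32-bit. Not collision-resistant; illustration only.
function hashScript(source: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < source.length; i++) {
    hash ^= source.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

type Decision = "run" | "block";

// Default-deny: anything whose hash is not on the list is blocked.
function decide(source: string, allowed: ReadonlySet<string>): Decision {
  return allowed.has(hashScript(source)) ? "run" : "block";
}

// Hypothetical collaboratively maintained list with a single entry.
const knownGood = "loadWidget(1);";
const allowlist = new Set<string>([hashScript(knownGood)]);

const verdictKnown = decide(knownGood, allowlist);
// Same script with one character changed, so it no longer matches.
const verdictTampered = decide("loadWidget(2);", allowlist);
```

The design is the inverse of a blocklist: obfuscating a tracker gains nothing, because an unrecognized script is blocked by default. The cost is that every legitimate script change also needs a new list entry, which is where the gatekeeper comes in.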

I was going to suggest introducing the kind of heuristic analysis found in antivirus engines. Kind of like your item #2: don't run scripts that behave badly (for some heuristically recognizable "bad behavior"). Basically a browser built-in AV scanner. Maybe give the user the option to permit the script once per session, or forever. Something like this would definitely introduce a UX speed bump, though. It sounds terrible.
You can use CTPH (context-triggered piecewise hashing) algorithms to fingerprint the function, so you'd need an extension that fingerprints each function before the browser runs it. Or you could man-in-the-middle yourself and patch the malicious code before it gets to your browser.

Better still would be to fingerprint the syntax tree, so obfuscators need to change more than just the names of things (Unison does this; JavaScript would probably be less friendly).
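
A crude approximation of the name-independent fingerprint, as a TypeScript sketch. It does not build a real syntax tree and it is not CTPH: it swaps in a simpler technique, splitting the source into tokens with a regular expression, replacing each renameable identifier with a positional placeholder, and hashing the result. The names `normalizeSource`, `fingerprint`, and `fnv1a32`, the short keyword list, and the sample functions are all assumptions for illustration.

```typescript
// Words that a renamer cannot change (deliberately incomplete list).
const KEYWORDS = new Set<string>([
  "function", "return", "var", "let", "const", "if", "else",
  "for", "while", "new", "this", "arguments",
]);

// Replace renameable identifiers with #0, #1, ... in order of first use.
// Property names after "." are kept, on the assumption that minifiers
// usually leave them alone.
function normalizeSource(source: string): string {
  const tokenPattern = /[A-Za-z_$][A-Za-z0-9_$]*|\d+|"[^"]*"|'[^']*'|[^\s]/g;
  const tokens: string[] = source.match(tokenPattern) ?? [];
  const aliases = new Map<string, string>();
  const out: string[] = [];
  let previous = "";
  for (const token of tokens) {
    const isIdentifier = /^[A-Za-z_$]/.test(token);
    if (!isIdentifier || KEYWORDS.has(token) || previous === ".") {
      out.push(token);
    } else {
      let alias = aliases.get(token);
      if (alias === undefined) {
        alias = "#" + aliases.size;
        aliases.set(token, alias);
      }
      out.push(alias);
    }
    previous = token;
  }
  return out.join(" ");
}

// FNV-1a, 32-bit; a stand-in for whatever hash a real tool would use.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

function fingerprint(source: string): string {
  return fnv1a32(normalizeSource(source));
}

const originalFn = "function gtag() { dataLayer.push(arguments); }";
const renamedFn = "function x9(){q.push(arguments);}";
const differentFn = "function gtag() { dataLayer.pop(arguments); }";
```

Here `originalFn` and `renamedFn` normalize to the same token stream and so share a fingerprint, while `differentFn` normalizes differently. Anything beyond renaming and whitespace changes (reordered statements, inserted dead code, a dropped semicolon) defeats this sketch, which is where a real syntax tree or a fuzzy hash would have to take over.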

I'd love an app where I could crowd-fund the inevitable game of cat/mouse that would ensue. Like maybe I put $5 in at the beginning of each month and as I browse I curate a list of sites that I'd like tampered with. Better developers than I could then publish patches for the malicious functions, which are applied as I browse. At the end of the month, my $5 gets distributed to the people who fixed the parts of the web that I browsed that month.

I'm working on a tool that facilitates collaboration on CTPH-identified blobs of data, but it's more of a `curl shadysite.com | mytool` kind of thing. I'm not sure what would go into integrating it into a browser.

Taken to its logical conclusion, this process reminds me of anti-virus software: finding code signatures and flagging sketchy code.
Exactly. And the end result might be as bad as antivirus: horrendously slow software with a huge database of heuristics that cause false positives and at the same time let malware through. It's going to suck.
It will indeed be called something different, and that's already happening: https://www.simoahava.com/analytics/custom-gtm-loader-server...
ML is already applied to spam filtering; maybe it could be applied to JS runtime behavior to detect this kind of tracking. Fight ML analytics with ML.
There's an asymmetry at play here, though. You're now burning battery to block stuff.