by nickbail3y 3355 days ago
According to the article, current adblockers like uBlock Origin aren't searching the page elements for ad-related keywords; they're just using human-maintained lookup lists?

That's very surprising. I would have thought that would be the first avenue of attack for adblockers.

Also, who wants to help me retrofit this thing to actually block ads?

4 comments

> According to the article, current adblockers like uBlock Origin aren't searching for keywords in the page elements for ads

That's not quite true: uBlock Origin supports procedural cosmetic filters [1], first introduced in version 1.8.0 [2].

These filters don't stop the network request from loading, but they can match on the text inside an element. For example, Facebook inline feed ads can be cosmetically removed with the following filter [3]:

    'facebook.com###stream_pagelet div[id^="hyperfeed_story_id_"]:if(span.uiStreamAdditionalLogging:has-text(Sponsored))'

[1] https://github.com/gorhill/uBlock/wiki/Procedural-cosmetic-f...

[2] https://github.com/gorhill/uBlock/releases/tag/1.8.0

[3] https://github.com/uBlockOrigin/uAssets/issues/233#issuecomm...
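To make the idea concrete, here is a rough Python sketch of what a text-matching cosmetic filter like the one above does conceptually. This is not uBlock Origin's code: the `Node` class, `is_sponsored_story` and `stories_to_hide` are made up for illustration, and the toy tree only imitates the element ids and class name that appear in the filter.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy DOM element (hypothetical, not a real browser structure)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)

    def descendants(self):
        # Depth-first walk over every element below this one.
        for child in self.children:
            yield child
            yield from child.descendants()

    def text_content(self):
        # Own text plus the text of all descendants.
        return self.text + "".join(c.text_content() for c in self.children)


def is_sponsored_story(node):
    """Rough analogue of the filter: a div whose id starts with
    "hyperfeed_story_id_" and which contains a
    span.uiStreamAdditionalLogging whose text includes "Sponsored"."""
    if node.tag != "div":
        return False
    if not node.attrs.get("id", "").startswith("hyperfeed_story_id_"):
        return False
    return any(
        d.tag == "span"
        and "uiStreamAdditionalLogging" in d.attrs.get("class", "").split()
        and "Sponsored" in d.text_content()
        for d in node.descendants()
    )


def stories_to_hide(root):
    # Ids of the elements that would be cosmetically hidden.
    return [n.attrs["id"] for n in root.descendants() if is_sponsored_story(n)]


# Toy feed: one sponsored story and one ordinary story.
feed = Node("div", {"id": "stream_pagelet"}, children=[
    Node("div", {"id": "hyperfeed_story_id_1"}, children=[
        Node("span", {"class": "uiStreamAdditionalLogging"}, text="Sponsored"),
    ]),
    Node("div", {"id": "hyperfeed_story_id_2"}, children=[
        Node("span", {"class": "uiStreamAdditionalLogging"}, text="2 hrs"),
    ]),
])

hidden = stories_to_hide(feed)
```

Only the first story ends up in `hidden`. As with the real filter, nothing here touches the network request; the element is just hidden after the page has loaded it.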

> they're just using human maintained lookup lists?

That's correct. The lists are maintained here: https://forums.lanik.us

I'm pretty sure at least some of the popular adblockers out there block based on basic keywords as well. We had some users complain that an adblocker was blocking valid parts of our web app because some of the page elements had IDs starting with 'ad-' (so a selector like '#ad-' matched them); 'A.D.' was an internal abbreviation for one of our features.
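A tiny sketch of how that kind of false positive can happen, assuming a generic rule that hides any element whose id starts with "ad-". The rule and the example ids are hypothetical and not taken from any real filter list.

```python
import re

# Hypothetical generic rule: hide any element whose id starts with "ad-".
AD_ID_PATTERN = re.compile(r"^ad-")


def would_hide(element_id):
    """True if the naive prefix rule would hide this element."""
    return AD_ID_PATTERN.match(element_id) is not None


# "ad-settings-panel" stands in for a legitimate app feature abbreviated "A.D.".
ids = ["ad-banner-top", "ad-settings-panel", "header", "upload-ad-hoc"]
blocked = [i for i in ids if would_hide(i)]
```

Here `blocked` contains both "ad-banner-top" and "ad-settings-panel": the rule has no way to tell an actual ad from a feature that happens to share the prefix.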

Very interesting, thank you.
Your question has been answered already; I'd just like to point out that there is a second, "advanced" mode in uBlock Origin. Basically, you disable loading of everything by default, and a click on the icon shows you which servers the page tried to load stuff from. Then you can enable them on a case-by-case basis, either temporarily or by adding your selection as a new permanent rule.

This is independent of, and in addition to, the lists, which are always used. It gives you much greater control. What I do is first load the page with everything disabled, then enable things piece by piece until enough of the page works. Most of the time it's obvious: for example, when there is a big white space on the page where you expect a video to be and you see that youtube.com was blocked, you know they tried to show an embedded YouTube video. If you want to see it, you enable that domain and the page refreshes. Etc.
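The workflow is roughly a default-deny policy with a per-domain allowlist. A minimal sketch of that idea follows; `DefaultDenyPolicy` and its methods are invented for illustration and say nothing about how uBlock Origin implements its rules.

```python
from urllib.parse import urlparse


class DefaultDenyPolicy:
    """Toy model: block every host unless the user has allowed its domain."""

    def __init__(self):
        self.allowed = set()       # domains the user has enabled
        self.blocked_seen = set()  # hosts the page tried to load but couldn't

    def allow(self, domain):
        self.allowed.add(domain)

    def should_load(self, url):
        host = urlparse(url).hostname or ""
        # A host is allowed if it equals an allowed domain or is a subdomain of it.
        if any(host == d or host.endswith("." + d) for d in self.allowed):
            return True
        self.blocked_seen.add(host)
        return False


policy = DefaultDenyPolicy()
embed = "https://www.youtube.com/embed/xyz"  # made-up embed URL

first_try = policy.should_load(embed)   # blocked: nothing is allowed yet
seen = set(policy.blocked_seen)         # what the "icon click" would list
policy.allow("youtube.com")             # user enables the domain
second_try = policy.should_load(embed)  # now loads
```

`first_try` is False and `seen` lists www.youtube.com, which is the hint that an embedded video was blocked; after `allow("youtube.com")`, `second_try` is True.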

I used the default mode of just the lists for most of my life, but now I can't imagine my life without the advanced mode. (As a Chrome user) I don't miss NoScript any more (yes I know that does even more).

Blocking a URL or a CSS class name can be done very efficiently/easily with a browser extension, while running OCR on an ad (most ads are images/videos rather than pure text) to find keywords is much more difficult and computationally expensive. Apparently Princeton made it work, but it's definitely a much more difficult approach to take.

Finally a reason to upgrade my processor.