Hacker News
by zuzun 2682 days ago
To filter URLs, you have to parse them, check whether the domain is blocked using a hash table, and then search for thousands of substrings in the path and query parts of the URL. If you use a regex for the substring search, most of the filtering already runs in native code. I'd expect this to give you a tough-to-beat baseline with very little room for improvement.
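A minimal sketch of the pipeline described above, in Python. The block lists and the `is_blocked` helper are hypothetical, made up for illustration; a real filter list would hold thousands of entries and would likely need per-rule options that this ignores.

```python
import re
from urllib.parse import urlsplit

# Hypothetical block lists, for illustration only.
BLOCKED_DOMAINS = {"ads.example.com", "tracker.example.net"}
BLOCKED_SUBSTRINGS = ["/banner/", "utm_source=", "/adframe."]

# One alternation of escaped literals, compiled once up front.
# The search itself then runs inside the regex engine rather than in a Python loop.
SUBSTRING_RE = re.compile("|".join(re.escape(s) for s in BLOCKED_SUBSTRINGS))


def is_blocked(url: str) -> bool:
    """Return True if the URL's host or its path/query matches a block rule."""
    parts = urlsplit(url)

    # Step 1: hash-table lookup on the host (urlsplit lowercases hostname).
    host = parts.hostname or ""
    if host in BLOCKED_DOMAINS:
        return True

    # Step 2: a single regex search over the path and query parts only.
    target = parts.path + "?" + parts.query
    return SUBSTRING_RE.search(target) is not None


if __name__ == "__main__":
    for u in (
        "https://ads.example.com/pixel.gif",
        "https://example.org/banner/top.png",
        "https://example.org/index.html",
    ):
        print(u, is_blocked(u))
```

Note that Python's `re` is a backtracking engine, not re2, so this only illustrates the shape of the approach (hash lookup first, then one compiled pattern), not the performance of any particular engine.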

Looking at the WebKit implementation, the authors are shipping their own regex engine for whatever reason. I doubt that it beats the battle-tested re2 engine by a large margin, if at all.

1 comment

I don’t think the content blocker API allows the full set of features you’d find in a standard regex library. That might mean WebKit can roll its own regex engine that’s better optimized for this subset?