Hacker News
by saagarjha 2678 days ago
Safari pre-compiles the blocking list into some internal representation that is faster than a raw regular expression, and the matching operation is performed many times per page. So I do think it's possible to notice a small performance benefit?

To filter URLs, you have to parse them, check whether the domain is blocked using a hash table, and then search for thousands of substrings in the path and query parts of the URL. If you use a regex for that, most of the filtering will already run in native code. I guarantee you that this gives you a tough-to-beat baseline with almost no room for improvement.
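
A rough sketch of that baseline, to make the steps concrete. Everything here is an assumption for illustration: the domain and substring lists are made up, `is_blocked` is a hypothetical helper, and Python's `re` stands in for whatever regex engine a real blocker would use (it runs in native code in CPython, but it is not RE2 and not what WebKit ships).

```python
import re
from urllib.parse import urlsplit

# Hypothetical example data; a real list would hold thousands of entries.
BLOCKED_DOMAINS = {"ads.example.com", "tracker.example.net"}
BLOCKED_SUBSTRINGS = ["/ads/", "utm_source=", "/pixel.gif"]

# One alternation over all escaped substrings, compiled once up front
# so that each URL check is a single search call.
SUBSTRING_RE = re.compile("|".join(re.escape(s) for s in BLOCKED_SUBSTRINGS))


def is_blocked(url: str) -> bool:
    """Return True if the URL matches the domain set or a blocked substring."""
    parts = urlsplit(url)

    # Step 1: hash-table lookup on the host name.
    host = parts.hostname or ""
    if host in BLOCKED_DOMAINS:
        return True

    # Step 2: substring search over the path and query only.
    target = parts.path
    if parts.query:
        target += "?" + parts.query
    return SUBSTRING_RE.search(target) is not None
```

With the made-up lists above, `is_blocked("https://example.org/ads/banner.png")` is caught by the substring search, while `is_blocked("https://ads.example.com/")` never gets past the domain lookup.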

Looking at the WebKit implementation, the authors are shipping their own regex engine for whatever reason. I doubt that it beats the battle-tested RE2 engine by a large margin, if at all.

I don’t think the content blocker API allows for the full set of features you’d find in a standard regex library. This might mean that WebKit can roll their own regex library that’s better optimized for this subset?