Hacker News new | ask | show | jobs
by burntsushi 761 days ago
The GP doesn't really provide enough details to tell the difference between 1) the programmer did something wrong, or 2) the regex engine is missing optimizations or 3) the GP has a use case that is actually difficult to optimize for in the generality of the regex engine. Any one of these 3 things could be true. It's possible more than one is true simultaneously!

> a linear find should be equivalent to a well implemented regex search for a specific string of characters

This is only true theoretically. If your regex engine is just a DFA scan that does char-at-a-time matching, then any "well implemented" substring search algorithm is going to blow it out of the water because it likely uses SIMD. Of course, many regex engines (including RE2), detect literals in the regex and use SIMD to scan for it. So while "DFA scan that does char-a-time matching" is a fine semantic model to have (to a first approximation) for a finite automata based regex engine, the model breaks down pretty quickly for reasoning about performance in general.