|
|
|
|
|
by osteele
3026 days ago
|
|
Good idea; a nit on terminology: NFAs and DFAs have equivalent expressive power[1]. For this application, you probably want a determinized DFA (one without epsilon transitions); that's constant time on the input string. (It's still not as fast as Boyer-Moore.) Backrefs let you write PDAs, which are more powerful than either NFAs or DFAs. PCRE2[2] implements DFA-based matching. The cited page says it's slower than the NFA algorithm (although presumably a lower complexity order); I think because of lookaheads and capturing. [1] Michael Sipser, _Introduction to the Theory of Computation_. [2] http://www.pcre.org/current/doc/html/pcre2matching.html |
|
Yep, that's the thing I was thinking of. Thanks!
Question, since you seem like you might know: is there an alternative regex library—or a plain Yacc parsing grammar that you could stick as a validator in front of PCRE's parser—that would allow through only the regexes which parse out to constant-time (determinized) DFAs?