Hacker News new | ask | show | jobs
by derefr 3024 days ago
> a regex that performs very poorly

Would that still be true if you limited the regex support to DFAs instead of NFAs? I can’t think of a single use-case for backreferences in routing patterns. :)

1 comments

Good idea; a nit on terminology:

NFAs and DFAs have equivalent expressive power[1]. For this application, you probably want a determinized DFA (one without epsilon transitions); that's constant time on the input string. (It's still not as fast as Boyer-Moore.)

Backrefs let you write PDAs, which are more powerful than either NFAs or DFAs.

PCRE2[2] implements DFA-based matching. The cited page says it's slower than the NFA algorithm (although presumably a lower complexity order); I think because of lookaheads and capturing.

[1] Michael Sipser, _Introduction to the Theory of Computation_.

[2] http://www.pcre.org/current/doc/html/pcre2matching.html

> determinized DFA

Yep, that's the thing I was thinking of. Thanks!

Question, since you seem like you might know: is there an alternative regex library—or a plain Yacc parsing grammar that you could stick as a validator in front of PCRE's parser—that would allow through only the regexes which parse out to constant-time (determinized) DFAs?

Not an expert in this, but I'm thinking this is usually handled by switching to a non-backtracking regexp engine. Otherwise, any such validator would depend on details of the regexp engine's implementation.