Sat solvers have been implemented on backtracking regex engines. Sat is NP thus game over for looking for linear time solutions to arbitrary backtracking regex. Sadly.
> Sat solvers have been implemented on backtracking regex engines.
Again, if the (not-really-)regular expression grows as the SAT problem does, that doesn’t mean game over yet. Your favourite ahead-of-time regex compiler (the good interactive ones are usually incremental, but lex/flex or re2c certainly fit) is also exponential in the size of the regex, because determinization can have exponential-size output. After the (exponentially large) compiled form is produced, though, matching is linear in the size of the haystack no matter what it is.
Burntsushi’s sibling reply links to blog post and thence to a proof-of-concept regular+backreferences expression matcher[1] that works in (quite miserable but still) polynomial time wrt the haystack, with the degree of the polynomial depending on the number of groups that have backreferences to them.
That’s not exactly fantastic, mind you—recall that you can match arbitrary context-free languages in “merely” cubic time. But it’s not exponential.
Attempting polynomial with respect to the string being searched for a constant regex is interesting. A finite regex will contain a finite number of back references.
I think a finite number of back references implies a maximum stack use for the backtracking approach. DFA plus finite size stack machine is convertible to a DFA with a smaller stack and onwards to a pure DFA.
That is probably worth trying to implement, thank you for the push.
Again, if the (not-really-)regular expression grows as the SAT problem does, that doesn’t mean game over yet. Your favourite ahead-of-time regex compiler (the good interactive ones are usually incremental, but lex/flex or re2c certainly fit) is also exponential in the size of the regex, because determinization can have exponential-size output. After the (exponentially large) compiled form is produced, though, matching is linear in the size of the haystack no matter what it is.
Burntsushi’s sibling reply links to blog post and thence to a proof-of-concept regular+backreferences expression matcher[1] that works in (quite miserable but still) polynomial time wrt the haystack, with the degree of the polynomial depending on the number of groups that have backreferences to them.
That’s not exactly fantastic, mind you—recall that you can match arbitrary context-free languages in “merely” cubic time. But it’s not exponential.
[1] https://github.com/travisdowns/polyregex