| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by MaxBarraclough 2691 days ago

> Once compiled to a DFA, the pattern only has negligible influence on the matching time.

With the right optimisations, certain longer patterns can be much faster to scan for, on account of the Boyer-Moore approach.

If I asked you to search through a book for 10 consecutive pages of the letter 'Q', you wouldn't need to check every page. The same optimisation can be applied to regex. (Not that most regex implementations bother to do it, though.)

1 comments

Cu3PO42 2691 days ago

You're right. What I meant to say is that for any (formal) regular expression, we can compile it to a DFA and then match any strings in time linear in the length of the string.

Certainly the pattern length matters both for pre-processing (compiling to DFA) and the runtime in a Boyer-Moore approach. However, as you mentioned in the Boyer-Moore average case of Θ(n/m) a longer pattern is faster, rather than slower as the page on Nevod seems to imply.

link