|
|
|
|
|
by burntsushi
2645 days ago
|
|
Thompson's NFA construction, when used directly to search via an NFA simulation, is dog slow. It does give you a O(nm) search, but in practice it's slow. AIUI, GNU grep uses a lazy DFA, which uses Thompson's NFA construction to build a DFA at search time. This does indeed lead to pretty good performance for the regex engine. But GNU grep's speed largely comes from optimizing the common case: extracting literals from your pattern, identifying candidate matching lines, and then checking with the full regex engine to confirm (or reject) the match. |
|
That being said, matching literals is always going to be faster, especially if you decompose the pattern to get more use out of your literal matcher - the downside of filtration is that if the literal is always present, you are just doing strictly more work. At least with decomposition you've taken the literal out of the picture. See https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-mul... for those who don't know what I'm talking about (I know you've read it).
Am flirting with doing another regex engine that gets some of the benefit of decomposition and literal matching without taking on the nosebleed complexity of Hyperscan...