|
|
|
|
|
by glangdale
2645 days ago
|
|
I suspect Thompson's NFA is not inherently dog slow (Glushkov can be done reasonably fast for decent-sized NFAs). The fact is that most Thompson-lineage engines opted for the 'lazy DFA' approach and optimized that (which is effective until it isn't). I imagine a more aggressive 'native' Thompson NFA is possible. A nice benefit of that is not having to write to your bytecode - there's a good deal of systems-level complexity stuff in RE2 that springs out of a consequence of the 'lazy DFA construction' decision. That being said, matching literals is always going to be faster, especially if you decompose the pattern to get more use out of your literal matcher - the downside of filtration is that if the literal is always present, you are just doing strictly more work. At least with decomposition you've taken the literal out of the picture. See https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-mul... for those who don't know what I'm talking about (I know you've read it). Am flirting with doing another regex engine that gets some of the benefit of decomposition and literal matching without taking on the nosebleed complexity of Hyperscan... |
|
Is there a fast glushkov implementation that isn't bit parallel? I've never been able to figure out how to use bit parallel approaches with large Unicode classes. Just using a single Unicode aware \w puts it into the weeds pretty quickly. That's where the lazy DFA shines, because it doesn't need to build the full DFA for \w (which is quite large, even after the standard DFA compression tricks).