|
|
|
|
|
by burntsushi
2646 days ago
|
|
Do you know of any fast Thompson NFA simulation implementation? I don't think I've seen one outside of a JIT. Is there a fast glushkov implementation that isn't bit parallel? I've never been able to figure out how to use bit parallel approaches with large Unicode classes. Just using a single Unicode aware \w puts it into the weeds pretty quickly. That's where the lazy DFA shines, because it doesn't need to build the full DFA for \w (which is quite large, even after the standard DFA compression tricks). |
|
I've always thought that a better job of doing NFAs (Gluskov or otherwise) and staying bit-parallel would be done with having character reachability on codepoints, not bytes, generally remapping down to 'which codepoints make an actual difference'. This sounds ugly/terrifying, but the nice thing is that remapping a long stream of codepoints could be done in parallel (as it's not hard to find boundaries) and with SIMD. Step by step NFA or DFA work is more ponderous as every state depends on previous states.