|
|
|
|
|
by slimsag
1926 days ago
|
|
Thanks for the super detailed response again, Andrew! I have a ton to learn from you about how production grade regex engines work honestly :) Do you have any tips for learning as much as you have aside from mulling over papers and existing implementations? Also, by "do not repeat the same mistake that the OP seems to make: conflating what DFA and NFA mean." do you mean in the article? I wrote that mostly with the intent to say _"this ain't how things are normally done, DFA/NFA exist and are used"_ but admittedly don't have more than a surface-level understanding of them. If you have tips on how to clarify that wording so I don't confuse others, at the very least, I'd much appreciate it! :) |
|
Basically, there's a long tradition of calling the latter "NFA engines" and the former "DFA engines." But doing so is inaccurate and confusing. Especially since "DFA engines" also include things called "NFA engines" that are different from the former "NFA engines" as used when describing backtracking engines.
If that doesn't make sense, then just remember this: DFAs and NFAs have equivalent expressive power. So if you're describing a regex engine that gives you thinks like backreferences, recursion and all sorts of other bells and whistles, then it can't be an NFA. It is almost certainly implemented with backtracking.
> Do you have any tips for learning as much as you have aside from mulling over papers and existing implementations?
I started with Russ Cox's article series and read the RE2 source code. Then I built my own.
I'd also check out anything related to Hyperscan. It's a deep rabbit hole to tumble down though.