|
|
|
|
|
by glangdale
822 days ago
|
|
This would be, speaking as someone who has dealt with his share of state machines, quite confusing. The uncomfortable thing about regular expression implementation as automata is either (a) non-determinism (i.e. "being in many states at once" a la Glushkov or Thompson NFAs, not "non-determinism meaning something different might happen on any given execution") or (b) state explosion (in a DFA, to represent non-determinism). This has a huge impact on trying to hook actions, call-backs, etc (your "arbitrary procedure") to NFA states as you're frequently making many overlapping entries to these states, many of which go nowhere. Trying to figure out which entries correspond to which other entries isn't easy. There are very stylized automata that are used in parsing that do interesting things beyond the finite automata space (like pushing things onto a stack), but they don't correspond to regex per se - instead they are generated from a grammar. |
|
> interesting things beyond the finite automata space
State machines certainly can be glorified into LR parsers, or even into Turing machines by adding various bells and whistles. So sure, this idea shouldn't be limited to just regular expressions--we've got good methods of specifying even very complicated state handling.
I hope you saw the excellent comment on this thread where somebody talked about how Ken Thompson implemented paxos using yacc for state management. I've been around the block with state management too, but using yacc for state management is something that never occurred to me, and frankly, opens up a whole new world of possibilities.
BTW, it's also an answer to your point about the pitfalls of nondeterminism: yacc is a great example of how to specify a state machine while detecting, reporting, and resolving nondeterminism.