I spend most of my time in dynamic languages, so I have to ask: is this so much faster than a good regex library that it warrants a hand-rolled state machine? How normal is something like this in a typical C/C++ codebase?
A "good" regex library will dynamically generate code and, if the regex is simple enough, generate code that implements a DFA rather than an NFA (or a PDA, for Perl regexes). So compared with a "good" library, no, it won't be much faster.
But very few regex libraries are that "good", because that approach combines extreme speed with very low flexibility: a DFA can suffer exponential state explosion in the worst case compared with the equivalent NFA, and it cannot handle features such as backreferences. The vast majority of good regex libraries will be an order of magnitude slower, and the average regex libraries included in most language distributions will be slower still.
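To make the state-explosion point concrete, here is a sketch using the textbook regex `(a|b)*a(a|b){n-1}` ("the n-th symbol from the end is `a`"), which is my choice of example rather than anything from the thread. Its NFA has n+1 states. The function below runs the standard subset construction and counts the reachable DFA states; the name `count_dfa_states` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: the NFA for (a|b)*a(a|b){n-1}, with states 0..n
   (state n accepting). A set of NFA states is a bitmask: bit i set means
   state i is active. step() returns the set reached after reading c. */
static unsigned step(unsigned set, int n, char c) {
    unsigned next = 0;
    for (int i = 0; i <= n; i++) {
        if (!(set & (1u << i))) continue;
        if (i == 0) {
            next |= 1u;                    /* (a|b)* loops on state 0 */
            if (c == 'a') next |= 1u << 1; /* the mandatory 'a' */
        } else if (i < n) {
            next |= 1u << (i + 1);         /* each (a|b) advances one state */
        }                                  /* state n has no transitions */
    }
    return next;
}

/* Subset construction by breadth-first search from the start set {0};
   returns how many distinct DFA states are reachable (0 on allocation failure). */
unsigned count_dfa_states(int n) {
    assert(n >= 1 && n <= 20);             /* keep the bitmasks small */
    size_t space = (size_t)1 << (n + 1);   /* number of possible masks */
    unsigned char *seen = calloc(space, 1);
    unsigned *queue = malloc(space * sizeof *queue);
    if (seen == NULL || queue == NULL) {
        free(seen);
        free(queue);
        return 0;
    }
    size_t head = 0, tail = 0;
    queue[tail++] = 1u;                    /* DFA start state = {0} */
    seen[1u] = 1;
    while (head < tail) {
        unsigned s = queue[head++];
        for (int k = 0; k < 2; k++) {
            unsigned t = step(s, n, k == 0 ? 'a' : 'b');
            if (!seen[t]) {
                seen[t] = 1;
                queue[tail++] = t;
            }
        }
    }
    free(seen);
    free(queue);
    return (unsigned)tail;                 /* each enqueued set is one DFA state */
}
```

`count_dfa_states(10)` returns 1024: the DFA has to remember which of the last n characters were `a`, so it needs 2^n states where the NFA needs only 11. An NFA simulator just tracks the active bitmask at runtime instead of precomputing every set, which is why most libraries accept the per-character overhead.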
The chief exception is lexer generators used for compilers, like lex and flex. They produce code very similar to the linked state machine, and that's probably the most common place to see this kind of thing.
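To give a flavour of what lexer generators emit, here is a hand-written sketch of a table-driven scanner with longest-match ("maximal munch") semantics, the rule lex and flex use to choose a token. The token set, the table and `scan_token` are all hypothetical, and real generated scanners add compressed tables, start conditions and input buffering. The core idea is running the DFA while remembering the last accepting position, then backing up to it when the DFA dies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token set for illustration:
     NUMBER  [0-9]+(\.[0-9]+)?
     DOT     \.
     IDENT   [a-z]+
   All three are merged into one DFA; accepting states carry a token kind. */
enum { TOK_NONE, TOK_NUMBER, TOK_DOT, TOK_IDENT };
enum { C_DIGIT, C_DOT, C_ALPHA, C_OTHER, NUM_CLASSES };
enum { S_START, S_INT, S_INT_DOT, S_FRAC, S_DOT, S_IDENT, S_DEAD, NUM_STATES };

static int char_class(char c) {
    if (c >= '0' && c <= '9') return C_DIGIT;
    if (c == '.') return C_DOT;
    if (c >= 'a' && c <= 'z') return C_ALPHA;
    return C_OTHER;
}

static const unsigned char next_state[NUM_STATES][NUM_CLASSES] = {
    /*              DIGIT   DOT        ALPHA    OTHER  */
    [S_START]   = { S_INT,  S_DOT,     S_IDENT, S_DEAD },
    [S_INT]     = { S_INT,  S_INT_DOT, S_DEAD,  S_DEAD },
    [S_INT_DOT] = { S_FRAC, S_DEAD,    S_DEAD,  S_DEAD },
    [S_FRAC]    = { S_FRAC, S_DEAD,    S_DEAD,  S_DEAD },
    [S_DOT]     = { S_DEAD, S_DEAD,    S_DEAD,  S_DEAD },
    [S_IDENT]   = { S_DEAD, S_DEAD,    S_IDENT, S_DEAD },
    [S_DEAD]    = { S_DEAD, S_DEAD,    S_DEAD,  S_DEAD },
};

/* Token kind recognised in each state; unlisted states are TOK_NONE. */
static const int accepts[NUM_STATES] = {
    [S_INT] = TOK_NUMBER, [S_FRAC] = TOK_NUMBER,
    [S_DOT] = TOK_DOT,    [S_IDENT] = TOK_IDENT,
};

/* Scan one token at s: run the DFA until it dies or the input ends,
   remembering the last accepting position. Returns the token length
   (0 if nothing matches) and stores its kind in *kind. */
size_t scan_token(const char *s, int *kind) {
    assert(s != NULL && kind != NULL);
    int state = S_START;
    size_t last_len = 0;
    *kind = TOK_NONE;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = next_state[state][char_class(s[i])];
        if (state == S_DEAD) break;
        if (accepts[state] != TOK_NONE) {
            last_len = i + 1;          /* longest match so far */
            *kind = accepts[state];
        }
    }
    return last_len;                   /* "back up" to the last accept */
}
```

On `"12.x"` this returns a NUMBER of length 2: after `"12."` the DFA is in a non-accepting state, then dies on `x`, so the scanner falls back to the last accept. A driver loop would call `scan_token` repeatedly, advancing by the returned length.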
Encoding the machine's state implicitly in the program counter, rather than in an explicit state variable, often results in more readable code and is the more usual approach when writing one by hand. It also saves a register, which matters on some architectures. The trade-off is slightly less expressiveness, because you have to stick to structured programming constructs.
People (should) only implement things in C/C++ after thoroughly profiling the dynamic-language POC and making sure they need the performance that a C implementation affords. So yes, the C implementation is going to trade readability for performance. Of course, there are better and worse ways of doing that.