|
One of the moments where I really started to feel like I was starting to 'see the matrix' was when I was working on a regex engine to try to make my compiler faster (it didn't, but that's another story). The asymptotically fast way to approach regex processing actually involves writing a parser to process the regex, so in order to write a fast compiler, you need to write another fast compiler to process the regexes that will process the actual programs that you write. But, if your regexes get complex, you should really write a parser to parse the regexes that will parse the actual program. This is where you realize that it's parsers all the way down. When you think more about regexes this way, you realize that a regex is just a tiny description of a virtual machine (or emulator) that can process the simplest of instructions (check for 'a', accept '0-9', etc.). Each step in the regex is just a piece of bytecode that can execute, and if you turn a regex on its side you can visualize it as just a simple assembly program. |
Interestingly, regexes/FSMs are (IIRC) the most powerful class of machines for which equivalence is decidable. So if you give me any two regexes, I can tell you if they match on all the same strings, but this is not true for any more powerful grammar.