| > O(length of string) per step The big win here is not that it's as fast as the optimal algorithm (it's not), but that it's nearly as fast and vastly simpler. You're technically right that the given code is O(n^2), but the optimization discussed later in the article can easily be applied to turn that into O(nk). Further, for the lucene problem, k=2. That's small and effectively constant, making it "basically O(n)". That's still theoretically worse that optimal, but in practice it just doesn't matter. Even with the slightly slower algorithm, Lucene's cost of searching should be dominated by repeatedly _using_ the DFA and not by _building_ it. What's really important is that the O(nk) construction algorithm is much simpler than the optimal version. I don't know what the Lucene guys were originally trying to do, but if they had figured out this method instead, they could have avoided the super complicated implementation of the true O(n) algorithm. > you can also implement the dynamic programming approach for a pair of strings in O(nk) time Yes, but that's for every pair of strings you want to test. A DFA costs exactly as much to build as solving a single problem with DP, but when a large number of pairs all have one string in common (the query string), a pre-built DFA can be used repeatedly in only O(n+k) each time. Of course, whether that's enough to make a practical difference depends on the problem at hand. |