| Interesting observation of computational complexity. HMMs require Viterbi algorithm to find the desirable sequence. The complexity of the algorithm depends on the number of previous decisions on which we condition the current one. Time complexity is O(n^(k+1) * S) where n is the length of the sequence, k is the number of previous decisions we condition on, and S is the number of possible decisions. Now, why does HMM require the Viterbi search procedure, and LSTM/RNN doesn't? Inference in LSTM is linear in the length of the sequence -- ignoring the classifier decision time. Why does this work without Viterbi? Given the fact that even Conditional Random Fields need dynamic programming, and that Maximum Entropy Markov Model suffers the inability to learn quickly without dynamic programming and make inference not suffer label bias. What is so special about LSTM that it doesn't need DP. RNN obviously learns a much better representation of features. Still, why then won't HMMs work with that same representation without the search part? Really baffles me. Although, there exists a process that doesn't use NN for sequence labeling and it is linear in the length of sequence and works well. Instead of using a simple classifier as is the case in MEMMs, one can use a cost-sensitive classifier and learn for each decision associated future loss and greedily avoid the loss. This allows the usage of a simple multiclass classifier, removes the search part and can work as well as NN given the right features. |