by clickok
3560 days ago
According to a talk by Max Tegmark[0] (and the associated paper by Lin & Tegmark[1]), neural nets (particularly LSTMs) might be inherently better at this sort of thing due to the way they model mutual information. Markov models are best suited to situations where an observation k steps in the past gives exponentially less information about the present[2] (decaying according to something like λ^k for 0 <= λ < 1).
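To make the exponential decay concrete, here is a minimal sketch in plain Python. The two-state chain and its parameters `a` and `b` are made up for illustration and are not taken from the talk or the paper. It computes the mutual information I(X_0; X_k) for a chain started from its stationary distribution and looks at the ratio between successive values.

```python
import math

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(P, k):
    """k-step transition matrix P^k by repeated multiplication."""
    n = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = mat_mul(R, P)
    return R

def mutual_information(P, pi, k):
    """I(X_0; X_k) in bits, assuming X_0 is drawn from the stationary pi."""
    Pk = mat_pow(P, k)
    mi = 0.0
    for i in range(len(pi)):
        for j in range(len(pi)):
            joint = pi[i] * Pk[i][j]          # P(X_0 = i, X_k = j)
            if joint > 0:
                # Marginal of X_k is also pi, since the chain is stationary.
                mi += joint * math.log2(joint / (pi[i] * pi[j]))
    return mi

# Hypothetical two-state chain (illustrative numbers only).
a, b = 0.1, 0.2
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]   # stationary distribution
lam = 1 - a - b                   # second eigenvalue of P

mi_values = [mutual_information(P, pi, k) for k in range(1, 21)]
ratios = [mi_values[i + 1] / mi_values[i] for i in range(len(mi_values) - 1)]

for k, mi in enumerate(mi_values, start=1):
    print(f"k={k:2d}  I(X_0; X_k) = {mi:.3e} bits")
```

For this toy chain the ratio of successive values settles to a constant below one (close to `lam ** 2` here), so the information shrinks by a fixed factor per step, which is the exponential decay described above.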
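Intuitively, the amount of context imparted by a word or phrase in natural language decays more slowly than that (Lin & Tegmark argue it falls off roughly as a power law rather than exponentially).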
That is, if I know the previous five words, I can make a good prediction about the next one, a slightly less confident prediction about the one after that, and a less confident one again about the one after that, whereas in a Markovian setting my confidence in my predictions should decay much more quickly.

So in answer to the grandparent, such a thing should be reasonably straightforward to build if it doesn't exist already, and it may offer improvements over a similar model based on Markov chains.

---

[0] https://www.youtube.com/watch?v=5MdSE-N0bxs

[1] https://arxiv.org/abs/1606.06737

[2] Why is this? Lin & Tegmark offer details in the paper, but it comes from the fact that the eigenvalues of the transition matrix all have magnitude less than or equal to one (an irreducible, aperiodic transition matrix has exactly one eigenvalue of magnitude one, namely 1 itself), and so the contributions of the other eigenvectors fall away exponentially quickly, with the exponent's base being the magnitude of their corresponding eigenvalue.
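As a rough sketch of the argument in footnote 2 (my own summary, not a quote from the paper), assume the row-stochastic transition matrix P is diagonalizable, with eigenvalues ordered 1 = λ_1 > |λ_2| >= ... >= |λ_n|, right eigenvectors r_i, and left eigenvectors l_i normalized so that l_i^T r_j = δ_ij. The symbols r_i, l_i, and the constant C are notation introduced here for illustration; π is the stationary distribution.

```latex
\[
P^k \;=\; \mathbf{1}\,\pi^{\top} \;+\; \sum_{i=2}^{n} \lambda_i^{k}\, r_i\, l_i^{\top},
\qquad\text{so}\qquad
\bigl\| P^k - \mathbf{1}\,\pi^{\top} \bigr\| \;\le\; C\,|\lambda_2|^{k}.
\]
```

The first term is what the chain forgets down to (every row becomes π), and everything that depends on the starting state sits in the sum, which shrinks by at least a factor of |λ_2| per step.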