Yes! In 1948, Shannon proposed using a Markov chain to create a statistical model of the sequences of letters in a piece of English text and this model can be used to generate random text given some existing text. (http://www.cs.princeton.edu/courses/archive/spr05/cos126/ass...). Here is a GitHub implementation: https://github.com/jsvine/markovify Deep models like LSTM/RNN can probably produce better results.
According to a talk by Max Tegmark[0] (and its associated paper[1]), neural nets (particularly LSTMs) might be inherently better at this sort of thing due to the way they model mutual information.
Markov models are best suited to situations where an observation k-steps in the past gives exponentially less information about the present[2] (decaying according to something like λ^k for 0 <= λ < 1).
Intuitively, the amount of context imparted by a word or phrase decays somewhat more slowly.
That is, if I know the previous five words, I can make a good prediction about the next one, and likely the next one, and slightly less likely the one after that, whereas in a Markovian setting my confidence in my predictions should decay much more quickly.
So in answer to the grandparent, such a thing should be reasonably straightforward to build if it doesn't exist already, and it may offer improvements over a similar model based on Markov chains.
2. Why is this? Lin & Tegmark offer details in the paper, but it comes from the fact that the singular values of the transition matrix are all less than or equal to one (an aperiodic & ergodic transition matrix has only one singular value equal to one), and so the other singular vectors fall away exponentially quickly, with the exponent's base being their corresponding singular value.
It sounds like Tegmark is pointing out a pretty obvious and deliberately designed property of LSTMs... the entire point of them is to avoid exponentially decaying / exploding gradients and allow propagation of information over longer time-scales.