| HN Mirror



	by junyanz 3560 days ago
	Yes! In 1948, Shannon proposed using a Markov chain to build a statistical model of the letter sequences in a piece of English text; given some existing text, such a model can be used to generate random text that resembles it (http://www.cs.princeton.edu/courses/archive/spr05/cos126/ass...). Deep models such as RNNs/LSTMs can probably produce better results. Here is a GitHub implementation of the Markov chain approach: https://github.com/jsvine/markovify
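To make the idea concrete, here is a minimal word-level sketch in plain Python. It is not the markovify API; the function names `build_chain` and `generate` and the `order` parameter are made up for illustration, and it works on words rather than the letters Shannon used.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain, emitting at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))       # start from a random observed state
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:                 # dead end: state only occurred at the end of the text
            break
        nxt = rng.choice(followers)       # duplicates in the list act as frequency weights
        out.append(nxt)
        state = state[1:] + (nxt,)        # slide the window forward by one word
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ate the rat on the mat"
    print(generate(build_chain(corpus, order=1), length=12, seed=0))
```

A larger `order` tends to give more coherent output but copies longer stretches of the source verbatim, which is the usual trade-off with this kind of model.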

4 comments

liuhenry 3560 days ago

Some fun examples of text generation using LSTM/RNN (and a good overview of RNNs for sequences): http://karpathy.github.io/2015/05/21/rnn-effectiveness/#fun-...


clickok 3560 days ago

According to a talk by Max Tegmark[0] (and its associated paper[1]), neural nets (particularly LSTMs) might be inherently better at this sort of thing due to the way they model mutual information.

Markov models are best suited to situations where an observation k steps in the past gives exponentially less information about the present[2] (decaying according to something like λ^k for 0 <= λ < 1). Intuitively, the amount of context imparted by a word or phrase in natural language decays somewhat more slowly. That is, if I know the previous five words, I can make a good prediction about the next one, a somewhat weaker prediction about the one after that, and a slightly weaker one again about the word after that, whereas in a Markovian setting my confidence in my predictions should decay much more quickly.
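The exponential decay is easy to see numerically. The sketch below computes the mutual information between the state at time 0 and the state at time k for a made-up two-state chain; the transition probabilities `a` and `b` are arbitrary illustrative values, not anything from the talk or paper. For this toy chain the mutual information shrinks by a roughly constant factor per step (close to λ², where λ = 1 - a - b is the second eigenvalue of the transition matrix), which is the geometric decay described above.

```python
import math


def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][m] * y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mutual_information(P, pi, k):
    """I(X_0; X_k) in bits for a chain started in its stationary distribution pi."""
    n = len(P)
    Pk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        Pk = mat_mul(Pk, P)               # Pk[i][j] = Pr(X_k = j | X_0 = i)
    total = 0.0
    for i in range(n):
        for j in range(n):
            joint = pi[i] * Pk[i][j]      # Pr(X_0 = i, X_k = j)
            if joint > 0.0:
                total += joint * math.log2(Pk[i][j] / pi[j])
    return total


# Hypothetical two-state chain: switch 0 -> 1 with prob. a, 1 -> 0 with prob. b.
a, b = 0.1, 0.2
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]           # stationary distribution of P
lam = 1 - a - b                           # second eigenvalue of P

if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        print(k, mutual_information(P, pi, k))
```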

So in answer to the grandparent, such a thing should be reasonably straightforward to build if it doesn't exist already, and it may offer improvements over a similar model based on Markov chains.

---

0. https://www.youtube.com/watch?v=5MdSE-N0bxs

1. https://arxiv.org/abs/1606.06737

2. Why is this? Lin & Tegmark offer details in the paper, but it comes from the fact that the eigenvalues of the transition matrix all have magnitude less than or equal to one (an irreducible & aperiodic transition matrix has exactly one eigenvalue of magnitude one, namely 1 itself), and so the contributions of the other eigenvectors fall away exponentially quickly, with the exponent's base being the magnitude of their corresponding eigenvalue.
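Written in terms of the eigenvalues of the transition matrix, the argument looks roughly like this. It is a sketch that assumes a row-stochastic matrix P that is diagonalizable, irreducible, and aperiodic, with eigenvalues ordered 1 = λ_1 > |λ_2| >= ...; the symbols π (stationary distribution), r_i, and ℓ_i (right and left eigenvectors) are notation introduced here, not necessarily the paper's.

```latex
P^k \;=\; \mathbf{1}\,\pi^{\top} \;+\; \sum_{i \ge 2} \lambda_i^{\,k}\, r_i\, \ell_i^{\top},
\qquad
\bigl\lVert P^k - \mathbf{1}\,\pi^{\top} \bigr\rVert \;=\; O\!\left(|\lambda_2|^{k}\right).
```

Every term except the stationary one is multiplied by λ_i^k with |λ_i| < 1, so the dependence of the state at step k on the starting state vanishes geometrically.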


tfgg 3560 days ago

It sounds like Tegmark is pointing out a pretty obvious and deliberately designed property of LSTMs... the entire point of them is to avoid exponentially decaying / exploding gradients and allow propagation of information over longer time-scales.


drewwwwww 3560 days ago

Check out this rather entertaining talk from GitHub Universe about the use of an LSTM to generate a film script: https://www.youtube.com/watch?v=W0bVyxi38Bc

and the short film they made, using that script: https://www.youtube.com/watch?v=LY7x2Ihqjmc

(disclosure: i work for github on events/AV)


SixSigma 3560 days ago

Shannon's later colleagues fired a Markov chain text generator at Usenet:

https://en.wikipedia.org/wiki/Mark_V._Shaney
