Hacker News
by bunderbunder 4495 days ago
Yep. You're building a statistical model of a corpus of text. Given an ordered sequence of words, it tells you what's likely to come next, e.g.,

  [it, is] -> {sunny, 0.75}
  [it, is] -> {raining, 0.25}
This tells you that, given the phrase "it is", the next word was "sunny" 3/4 of the time and "raining" the rest.
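
For anyone who wants to see the counting step spelled out, here's a rough Python sketch. The function name `build_model`, the `order` parameter, and the toy corpus are all made up for illustration; real implementations differ in how they tokenize and store things.

```python
from collections import Counter, defaultdict

def build_model(words, order=2):
    """Map each run of `order` consecutive words to next-word probabilities."""
    counts = defaultdict(Counter)
    # Slide a window over the corpus and count what follows each prefix.
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        counts[prefix][words[i + order]] += 1
    # Turn the raw counts into relative frequencies.
    model = {}
    for prefix, followers in counts.items():
        total = sum(followers.values())
        model[prefix] = {word: n / total for word, n in followers.items()}
    return model

# Toy corpus (an assumption, chosen to reproduce the 3/4 vs 1/4 split above).
corpus = "it is sunny . it is sunny . it is raining . it is sunny .".split()
model = build_model(corpus)
print(model[("it", "is")])  # {'sunny': 0.75, 'raining': 0.25}
```

Here "it is" shows up four times in the toy corpus, followed by "sunny" three times and "raining" once, which is where the 0.75 and 0.25 come from.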

Once you've got that, you can use it to generate random text that has similar characteristics to the training corpus. You just seed it with a couple of starting words, then repeatedly choose the next word at random according to the probabilities you've recorded, each time using the most recent words as the new lookup key.
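
The generation loop could look something like the sketch below. It's self-contained and uses a shortcut: instead of storing probabilities, it keeps a list of every follower seen (duplicates included), so a uniform pick from that list matches the recorded frequencies. The names `build_followers` and `generate` are hypothetical, and it assumes the seed has exactly as many words as the model's order.

```python
import random
from collections import defaultdict

def build_followers(words, order=2):
    """Map each prefix of `order` words to the list of words seen after it."""
    followers = defaultdict(list)
    for i in range(len(words) - order):
        followers[tuple(words[i:i + order])].append(words[i + order])
    return followers

def generate(followers, seed, length=20, rng=random):
    """Extend `seed` by up to `length` words, sampling by observed frequency."""
    out = list(seed)
    order = len(seed)  # assumption: seed length equals the model's order
    for _ in range(length):
        options = followers.get(tuple(out[-order:]))
        if not options:
            break  # prefix never seen in the corpus, nothing to choose from
        out.append(rng.choice(options))
    return " ".join(out)

# Toy corpus, made up for illustration.
corpus = "it is sunny . it is sunny . it is raining . it is sunny .".split()
followers = build_followers(corpus)
print(generate(followers, ("it", "is"), length=10))
```

With a corpus this small the output just shuffles between "it is sunny ." and "it is raining .", but on a large body of text the same loop produces the familiar plausible-sounding nonsense.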

jwz's got one that you can play with yourself at http://www.jwz.org/dadadodo/