by ddt 4339 days ago
There are complicated ways of doing this, but the naïve way is as follows:

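First, you need a corpus of grammatically correct text. From it, count how often each word or punctuation mark is directly followed by each other one; those counts give you the transition probabilities.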

Each node in the chain is a word or a piece of punctuation. Each word has some probability of being followed by every other word in the corpus, including itself (zero for pairs that never occur together). There are a few different ways to start a sentence. One approach is to start from the node for the punctuation mark "." and only select a following node that is not a period, since sentences don't tend to start with punctuation. From there, use a random number generator to pick each following node according to your probability matrix, and repeat until you hit another period (or a length limit).
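To make that concrete, here is a minimal Python sketch of the naïve approach. The function names (tokenize, build_chain, generate_sentence, render), the toy corpus, and the max_tokens cutoff are all made up for illustration, and it stores follower counts in a dict of Counters rather than a literal matrix, which should be equivalent for sampling purposes.

```python
import random
import re
from collections import Counter, defaultdict

PUNCTUATION = set(".,!?;")


def tokenize(text):
    # Split into words and standalone punctuation marks.
    return re.findall(r"[\w']+|[.,!?;]", text)


def build_chain(tokens):
    # chain[a][b] counts how often token b directly follows token a.
    chain = defaultdict(Counter)
    # Pretend the corpus is preceded by "." so that its first word
    # is recorded as a possible sentence start.
    prev = "."
    for tok in tokens:
        chain[prev][tok] += 1
        prev = tok
    return chain


def generate_sentence(chain, rng, max_tokens=30):
    # Start from the "." node and walk until another "." is picked.
    # max_tokens is an arbitrary safety limit, not part of the method.
    out = []
    current = "."
    while len(out) < max_tokens:
        followers = chain.get(current, {})
        # For the first token, never pick a period.
        options = [(t, c) for t, c in followers.items() if out or t != "."]
        if not options:
            break
        candidates, weights = zip(*options)
        # Weighted random pick: counts act as unnormalized probabilities.
        current = rng.choices(candidates, weights=weights)[0]
        out.append(current)
        if current == ".":
            break
    return out


def render(tokens):
    # Join tokens with spaces, attaching punctuation to the previous word.
    text = ""
    for tok in tokens:
        if tok in PUNCTUATION or not text:
            text += tok
        else:
            text += " " + tok
    return text


if __name__ == "__main__":
    corpus = "The cat sat. The dog sat. The cat ran."
    chain = build_chain(tokenize(corpus))
    print(render(generate_sentence(chain, random.Random())))
```

With a corpus this small, every generated sentence is just a recombination of the three inputs; a larger corpus gives more varied (and more often ungrammatical) output.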

Note that there's no guarantee the output will be grammatically correct. There's just some statistical likelihood that it will be.

2 comments

> If you'll notice, there's no guarantee that it will be grammatically correct. There's just some statistical likelihood that it will be.

Which is also true for human speakers.

Here is a cool generator that demonstrates this in action: http://projects.haykranen.nl/markov/demo/