Ask HN: Best place to start learning about Markov Chains?

	237 points by chrisherd 2622 days ago
	A progressive reading list or process to follow would be awesome

37 comments

dcwca 2622 days ago

Just pick a random place to start, read some stuff, and then take a guess as to which direction to go in next, based on what's probably a good next thing to read. Then keep repeating the process over and over again.

link

Scarblac 2622 days ago

It's also important that you base your guess of what's probably good to read next only on the previous thing you read. Forget everything that came before that.

link

yonkshi 2622 days ago

My friend Gibbs invented this really efficient way to learn.

link

muzani 2620 days ago

If you don't feel ready to move on to something new, you could always read the same thing again.

link

d3ckard 2622 days ago

Brilliant joke!

link

zeckalpha 2622 days ago

Are you describing Markov chains or how to learn about Markov chains?

link

yonkshi 2622 days ago

More specifically, Markov chain Monte Carlo (MCMC).

link

graycat 2622 days ago

Did something like that: An organization with some boats, quite a lot of boats, some that might be involved in global nuclear war, maybe limited to sea, wanted to know how long some of the boats might survive. The ocean had Red and Blue boats and airplanes, and the Reds and Blues were looking for each other and trying to kill each other.

So, the state of the system was the remaining Red/Blue inventories.

Some work by Koopmans showed that the encounter rates were a Poisson process. So, the time to the next encounter had exponential distribution, depending on the current state.

At an encounter, depending on the types, could have the Red die, the Blue die, both die, or neither die. Then after the encounter, the state of the system changed. So, the state of the system was a continuous time, discrete state space Markov process subordinated to a Poisson process. That is, in part, a Markov chain.

Yes, there is a closed form solution, but the combinatorial explosion of the discrete state space size meant that a direct attack via the closed form solution was not reasonable.

But it was easy enough to do Monte-Carlo, that is, generate a few hundred sample paths and average those, get confidence intervals, etc. While in grad school working on operations research I did that. While the state space was enormous, the Monte-Carlo was really fast. On any computer of today, the code would run before could get finger off the mouse button or the Enter key. And running off 1 million sample paths would be feasible. For the random numbers I looked in Knuth's appropriate volume of The Art ... and used

X(n + 1) = (5^15 * X(n) + 1) mod 2^47

programmed in assembler.
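For anyone who wants to try that generator without assembler, here is a minimal Python sketch. It assumes the recurrence is read as (5^15 * X(n) + 1) mod 2^47, i.e. the modulus applies to the whole expression; the seed value and the scaling to [0, 1) are illustrative choices, not part of the original work.

```python
# Linear congruential generator, reading the recurrence as
# X(n+1) = (5**15 * X(n) + 1) mod 2**47  (precedence is an assumption).
MULTIPLIER = 5 ** 15
INCREMENT = 1
MODULUS = 2 ** 47


def lcg(seed):
    """Yield an endless stream of pseudo-random floats in [0, 1)."""
    x = seed % MODULUS
    while True:
        x = (MULTIPLIER * x + INCREMENT) % MODULUS
        yield x / MODULUS  # scale the integer state down to [0, 1)


gen = lcg(seed=12345)  # arbitrary seed, chosen only for the example
samples = [next(gen) for _ in range(5)]
```

Python integers are arbitrary precision, so the multiplication cannot overflow the way it could in a fixed-width register.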

Work passed review by famous applied probabilist J. Keilson.

Apparently the work was sold to some intelligence agency. I could guess which one, but then I'd have to ...!

link

blablabla123 2622 days ago

I think you need to create a truly immersive experience to truly learn them.

link

lallysingh 2622 days ago

It's a meta joke

link

penagwin 2622 days ago

It's hilarious because it's also a "semi"-decent method for learning new topics in general.

link

kawera 2622 days ago

And goes well with the article[1] we discussed yesterday[2] !

[1] https://billwadge.wordpress.com/2016/01/08/b-before-a/

[2] https://news.ycombinator.com/item?id=19608883

link

soVeryTired 2622 days ago

If there's a finite amount of literature on the subject, this advice will send the OP in circles with probability one.

link

pdpi 2622 days ago

You can then model the probability that you'll end up at any one given place after n steps as a markov chain.

link

bluejay2387 2622 days ago

Funniest thing I have read this week...

link

samcgraw 2622 days ago

Well said ;)

link

machawinka 2621 days ago

Brilliant.

link

gtycomb 2622 days ago

So many there are. Starting with basic Probability, this lecture series is a good first intro.

https://www.dartmouth.edu/~chance/teaching_aids/books_articl...

Or starting from the basics, and learning how to actually do the number crunching, this is unusually good (Stewart, Introduction to numerical solution of Markov Chains):

https://press.princeton.edu/titles/5640.html

Robert Gallager's MIT lecture series, very well presented, titled Principles of Digital Communications, takes you on another train based on Markov Chains (Kalman filters, etc).

https://ocw.mit.edu/courses/electrical-engineering-and-compu...

link

activatedgeek 2622 days ago

Markov chains in essence are simple. Instead of diverging and reading all the theory, I'd recommend doing it on an as-needed basis. Learn as you go. So pick up a problem and move ahead. I don't think it is fruitful to learn everything about Markov Chains just for the sake of it.

Markov Chain Monte Carlo to sample from probability distributions is a good start - https://arxiv.org/abs/1206.1901 if you are into sampling.
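To give a feel for what "MCMC to sample from probability distributions" means, here is a minimal random-walk Metropolis sketch. This is the simplest MCMC variant, not the Hamiltonian method the linked paper covers; the target (a standard normal), the step size, and the function names are all assumptions made for the example.

```python
import math
import random


def metropolis(log_density, start, n_samples, step_size=1.0, seed=0):
    """Random-walk Metropolis: the next state depends only on the current one."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step_size)
        log_accept = log_density(proposal) - log_density(x)
        # Accept uphill moves always, downhill moves with probability exp(log_accept).
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def log_std_normal(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


draws = metropolis(log_std_normal, start=0.0, n_samples=20000)
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
```

With enough draws the sample mean and variance should land near 0 and 1; the draws are correlated, so the estimates converge more slowly than independent samples would.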

link

AlexCoventry 2622 days ago

Betancourt's survey is at least as good, and more up to date.

https://arxiv.org/pdf/1701.02434.pdf

link

activatedgeek 2622 days ago

That's a great reference too for the geometric intuitions!

link

thedevindevops 2622 days ago

Tough one, I'd have to say:

45% http://setosa.io/ev/markov-chains/

30% https://en.wikipedia.org/wiki/Markov_chain

25% Youtube

link

snakeboy 2622 days ago

The wikipedia page for Markov chains is really one of the best wikipedia pages I've ever seen for a technical topic.

Covers a ton of ground, and gives concrete examples to motivate the ideas.

link

usgroup 2622 days ago

1. Elementary probability theory.

2. Poisson processes.

3. The Markov property.

4. Stochastic processes.

5. Realise that you’re missing a background in analysis, therefore you don’t know sh?t about measure theory but you actually need it to know anything deeper. Wonder to yourself if you really want to spend the next 3 years getting a maths background you don’t have.

6. Convince yourself that it’s all just engineering and muddle through by picking a project involving a non-trivial Markov chain.

7. Go back and spend 3 years doing foundational maths then repeat points 1-5.

link

larrydag 2622 days ago

While I agree with the progression of knowledge listed here, I don't think it requires 3 years of foundational math. If you have a basic understanding of math already you should be able to pick up the theory fairly well in a couple of months of research and application.

link

usgroup 2622 days ago

I think when you get out of the basic linear algebra and calculus prerequisite and into the analysis and measure theory prerequisite nothing takes a few months anymore :)

link

joker3 2622 days ago

You don't need much math to pick up the very basic theory, but after a certain point you're going to hit a hard wall unless you have a strong background in analysis.

link

soVeryTired 2622 days ago

Poisson processes are continuous time though. If you're interested in Markov chains you only need the discrete-time theory.

In discrete time and discrete space, it mostly just reduces to linear algebra.
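A small sketch of the "it's just linear algebra" point: the distribution after n steps is the starting distribution multiplied by the transition matrix n times. The two-state matrix below is a made-up example, and the helper names are hypothetical.

```python
def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist, P, n_steps):
    """Distribution over states after `n_steps` transitions."""
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

start = [1.0, 0.0]                           # start in state 0 with certainty
after_1 = distribution_after(start, P, 1)    # approximately [0.9, 0.1]
after_2 = distribution_after(start, P, 2)    # approximately [0.86, 0.14]
after_50 = distribution_after(start, P, 50)  # close to the stationary distribution
```

For this matrix the stationary distribution solves pi = pi P, which gives pi = (5/6, 1/6), and `after_50` agrees with that to many decimal places.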

link

graycat 2622 days ago

No, can do continuous time discrete state space theory -- the jumps in the discrete state space are at the arrival times of the Poisson process -- that works out easily enough, especially if using Monte-Carlo. See my other post here on Red/Blue stuff.
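A rough Monte-Carlo sketch of that kind of model, to show how little code it takes. Everything numeric here is invented for illustration: the encounter rate proportional to red * blue, the outcome probabilities, the fleet sizes and the time horizon are assumptions, not figures from the original study.

```python
import random


def simulate(red, blue, horizon, rng, rate_per_pair=0.01):
    """One sample path of a continuous-time, discrete-state chain.

    The state is the pair (red, blue). Waiting times between encounters are
    exponential with a rate that depends on the current state.
    """
    t = 0.0
    while red > 0 and blue > 0:
        rate = rate_per_pair * red * blue  # assumed form of the encounter rate
        t += rng.expovariate(rate)         # time to the next encounter
        if t > horizon:
            break
        u = rng.random()
        if u < 0.3:        # Red dies (assumed probability)
            red -= 1
        elif u < 0.6:      # Blue dies
            blue -= 1
        elif u < 0.7:      # both die
            red -= 1
            blue -= 1
        # otherwise neither dies and the state is unchanged
    return red, blue


rng = random.Random(42)
paths = [simulate(10, 10, horizon=5.0, rng=rng) for _ in range(500)]
mean_blue = sum(b for _, b in paths) / len(paths)
```

Averaging over a few hundred sample paths gives an estimate of expected survivors without ever enumerating the state space.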

link

usgroup 2622 days ago

Unless you’re interested in continuous markov chains, infinite state spaces, renewal theory, excessive functions, and so on.

link

exelius 2622 days ago

That whole math sequence was part of my MBA program that culminated in Markov chains for synthetic options pricing after like, 9 months. And this is for business school students; not engineers :)

link

usgroup 2622 days ago

Sure, and for any given application it’ll be possible to explain Markov chains as they apply to it. I recently did a financial valuation course where we did an “intuitive derivation of Itô’s formula” so that we could skip the measure theory prerequisites. We also skipped talking about Riemann integrals and just accepted that sums are integrals at a limit ... we also glossed over the separating hyperplane theorem so that we could say “no arb iff risk neutral measure exists”, and so on.

However, if you actually want a background in the theory of Markov chains, I don’t think this approach works.

link

graycat 2622 days ago

Just work with discrete state spaces and otherwise be less concerned with measure theory. E.g., in stochastic control problems, don't sweat measurable selection!

link

bulldoa 2622 days ago

Can you recommend a textbook on these topics?

link

YorkshireSeason 2622 days ago

If you are not already intimately familiar with them learn about FSA (= finite state automata), aka FSM (finite state machines).

Most interesting facts about Markov chains (e.g. the Stationary Distribution Theorem) really are probabilistic generalisations of simpler facts about FSAs (e.g. FSAs cannot be used to "count"). In my experience, understanding them first for FSAs and then seeing how they generalise for the probabilistic case is a good way of approaching this subject.

link

Vaslo 2622 days ago

Here is an excellent place to start:

http://setosa.io/ev/markov-chains/

link

DevX101 2622 days ago

Highly recommended. My preferred way to learn is to grasp an intuitive understanding before diving deep into theory. This visual explainer is a great first step.

link

notinventedhear 2622 days ago

For a broad introduction to Bayesian analysis, MCMC and PyMC I'd suggest Bayesian Methods for Hackers[1]

[1] http://camdavidsonpilon.github.io/Probabilistic-Programming-...

link

localhostdotdev 2622 days ago

markov chains are very simple at their core (e.g. simple version could be: take the probability of the next word given the known probabilities of words that follow the previous word)

it can be implemented in a few lines of code, that's the beauty of it: https://github.com/justindomingue/markov_chains/blob/master/...

obviously then you could take the previous n words into account, tweak the starting word, add randomness, etc.

now replace "word" with "state" and map "probability(next state | previous state)" to the edges of a graph: https://static1.squarespace.com/static/54e50c15e4b058fc6806d...

and you got a generic markov chain :)

footnotes: p(A | B) is probability of A given B, e.g. p(rain | clouds) > p(rain | sun) :)
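The word-level version described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code from the linked repository; the toy corpus and the function names are made up, and `order` is the "previous n words" knob.

```python
import random
from collections import defaultdict


def build_chain(words, order=1):
    """Map each run of `order` words to the list of words that followed it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, start, length, rng):
    """Walk the chain, picking each next word from the followers of the current state."""
    state = tuple(start)
    output = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        nxt = rng.choice(followers)  # repeated followers are proportionally more likely
        output.append(nxt)
        state = state[1:] + (nxt,)
    return output


text = "the cat sat on the mat and the cat ate the fish"
words = text.split()
chain = build_chain(words, order=1)
sentence = generate(chain, start=("the",), length=8, rng=random.Random(1))
```

Storing followers as a list with repeats means sampling uniformly from the list already reproduces the observed probabilities, so no explicit probability table is needed.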

link

crshults 2622 days ago

I thought this recent post, 'Generating More of My Favorite Aphex Twin Track'[1], had a good beginner-level write-up on Markov Chains.

[1] https://news.ycombinator.com/item?id=19490832

link

nrjames 2622 days ago

What I would do is use the Markovify python library and feed it with several texts from Project Gutenberg... try to generate some Lovecraftian prose or something...

https://github.com/jsvine/markovify

link

YeGoblynQueenne 2622 days ago

Personally, I started with Eugene Charniak's Statistical Language Learning [1] then continued with Manning and Schütze's Foundations of Statistical Natural Language Processing [2] and Speech and Language Processing by Jurafsky and Martin [3].

The Charniak book is primarily about HMMs and quite short, so it's the best introduction to the subject. Manning and Schütze and Jurafsky and Martin are much more extensive and cover pretty much all of statistical NLP up to their publication date (so no LSTMs if I remember correctly) but they are required reading for an in-depth approach.

You will definitely want to go beyond HMMs at some point, so you will probably want the other two books. But, if you really just want to know about HMMs, then start with the Charniak.

______________

[1] https://mitpress.mit.edu/books/statistical-language-learning

[2] https://nlp.stanford.edu/fsnlp/

[3] https://web.stanford.edu/~jurafsky/slp3/

link

evmar 2622 days ago

For hidden Markov models (which you should only look into after you get the basics), I recall that this widely-cited paper (perhaps the original?) is pretty readable. From the title it looks like it's about speech but ignore the speech parts and read the math:

https://www.robots.ox.ac.uk/~vgg/rg/papers/hmm.pdf
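As a taste of the math in that kind of tutorial, here is a minimal sketch of the forward algorithm, which computes the probability of an observation sequence under an HMM by summing over all hidden paths. The weather/activity model and all its numbers are a toy assumption for illustration, not taken from the paper.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Probability of `observations` under the HMM, summed over hidden paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Toy model: hidden weather, observed activity (all numbers are made up).
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

likelihood = forward(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
```

The cost is proportional to (number of states)^2 times the sequence length, instead of growing exponentially with the length as a brute-force sum over paths would.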

link

danaugrs 2622 days ago

I really like this short, relaxed video: "Information Theory part 10: What is a Markov chain?" by Art of the Problem https://www.youtube.com/watch?v=o-jdJxXL_W4

If you like it I recommend watching the whole series.

link

jotaf 2622 days ago

These are my favorite lecture notes, they assume almost no a-priori knowledge (with an awesome review of basic probabilities) and yet they don't shy away from explaining all the rigorous math.

If you have time to read step-by-step derivations and want to understand the fundamentals, I think this is an excellent self-contained resource.

https://ermongroup.github.io/cs228-notes/

link

usgroup 2622 days ago

“No prior knowledge” and “explain all the rigorous maths” are mutually exclusive in my opinion. I stress this as honest advice to anyone reading.

Rigorous maths is akin to trying to explain to your non technical friends what you do in devops: colloquialise it all you want, it’ll always be a shallow story.

link

twiecki 2621 days ago

If you are looking for an explanation of MCMC that focuses on intuitive understanding to complement more mathematical introductions, I wrote a blog post trying to explain things in simple terms here: https://twiecki.io/blog/2015/11/10/mcmc-sampling/

link

ivansavz 2621 days ago

If you're interested in a basic math intro (starting from linear algebra concepts), check out Section 8.2 in this excerpt from the book "No Bullshit guide to Linear Algebra": https://minireference.com/static/excerpts/probability_chapte... This excerpt contains some exercises (with answers in the back) as well as an example application (PageRank).

Technically Linear Algebra is not "required" to understand Markov Chains, but it's a very neat way to think about them: each "step" in the chain is equivalent to multiplication of the state vector by the transition matrix.
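To tie the "step = multiplication" view to the PageRank application, here is a minimal sketch. It is not the book's code: the three-page link graph, the 0.85 damping factor and the function name are assumptions. Each loop iteration is one multiplication of the rank vector by the (damped) transition matrix, written out sparsely over the links.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration on the random-surfer chain defined by `links`.

    `links` maps each page to the list of pages it links to; every target
    is assumed to also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

The ranks always sum to 1 because they are a probability distribution over pages; in this toy graph C ends up ranked highest and B lowest.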

link

maurits 2622 days ago

My personal favorite introduction to MC(MC) is lecture 1 of statistical mechanics and computations [1]

[1]: https://www.coursera.org/learn/statistical-mechanics

link

melling 2622 days ago

I’ve got a couple of links here:

https://github.com/melling/MathAndScienceNotes/tree/master/s...

link

jerednel 2622 days ago

I learned quite a bit by exploring attribution modeling with them. There is an R package where you can just faceroll a model without really understanding anything so I tried recreating it in Python https://github.com/jerednel/markov-chain-attribution - it's messy for sure but it is a learning exercise and it helped me understand the concept quite a bit. That currently only supports the simplest use case of a first-order Markov chain.

link

jamesb93 2622 days ago

Make one with a direct application. I did one to model melody from Bach in a stupid way. It was made in Max, so I can't provide the size of the code in any meaningful way, but it's basically just a text file with an index and a number of possibilities related to that index.

https://soundcloud.com/jamesbradbury/9th-order-markov-chain-...

link

sublimino 2621 days ago

Markov Chains can be quite amusing when applied to a corpus of similar texts, and often stunningly human-like. I maintain a list of humorous applications: https://github.com/sublimino/awesome-funny-markov

Some favourites:

- Erowid trip reports and tech recruiter emails - https://twitter.com/erowidrecruiter

- Calvin and Markov - Calvin and Hobbes strips reimagined http://joshmillard.com/markov/calvin/

- Generate your future tweets based on the DNA of your existing messages - http://yes.thatcan.be/my/next/tweet/

- Fake headlines created by smashing up real headlines - https://www.headlinesmasher.com/best/all

- The most confusing subreddit (often on the front page) - https://www.reddit.com/r/subredditsimulator

The original Markov-generated content prank: "I Spent an Interesting Evening Recently with a Grain of Salt" https://web.archive.org/web/20011101013348/http://www.sincit...

And of course (un-amusingly!) - Google's PageRank algorithm is built on Markov Chains https://en.wikipedia.org/wiki/PageRank#Damping_factor

n.b. there used to be parodies of Hacker News, but both are down: https://news.ycombniator.com/ and https://lou.wtf/phaker-news

link

maxmouchet 2622 days ago

For an introduction to discrete and continuous-time Markov chains, as well as an application to queuing theory, you can check the MOOC "Queuing Theory: from Markov Chains to Multi-Server Systems" on edX [1].

[1] https://www.classcentral.com/course/edx-queuing-theory-from-...

link

thepill 2622 days ago

http://setosa.io/ev/markov-chains/

link

DanBC 2622 days ago

Not sure it's introductory, but A Mathematical Theory of Communication, page 5 onwards, is useful: http://www.math.harvard.edu/~ctm/home/text/others/shannon/en...

link

segmondy 2622 days ago

The wikipedia page is actually good and how I learned about it. https://en.wikipedia.org/wiki/Markov_chain follow through with some random googling, read then implement it. It's really simple for something that sounds so fancy. :)

link

mindcrime 2622 days ago

David Silver's course on Reinforcement Learning contains some good information on Markov processes. See Lecture #2 in particular.

https://www.youtube.com/playlist?list=PL7-jPKtc4r78-wCZcQn5I...

link

platz 2622 days ago

-- Markov Decision Processes

there is a lot of info out there about markov chains, but very little about markov decision processes (MDP).

How popular are MDP? What are their strengths? weaknesses?

-- Kalman Filters vs HMM (Hidden Markov Model):

"In both models, there's an unobserved state that changes over time according to relatively simple rules, and you get indirect information about that state every so often. In Kalman filters, you assume the unobserved state is Gaussian-ish and it moves continuously according to linear-ish dynamics (depending on which flavor of Kalman filter is being used). In HMMs, you assume the hidden state is one of a few classes, and the movement among these states uses a discrete Markov chain. In my experience, the algorithms are often pretty different for these two cases, but the underlying idea is very similar." - THISISDAVE

-- HMM vs LSTM/RNN:

"Some state-of-the-art industrial speech recognition [0] is transitioning from HMM-DNN systems to "CTC" (connectionist temporal classification), i.e., basically LSTMs. Kaldi is working on "nnet3" which moves to CTC, as well. Speech was one of the places where HMMs were _huge_, so that's kind of a big deal." -PRACCU

"HMMs are only a small subset of generative models that offers quite little expressiveness in exchange for efficient learning and inference." - NEXTOS

"IMO, anything that can be done with an HMM can now be done with an RNN. The only advantage that an HMM might have is that training it might be faster using cheaper computational resources. But if you have the $$$ to get yourself a GPU or two, this computational advantage disappears for HMMs." - SHERJILOZAIR

link

micheda 2622 days ago

The hmm_filter project implements Viterbi-inspired algorithms and transition matrices in Python, might be also a useful learning resource: https://github.com/minodes/hmm_filter
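For readers who have not met Viterbi before, here is a minimal textbook-style sketch of the algorithm: it finds the single most likely hidden-state path for a sequence of observations. This is not the hmm_filter project's API or code; the toy weather model and the function name are assumptions for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden path, probability of that path)."""
    # best[s] = probability of the best path so far that ends in state s
    best = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        pointers = {}
        new_best = {}
        for s in states:
            # Best predecessor for landing in state s at this step.
            prev = max(states, key=lambda p: best[p] * trans_p[p][s])
            pointers[s] = prev
            new_best[s] = best[prev] * trans_p[prev][s] * emit_p[s][obs]
        backpointers.append(pointers)
        best = new_best
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy model: hidden weather, observed activity (all numbers are made up).
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

path, prob = viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
```

For this model the most likely explanation of walk, shop, clean is Sunny, Rainy, Rainy. For long sequences a real implementation would work with log-probabilities to avoid underflow.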

link

orasis 2621 days ago

The most important thing is to realize just how damn simple they are. As you get mired in the literature everything will seem overwhelmingly complex. Just grok the very very basic idea of them and it will come easier.

Also, they’re just a convenient model (for some problems), not a holy truth.

link

AlexCoventry 2622 days ago

You could try Gelman et al.'s Bayesian Data Analysis. It has a good overview of MCMC.

If you want an overview of Markov chains as statistical models in their own right, Durbin et al.'s Biological Sequence Analysis is a well-motivated overview.

link

ggggtez 2622 days ago

There isn't really very much to learn. Just start on wikipedia, and expand out if you think there is something more. Markov Chains are very simple in practice.

link

i_am_proteus 2622 days ago

If the "motivation-theorem-proof" style appeals to you, find a copy of Finite Markov Chains by Kemeny and Snell. ISBN 0442043287

link

ackbar03 2622 days ago

How about a textbook maybe? There aren't always easy alternatives out there, sometimes you have to bite the bullet and do the work

link

tnecniv 2622 days ago

Do you have an application in mind to help guide suggestions?

As others have said, if you don't know probability, start there.

link

currymj 2622 days ago