Just pick a random place to start, read some stuff, and then take a guess as to which direction to go in next, based on what's probably a good next thing to read. Then keep repeating the process over and over again.
It's also important that you base your guess of what's probably good to read next only on the previous thing you read. Forget everything that came before that.
Did something like that: An organization with some boats, quite a lot of boats, some that might be involved in global nuclear war, maybe limited to sea, wanted to know how long some of the boats might survive. The ocean had Red and Blue boats and airplanes, and the Reds and Blues were looking for each other and trying to kill each other.
So, the state of the system was the remaining Red/Blue inventories.
Some work by Koopmans showed that the encounter rates were a Poisson process. So, the time to the next encounter had exponential distribution, depending on the current state.
At an encounter, depending on the types, could have the Red die, the Blue die, both die, or neither die. Then after the encounter, the state of the system changed. So, the state of the system was a continuous time, discrete state space Markov process subordinated to a Poisson process. That is, in part, a Markov chain.
Yes, there is a closed form solution, but the combinatorial explosion of the discrete state space size meant that a direct attack via the closed form solution was not reasonable.
But it was easy enough to do Monte-Carlo, that is, generate a few hundred sample paths and average those, get confidence intervals, etc. While in grad school working on operations research I did that. While the state space was enormous, the Monte-Carlo was really fast. On any computer of today, the code would run before could get finger off the mouse button or the Enter key. And running off 1 million sample paths would be feasible. For the random numbers I looked in Knuth's appropriate volume of The Art ... and used
X(n + 1) = X(n) * 5^15 + 1 mod 2^47
programmed in assembler.
Work passed review by famous applied probabilist J. Keilson.
Apparently the work was sold to some intelligence agency. I could guess which one, but then I'd have to ...!
Or starting from the basics, and learning how to actually do the number crunching, this is unusually good (Stewart, Introduction to numerical solution of Markov Chains):
Robert Gallager's MIT lecture series, very well presented, titled Principles of Digital Communications, takes you on another train based on Markov Chains (Kalman filters, etc).
Markov chains in essence are simple. Instead of diverging and reading all the theory, I'd recommend do it on a need basis. Learn as you go. So pick up a problem and move ahead. I don't think it is fruitful to just learn everything about Markov Chains just for the sake of it.
Markov Chain Monte Carlo to sample from probability distributions is a good start - https://arxiv.org/abs/1206.1901 if you are into sampling.
5. Realise that you’re missing a background in analysis, therefore you don’t know sh?t about measure theory but you actually need it to know anything deeper . Wonder to yourself if you really want to spend the next 3 years getting a maths background you don’t have.
6. Convince yourself that it’s all just engineering and middle through by picking a project involving non trivial markov chain.
7. Go back and spend 3 years doing foundational maths then repeat point 1-5.
While I agree with the progression of knowledge listed here I don't think it requires 3 years of foundation to math. If you have a basic understanding of math already you should be able to pick up the theory fairly well in a couple of months of research and application.
I think when you get out of the basic linear algebra and calculus prerequisite and into the analysis and measure theory prerequisite nothing takes a few months anymore :)
You don't need much math to pick up the very basic theory, but after a certain point you're going to hit a hard wall unless you have a strong background in analysis.
No, can do continuous time discrete state space theory -- the jumps in the discrete state space are at the arrival times of the Poisson process -- that works out easily enough, especially if using Monte-Carlo. See my other post here on Red/Blue stuff.
That whole math sequence was part of my MBA program that culminated in Markov chains for synthetic options pricing after like, 9 months. And this is for business school students; not engineers :)
Sure, and for any given application it’ll be possible to explain Markov chains as they apply to it. I recently did a financial valuation course where we did an “intuitive derivation of Itos formula” so that we could skip the measure theory prerequisites. We also skipped talking about Reimann integrals and just accepted that sums are integrals at a limit ... we also glossed the separating hyperplane theorem so that we could say “no arb iff risk neutral measure exists”, and so on.
However, if you actually want a background in the theory of Markov chains, I don’t think this approach works.
Just work with discrete state spaces and otherwise be less concerned with measure theory. E.g., in stochastic control problems, don't sweat measurable selection!
If you are not already intimately familiar with them learn about FSA (= finite state automata), aka FSM (finite state machines).
Most interesting facts about Markov chains (e.g. the Stationary Distribution Theorem) really are probabilistic generalisations of simpler facts about FSAs (e.g. FSAs cannot be used to "count"). In my experience, understanding them first for FSAs and then seeing how they generalise for the probabilitic case is a good way of approaching this subject.
Highly recommended. Preferred way to learn is to grasp an intuitive understanding before diving deep into theory. This visual explainer is great first step.
markov chains are very simple at their core (e.g. simple version could be: take the probability of the next word given the known probabilities of words that follow the previous word)
I thought this recent post: 'Generating More of My Favorite Aphex Twin Track'[1] had a good beginner-level write up on Markov Chains.
[1]https://news.ycombinator.com/item?id=19490832
What I would do is use the Markovify python library and feed it with several texts from Project Gutenberg... try to generate some Lovecraftian prose or something...
Personally, I started with Eugene Charniak's Statistical Language Learning [1] then continued with Manning and Schütze's Foundations of Statistical Natural Language Processing [2] and Speech and Language Processing by Jurafsky and Martin [3].
The Charniak book is primarily about HMMs and quite short, so it's the best introduction to the subject. Manning and Schütze and Jurafsky and Martin are much more extensive and cover pretty much all of statistical NLP up to their publication date (so no LSTMs if I remember correctly) but they are required reading for an in-depth approach.
You will definitely want to go beyond HMMs at some point, so you will probably want the other two books. But, if you really just want to know about HMMs, then start with the Charniak.
For hidden Markov models (which only look into after you get the basics), I recall that this widely-cited paper (perhaps the original?) is pretty readable. From the title it looks like it's about speech but ignore the speech parts and read the math:
These are my favorite lecture notes, they assume almost no a-priori knowledge (with an awesome review of basic probabilities) and yet they don't shy away from explaining all the rigorous math.
If you have time to read step-by-step derivations and want to understand the fundamentals, I think this is an excellent self-contained resource.
“No prior knowledge” and “explain all the rigorous maths” are mutually exclusive in my opinion. I stress this as honest advise to anyone reading.
Rigorous maths is akin to trying to explain to your non technical friends what you do in devops: colloquialise it all you want, it’ll always be a shallow story.
If you are looking for an explanation of MCMC that focuses on intuitive understanding to complement more mathematical introductions, I wrote a blog post trying to explain things in simple terms here: https://twiecki.io/blog/2015/11/10/mcmc-sampling/
If you're interested in a basic math intro (starting from linear algebra concepts), check out Section 8.2 in this excerpt from the book "No Bullshit guide to Linear Algebra": https://minireference.com/static/excerpts/probability_chapte...
This excerpt contains some exercises (with answers in the back) as well an examples application (PageRank).
Technically Linear Algebra is not "required" to understand Markov Chains, but it's a very neat way to think about them: each "step" in the chain is equivalent to multiplication of the state vector by the transition matrix.
I learned quite a bit by exploring attribution modeling with them. There is an R package where you can just faceroll a model without really understanding anything so I tried recreating it in Python https://github.com/jerednel/markov-chain-attribution - its messy for sure but it is a learning exercise and it helped me understand the concept quite a bit. That currently only supports the simplest use case of a first order markov chain.
Make one with a direct application. I did one to model melody from Bach in a stupid way. It was made in Max, so I can't provide the size of the code in any meaningful way, but its basically just a text file with an index and a number of possibilities related to that index.
Markov Chains can be quite amusing when applied to a corpus of similar texts, and often stunningly human-like. I maintain a list of humourous applications: https://github.com/sublimino/awesome-funny-markov
For an introduction to discrete and continuous-time Markov chains, as well as an application to queuing theory, you can check the MOOC "Queuing Theory: from Markov Chains to Multi-Server Systems" on edX [1].
The wikipedia page is actually good and how I learned about it. https://en.wikipedia.org/wiki/Markov_chain follow through with some random googling, read then implement it. It's really simple for something that sounds so fancy. :)
there is a lot of info out there about markov chains, but very little about markov decision processes (MDP).
How popular are MDP? What are their strengths? weaknesses?
-- Kalman Filters vs HMM (Hidden Markov Model):
"In both models, there's an unobserved state that changes over time according to relatively simple rules, and you get indirect information about that state every so often. In Kalman filters, you assume the unobserved state is Gaussian-ish and it moves continuously according to linear-ish dynamics (depending on which flavor of Kalman filter is being used). In HMMs, you assume the hidden state is one of a few classes, and the movement among these states uses a discrete Markov chain. In my experience, the algorithms are often pretty different for these two cases, but the underlying idea is very similar." - THISISDAVE
-- HMM vs LSTM/RNN:
"Some state-of-the-art industrial speech recognition [0] is transitioning from HMM-DNN systems to "CTC" (connectionist temporal classification), i.e., basically LSTMs. Kaldi is working on "nnet3" which moves to CTC, as well. Speech was one of the places where HMMs were _huge_, so that's kind of a big deal." -PRACCU
"HMMs are only a small subset of generative models that offers quite little expressiveness in exchange for efficient learning and inference." - NEXTOS
"IMO, anything that be done with an HMM can now be done with an RNN. The only advantage that an HMM might have is that training it might be faster using cheaper computational resources. But if you have the $$$ to get yourself a GPU or two, this computational advantage disappears for HMMs." - SHERJILOZAIR
The hmm_filter project implements Viterbi-inspired algorithms and transition matrices in Python, might be also a useful learning resource:
https://github.com/minodes/hmm_filter
The most important thing is to realize just how damn simple they are. As you get mired in the literature everything will seem overwhelmingly complex. Just grok the very very basic idea of them and it will come easier.
Also, they’re just a convenient model (for some problems), not a holy truth.
You could try Gelman et al.'s Bayesian Data Analysis. It has a good overview of MCMC.
If you want an overview of Markov chains as statistical models in their own right, Durbin et al.'s Biological Sequence Analysis is a well-motivated overview.
There isn't really very much to learn. Just start on wikipedia, and expand out if you think there is something more. Markov Chains are very simple in practice.