To bridge the gap between naive Bayes and LDA, I would recommend going from k-means to EM and then from EM to variational Bayes. K-means to EM is covered in chapters 20 (pp. 286-), 22 (pp. 300-) and 33 (pp. 422-) of MacKay's ITILA [1] (excellent and free book BTW). I recommend to learn about (== apply on something) the junction tree algorithm because you will have to brush on graph theory. Also, do more convex optimization beforehand than I did, or you will have to catch up: take a full course or full book on it.
For LDA you'll need to understand Dirichlet processes, I find the introduction by Frigyik et al. [2] to be excellent for that. You may need to read A Measure Theory Tutorial (Measure Theory for Dummies) by Gupta [3] before. Finally, I put there the two most influential LDA papers to me: [4] and then [5].
In case the measure theory put anyone off: you don't need Dirichlet Processes for plain LDA, just the finite-dimensional http://en.wikipedia.org/wiki/Dirichlet_distribution (which isn't so bad and a very useful tool in Bayesian stats as the conjugate prior for discrete observations)
For some of the non-parametric variants like hierarchical dirichlet process LDA, you need DPs, but that stuff is pretty hardcore -- don't walk before you can run.
Another route to LDA (assumes some Bayesian stats basics):
* Learn a bit about Markov chains if you don't know them already
* Read up on sampling-based approximate inference methods and find a proof that a Gibbs sampler converges (or just take it on trust...)
* Read the classic Griffiths and Steyvers paper deriving a collapsed Gibbs sampler for LDA [1]
For some of the non-parametric variants like hierarchical dirichlet process LDA, you need DPs, but that stuff is pretty hardcore -- don't walk before you can run.
Another route to LDA (assumes some Bayesian stats basics):
* Learn a bit about Markov chains if you don't know them already * Read up on sampling-based approximate inference methods and find a proof that a Gibbs sampler converges (or just take it on trust...) * Read the classic Griffiths and Steyvers paper deriving a collapsed Gibbs sampler for LDA [1]
[1] http://www.pnas.org/content/101/suppl.1/5228.full.pdf