Convex: Bertsekas - Convex Optimization Theory, Convex Optimization Algorithms.
Nesterov - Lecture Notes (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.693...)

Statistical/Theoretical: Shai Shalev-Shwartz & Shai Ben-David's Understanding Machine Learning (http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning...)
Mohri's Foundations of Machine Learning (https://cs.nyu.edu/~mohri/mlbook/). The two above courses could share SSS's Online Learning text (https://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf). To be fair, the stochastic variants of most optimization algorithms can be picked up reasonably quickly with a statistical machine learning/basic optimization background. There's also Spall's Intro to Stochastic Search and Optimization, which covers neural networks, reinforcement learning, annealing, MCMC, and a wide variety of other applications and techniques (http://www.jhuapl.edu/ISSO/).

Like kxyvr, I don't know of any killer linear algebra text, which is why I think a course is so useful. The Matrix Cookbook is helpful along the way. kxyvr is also entirely right that general nonlinear optimization is important, though perhaps less indispensable. (Going the other way, the Bertsimas linear optimization textbook I've had for years mostly gathers dust.)

For PGMs: I got Predicting Structured Data back when it was new (https://mitpress.mit.edu/books/predicting-structured-data), but I think Chris Bishop's treatment in PRML is easier to follow. He has some lecture slides that expand on it quite well (https://www.microsoft.com/en-us/research/people/cmbishop/). Bishop would also be my go-to intro ML book over Murphy.

I can't fairly offer recommendations for the rest of the intermediate undergraduate math texts because I took those courses so long ago, but I can say that I have benefited from reviewing the MIT OCW courses from time to time.