Bishop's Pattern Recognition and Machine Learning is an awesome starting point. I think it's far more coherent (although probably less comprehensive) than The Elements of Statistical Learning.
I do like both books, but I found them more focused on the mathematical analysis of algorithms than on the analysis of noisy real-world data (and on choosing or designing the right algorithm for it).