by JHonaker 1995 days ago
This is probably my favorite introductory machine learning book. The fact that he places almost everything in the language of graphical models is such a good common ground to build off.

This really sets you up to realize that there is (and should be) a lot more to doing a good job in machine learning than simply minimizing an objective function. The answers you get depend on the model you create as do the questions you can hope to answer.

I don't see a clear list of differences between this new edition and the previous one. Does anyone know what's new?


Agree with you. But none of this is useful for practical (applied) machine learning. I don't want to disappoint you: you can read it as machine learning porn, but otherwise don't waste time on it.
I mean, as a graduate student, it was definitely incredibly useful. As a practicing data scientist, I’d have to say that it’s also incredibly useful.

I’ve used this stuff, and more often, the ideas taught, to break down a problem into a tackle-able set of pieces more times than I can count.

Never underestimate the fundamentals. Too many of my colleagues use models without actually understanding any of it. I’ve debugged so many problems by looking at the technical details in original papers and textbooks.

Are you saying the book itself is ML porn?
Yes, unless you are among the top 20 researchers working on the frontier of ML. Bayesian probabilistic techniques do not work, or are very slow, for any practical purpose.
Oh crumbs! There I was thinking that by obtaining an estimate of the probabilities of the responses of different groups to an employee survey I was applying a Bayesian probabilistic approach.

I'm going to have to rethink everything now, since it worked and was quite quick (I didn't even sample using MCMC, just brute-force pulled permutations), so it was clearly not a Bayesian approach, and I am very, very far from being one of the top 20 (or 200, or 2000, or 20000, maybe 200000?) researchers...

This may be true for whatever small corner of the data science world you inhabit but it isn’t true in general.

To choose just one example, the analysis of the new UK COVID variant relies on Bayesian modelling, both for the government analysis and the Imperial paper. (https://www.imperial.ac.uk/media/imperial-college/medicine/m...)

Turing.jl[1] is quite usable and isn't slow[2].

[1] - https://turing.ml/dev/ [2] - https://arxiv.org/abs/2002.02702

Can it handle 1000 predictors with 1 million data points?
What is the advantage of placing everything in the language of graphical models? How do other ML books do it?
Graphical models are just a way to encode relationships between different variables in a probabilistic model. Directed acyclic graphs (DAGs) allow you to specify (most of) the conditional independence structures that you can have between things like parameters and random variables.

This is really useful information because it can help you identify what information is truly relevant for the estimation of certain parameters (so sufficient statistics) or help you crystallize your understanding of the implications of the model you’ve created. In other words, it helps show you the ways in which your model says different aspects of your data should influence others.

This creates testable implications of the model. If your model says that two variables should be conditionally independent given a third, but they’re not, you have an avenue for refinement. You can also clearly identify your assumptions or the implications of your assumptions.
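
To make the "testable implication" idea concrete, here is a minimal sketch, assuming a hypothetical linear-Gaussian chain x -> z -> y (the variables, noise levels, and sample size are all made up for illustration). The graph says x and y should be independent given z, so the partial correlation given z should be near zero even though the plain correlation is not. Partial correlation only picks up linear dependence, so treat this as one possible check rather than a general test.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

rng = random.Random(0)
n = 20000

# Hypothetical chain DAG: x -> z -> y.
# The model implies x is independent of y once z is known.
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

marginal = corr(x, y)                 # clearly non-zero: x and y are dependent
conditional = partial_corr(x, y, z)   # near zero: independent given z

print(f"corr(x, y) = {marginal:.3f}")
print(f"partial corr(x, y | z) = {conditional:.3f}")
```

If the partial correlation on real data came out far from zero, that would be the "avenue for refinement": the chain structure is missing an edge or a confounder.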

Another great thing about them is that the structure tells you how hard inference is going to be: exact inference for certain (most) structures is known to be computationally infeasible. There are a lot of different inference schemes available that can help you with different approximations with various drawbacks/advantages, heuristics that sort of work, or even ways of drawing samples from the true distribution if you can identify the structures. See belief propagation, loopy belief propagation, sequential Monte Carlo, and Markov chain Monte Carlo methods.
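
As a rough illustration of why structure matters for inference, here is a small sketch on a hypothetical chain of binary variables x1 -> x2 -> ... -> xn (the probabilities are invented). Brute force sums the joint over all 2**n assignments; passing a forward message along the chain, which is what belief propagation reduces to on a chain, gets the same marginal in O(n) steps. This is exact only because a chain has no loops; it says nothing about the loopy or sampling-based methods.

```python
from itertools import product

# Hypothetical model parameters (assumptions for illustration only).
prior = [0.6, 0.4]                  # P(x1)
trans = [[0.7, 0.3], [0.2, 0.8]]    # trans[a][b] = P(x_{k+1} = b | x_k = a)

def marginal_brute_force(n):
    """P(x_n) by summing the joint over all 2**n assignments."""
    out = [0.0, 0.0]
    for xs in product((0, 1), repeat=n):
        p = prior[xs[0]]
        for a, b in zip(xs, xs[1:]):
            p *= trans[a][b]
        out[xs[-1]] += p
    return out

def marginal_message_passing(n):
    """P(x_n) by pushing a forward message down the chain, O(n) work."""
    msg = prior[:]
    for _ in range(n - 1):
        msg = [sum(msg[a] * trans[a][b] for a in (0, 1)) for b in (0, 1)]
    return msg

bf = marginal_brute_force(12)       # 4096 terms
mp = marginal_message_passing(12)   # 11 small updates
print("brute force:     ", bf)
print("message passing: ", mp)
```

Both functions should agree up to floating-point error; the difference is that the brute-force cost doubles with every extra variable.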

On top of this it helps you see everything in a general framework. Lots of the fundamental pieces of ML models are really just slight tweaks to other things. For instance, SVMs are linear models on kernel spaces with a specific structural prior. Same with splines; it’s just a different basis function. All of this helps you see the pieces of different methods that are actually identical. This helps you make connections and learn more effectively, in my opinion.
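
A minimal sketch of the "it's just a different basis function" point, assuming plain least squares and toy data (the helper names, the target function, and the knot location are all made up): the fitting code is identical for a polynomial and for a linear spline, and only the list of basis functions changes.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_linear(xs, ys, basis):
    """Least squares on features phi(x); only `basis` differs between models."""
    Phi = [[f(x) for f in basis] for x in xs]
    k = len(basis)
    # Normal equations: (Phi^T Phi) w = Phi^T y
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(k)]
    w = solve(A, b)
    return lambda x: sum(wi * f(x) for wi, f in zip(w, basis))

xs = [i / 10 for i in range(21)]     # grid on [0, 2]
ys = [abs(x - 1.0) for x in xs]      # toy target with a kink at x = 1

poly_basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
# Linear spline basis with a single knot at 1.0 (knot choice is an assumption).
spline_basis = [lambda x: 1.0, lambda x: x, lambda x: max(0.0, x - 1.0)]

poly_fit = fit_linear(xs, ys, poly_basis)
spline_fit = fit_linear(xs, ys, spline_basis)

print("quadratic at the kink:", poly_fit(1.0))
print("spline at the kink:   ", spline_fit(1.0))
```

The spline basis can represent the kinked target exactly while the quadratic cannot, even though both are "linear models" in their weights. Swapping in kernel evaluations as the basis moves you toward the kernel-method view in the same way.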

Absolutely. Very strongly agree with the last paragraph, and that's how I aspire to learn in general. Can you point me to some resources (books or otherwise) that go over all these relationships in a general framework?
Unfortunately, it's just something you start to notice once you become more familiar with the fundamental math underlying all of this.

The book, Machine Learning: A Probabilistic Perspective by Kevin Murphy (the original book everyone in this thread is talking about) is probably the closest thing I can think of. Its goal is to frame everything around graphical models and probability. It's quite a tome. Still, despite its breadth, it can't possibly cover everything.