Hacker News
by asymptotic 3797 days ago
For the deep-learning vertex, the OP states:

"At the right vertex, we have Breiman's know-nothing approach—high-capacity models like neural nets, decision forests, and nonparametrics that will fit anything given enough data. This is engineering with less science (see these remarks). Deep learning people cluster here."

The phrase "will fit anything given enough data" is misleading and inaccurate as a description of cutting-edge machine learning methods. On its own, "fit" is a useless term; practitioners instead talk about "bias" and "variance".

For any supervised method (one where the intended outputs are known) that you use to make predictions, the expected prediction error has three sources: bias, variance, and random (irreducible) error. Random error is unpredictability in the data itself that no model can capture. Bias comes from bad assumptions built into the model itself (e.g. the model may be too simple). Variance is sensitivity to small changes in the data the algorithm is trained on.
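
For squared-error loss, these three sources can be written out exactly. As a sketch, assume the data follow y = f(x) + ε with E[ε] = 0 and Var[ε] = σ², and let \hat f_D be the model trained on a random training set D (this notation is introduced here for illustration). At a fixed input x, with the expectation taken over training sets and noise:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat f_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible (random) error}}
```

The clean three-term split is specific to squared error; other losses do not decompose this neatly.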

High bias means the model is too simple to capture the real structure in the data set. High variance means the model is overfit to the data at hand and does not generalize to unseen data. In real-world problems there is a direct tradeoff between bias and variance; nevertheless, the goal of any supervised learning model is to have both low bias and low variance.
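
To make the tradeoff concrete, here is a small Monte Carlo sketch in Python (standard library only). Everything in it is an illustrative assumption: a hypothetical true function f(x) = x², Gaussian noise with standard deviation 0.3, 30 training points, and two deliberately extreme models, one that ignores x and predicts the mean of y (too simple) and 1-nearest-neighbour regression (very flexible). Retraining each model on many fresh training sets and collecting its predictions at a single input x0 gives estimates of bias² and variance at that point.

```python
import random
import statistics


def true_f(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return x * x


def make_training_set(n, noise_sd, rng):
    # Draw n inputs uniformly from [-1, 1] and add Gaussian noise to f(x).
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def predict_constant(xs, ys, x0):
    # Very simple (high-bias) model: ignore x and predict the mean of y.
    return statistics.fmean(ys)


def predict_1nn(xs, ys, x0):
    # Very flexible (high-variance) model: copy the y of the nearest training x.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bias_variance_at(model, x0, n=30, noise_sd=0.3, trials=2000, seed=0):
    """Monte Carlo estimate of bias^2 and variance of `model` at input x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs, ys = make_training_set(n, noise_sd, rng)  # fresh training set each trial
        preds.append(model(xs, ys, x0))
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = statistics.pvariance(preds, mu=mean_pred)
    return bias_sq, variance


for name, model in [("constant", predict_constant), ("1-NN", predict_1nn)]:
    b2, var = bias_variance_at(model, x0=0.8)
    print(f"{name:8s} bias^2={b2:.4f} variance={var:.4f}")
```

Under these assumptions the constant model should show the larger bias² and the 1-NN model the larger variance; the exact numbers depend on the seed and settings.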

By splitting off a sizeable chunk (~10-20%) of all available data as a "test" set, training the model on the remaining "train" set, and then evaluating it on the test set, you can estimate how well the model generalizes to future unseen data, using whatever metric you want. By additionally plotting learning curves (training and test error as the training set grows), you can crudely estimate whether the model has high bias or high variance.
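
A minimal sketch of the holdout split and a learning curve, again in standard-library Python. The data (y = x² plus Gaussian noise), the 20% test fraction, and the choice of k-nearest-neighbour regression as the model are all assumptions made for illustration; the helper names are hypothetical.

```python
import random


def knn_predict(train, x0, k):
    # Average the y values of the k training points closest to x0.
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train_points, eval_points, k):
    # Mean squared error of a k-NN model built on train_points, scored on eval_points.
    errors = [(y - knn_predict(train_points, x, k)) ** 2 for x, y in eval_points]
    return sum(errors) / len(errors)


def holdout_split(data, test_fraction=0.2, seed=0):
    # Shuffle a copy, then hold out the first test_fraction of points as the test set.
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def learning_curve(train, test, k, sizes):
    # For growing training subsets, record (size, train MSE, test MSE).
    curve = []
    for m in sizes:
        subset = train[:m]
        curve.append((m, mse(subset, subset, k), mse(subset, test, k)))
    return curve


# Hypothetical data: y = x^2 plus Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(500):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, x * x + rng.gauss(0.0, 0.3)))

train, test = holdout_split(data, test_fraction=0.2)
for m, train_err, test_err in learning_curve(train, test, k=1, sizes=[10, 50, 100, 200, 400]):
    print(f"n={m:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k = 1 the training error is exactly zero (each training point is its own nearest neighbour) while the test error stays well above it; that persistent gap is the usual high-variance signature. Curves where training and test error are both high and close together would point to high bias instead.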

Hence the insinuation that machine learning blindly "fits" data as much as possible is false. Sophisticated (yet not difficult) methods both estimate the model's generalizability to future unseen data and improve it (reducing variance), inevitably at the cost of some accuracy on the training data (increasing bias).
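
Regularization is one common example of such a method. The sketch below (standard-library Python) uses one-feature ridge regression without an intercept, whose estimate has the closed form w = Σxy / (Σx² + λ). The true slope, noise level, sample size, and λ values are all hypothetical. Refitting on many fresh training sets shows the penalty λ shrinking the variance of the fitted slope while pulling its average away from the true value, i.e. adding bias.

```python
import random
import statistics

TRUE_W = 2.0  # hypothetical true slope, chosen for illustration


def fit_ridge_1d(xs, ys, lam):
    # Closed-form ridge estimate for y ~ w*x (no intercept): w = sum(xy) / (sum(x^2) + lam).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def estimate_bias_variance(lam, n=10, noise_sd=1.0, trials=3000, seed=1):
    """Bias^2 and variance of the fitted slope across many fresh training sets."""
    rng = random.Random(seed)
    ws = []
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        ys = [TRUE_W * x + rng.gauss(0.0, noise_sd) for x in xs]
        ws.append(fit_ridge_1d(xs, ys, lam))
    mean_w = statistics.fmean(ws)
    return (mean_w - TRUE_W) ** 2, statistics.pvariance(ws, mu=mean_w)


for lam in [0.0, 1.0, 5.0]:
    b2, var = estimate_bias_variance(lam)
    print(f"lambda={lam:3.1f}  bias^2={b2:.4f}  variance={var:.4f}")
```

Because every call reuses the same seed, each λ is evaluated on the same simulated training sets, so the differences between rows come from λ alone.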

I think the OP's objection is that such ML methods "know nothing". This is a trivial statement to make. Rather, I would turn the objection on its head and ask: "If our methods achieve acceptable estimated generalizability on unseen data, do we need to know anything?" This reminds me of Alan Turing's argument about machines passing the Turing Test versus the question "do they really think?"

2 comments

While everything you've said is technically true, straight out of the textbook, I don't really see how it contradicts what he says.

These high-capacity models (neural nets, decision trees, boosting) do overfit like crazy and tend to be used as black boxes without any domain knowledge. The key part of his statement is "given enough data", because having tons of data is one of the best ways to combat overfitting (given enough data, variance becomes negligible). And the fact that we can measure how much they overfit and take steps to regularize doesn't change the fact that deep learning, for example, is much more an engineering discipline than a mathematical or statistical one. None of this is a criticism of those areas: they are exciting areas of research precisely because there are so many unsolved problems and so many places where we are working without a solid understanding!

Calling them "know-nothing" isn't criticism; it merely says they lack the assumptions of parametric models (or even of statistical models at all) and lack guarantees on error bounds. Whether that is good or bad depends on the goal, e.g. prediction or interpretation. There's also no doubt that he understands the bias-variance tradeoff... (hint: check his homepage).