by lars 5217 days ago
A good example is regularization. You have nice proofs saying that your classifier is optimal, then you tack on a regularization term to it, which breaks your optimality proof but improves your classification accuracy. It seems unexpected, but it's not really all that surprising when you get down to the details of it.
1 comments

Oops hit the down-arrow without intending to, my bad, hope someone will fix that.

There is nothing tacked on about a regularizer though; it is very sound even in theory. There are several ways to look at it. One way is to see it as a natural consequence of Bayes' law: it is just the (negative) log of the prior probability. There are certain things we know or assume about the model even before looking at the data, for example we expect the predictions to have a certain smoothness. All this knowledge can be incorporated into the prior, and that is what the regularizer is. Another way is to look at it from the standpoint of the stability of the parameter estimates. I find the former more convincing.
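To spell out the Bayes' law view, here is a short sketch of the derivation. The symbols are my own choice, not anything fixed by the thread: $\theta$ for the parameters, $D$ for the data, and a zero-mean Gaussian prior with variance $\sigma^2$ as one illustrative assumption.

```latex
\begin{align*}
\hat{\theta}_{\mathrm{MAP}}
  &= \arg\max_{\theta} \; p(\theta \mid D)
   = \arg\max_{\theta} \; p(D \mid \theta)\, p(\theta) \\
  &= \arg\min_{\theta} \; \bigl[ -\log p(D \mid \theta) \;-\; \log p(\theta) \bigr] \\[4pt]
\text{If } \theta \sim \mathcal{N}(0, \sigma^2 I): \quad
  -\log p(\theta) &= \frac{1}{2\sigma^2} \lVert \theta \rVert_2^2 + \text{const}
\end{align*}
```

The first term is the usual negative log-likelihood and the second is the regularizer. Under the Gaussian assumption it becomes an L2 penalty with weight $\lambda = 1/(2\sigma^2)$, so a tighter prior means stronger regularization.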

Absolutely, there's a pretty clear mathematical justification for regularization. However, it is very literally tacked on at the end. Take logistic regression: if you minimize the cost function without regularization, you get a maximum-likelihood estimate (MLE) of the regression parameters. But what we do is add a regularization term to that cost function. Minimizing the regularized cost function no longer gives the MLE, but it will (likely) give a better solution. It all comes down to understanding that the optimality of the MLE is an asymptotic result; with a finite sample, a slightly biased estimate can do better. The same goes for covariance matrix estimates, where you have regularization procedures that are guaranteed to never be worse than the plain MLE solution.
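For anyone who wants to see the "tacked on" term concretely, here is a minimal pure-Python sketch of the logistic regression case. Everything in it is an assumption made for illustration: the function name `fit_logistic`, the parameters `lam`, `lr` and `steps`, plain gradient descent as the optimizer, and the tiny toy dataset. It is not anyone's reference implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lam=0.0, lr=0.1, steps=5000):
    """Gradient descent on: mean negative log-likelihood + (lam / 2) * w**2.

    One feature plus a bias. lam=0.0 is plain maximum likelihood;
    lam>0 adds an L2 penalty on the weight (the bias is not penalized).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # gradient of the log-loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * (gw / n + lam * w)       # "+ lam * w" is the regularization term
        b -= lr * (gb / n)
    return w, b

# Hypothetical toy data, perfectly separable at x = 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w_mle, b_mle = fit_logistic(xs, ys, lam=0.0)   # unregularized: weight keeps growing
w_map, b_map = fit_logistic(xs, ys, lam=0.1)   # regularized: weight settles at a finite value

print("unregularized w:", w_mle)
print("regularized   w:", w_map)
```

On separable data like this the likelihood has no finite maximizer, so the unregularized weight just keeps growing the longer you run it, while the penalized one converges. That is an extreme case of the finite-sample problem, but it shows how small the change to the cost function is.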