HN Mirror

Absolutely, there's a pretty clear mathematical justification for regularization. However, it is very literally tacked on at the end. Take logistic regression: if you minimize the cost function (the negative log-likelihood) without regularization, you get the maximum-likelihood estimate (MLE) of the regression parameters. What we actually do is add a regularization term, such as an L2 penalty on the weights, to that cost function. Minimizing the penalized cost no longer gives the MLE, but it will likely give a better solution, in the sense of lower error on new data.

It all comes down to understanding that the optimality of the MLE is an asymptotic result. It holds as the sample size goes to infinity, and with finite data a slightly biased estimator can have lower variance and therefore lower overall error. The same goes for covariance matrix estimates, where there are regularization (shrinkage) procedures that are guaranteed never to be worse than the plain MLE.
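To make "tacked on at the end" concrete, here is a minimal sketch in plain Python. It minimizes the logistic-regression cost by gradient descent, once as-is and once with an L2 penalty (lam/2)·w² added. Everything here is an assumption for illustration: the helper names `cost` and `fit`, the toy data, the learning rate, the step count, the choice of L2, and lam=0.1. The data is linearly separable, which is one concrete small-sample failure of the MLE: the likelihood keeps improving as w grows, so the unregularized weight never settles, while the penalized one converges to a finite value.

```python
import math

def softplus(z):
    # log(1 + e^z), computed without overflow
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # logistic function, split by sign to avoid overflow in exp
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cost(w, b, xs, ys, lam):
    """Mean negative log-likelihood plus an L2 penalty (lam / 2) * w**2.

    With lam == 0 this is the plain maximum-likelihood objective; the
    penalty is the extra term added at the end. Leaving the bias b
    unpenalized is an assumed (though common) convention.
    """
    n = len(xs)
    # per-example NLL of label y under p = sigmoid(z) equals softplus(z) - y*z
    nll = sum(softplus(w * x + b) - y * (w * x + b) for x, y in zip(xs, ys)) / n
    return nll + 0.5 * lam * w * w

def fit(xs, ys, lam, lr=0.5, steps=5000):
    """Plain gradient descent on cost(); returns (w, b)."""
    n = len(xs)
    w = b = 0.0
    for _ in range(steps):
        # d(NLL)/dz = sigmoid(z) - y for each example
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n + lam * w
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

if __name__ == "__main__":
    # made-up, linearly separable toy data
    xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    ys = [0, 0, 0, 1, 1, 1]
    for steps in (1000, 5000):
        w_mle, _ = fit(xs, ys, lam=0.0, steps=steps)
        w_reg, _ = fit(xs, ys, lam=0.1, steps=steps)
        print(f"steps={steps}: unregularized w={w_mle:.3f}, L2 w={w_reg:.3f}")
```

Running more steps makes the unregularized weight keep growing while the regularized one stays put. Note that this toy only shows the penalty changing what gets minimized; it does not by itself demonstrate better accuracy on new data.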