It looks like some work has been done on using AIC for neural networks:

In principle it would be straightforward, right? AIC = 2k - 2ln(L). So set k = # weights + # biases, and use the log-likelihood as the objective function so you can just read off L from there.
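A minimal sketch of that calculation, assuming a plain fully connected network and a training loss reported as the mean negative log-likelihood in nats. The helper names, the layer sizes, and the loss value are made up for illustration and don't come from any particular library:

```python
def count_params(layer_sizes):
    """k = # weights + # biases for a fully connected net (hypothetical helper)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def log_likelihood_from_mean_nll(mean_nll, n_samples):
    """Most frameworks report the mean NLL; AIC needs ln(L) summed over the data."""
    return -mean_nll * n_samples


def aic(log_likelihood, k):
    """AIC = 2k - 2ln(L)."""
    return 2 * k - 2 * log_likelihood


# Illustrative numbers only: a 4-8-3 network and an assumed training loss.
k = count_params([4, 8, 3])  # 4*8 + 8 + 8*3 + 3 = 67
ll = log_likelihood_from_mean_nll(mean_nll=0.35, n_samples=150)
score = aic(ll, k)
print(k, score)
```

The one thing that's easy to get wrong is the scale: the loss has to be a real log-likelihood (e.g. cross-entropy with natural logs), summed rather than averaged over the training set, otherwise the 2k penalty is on the wrong scale relative to the fit term.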

I wonder if the reason why AIC is unpopular is that it's harder to explain to your boss than accuracy, precision, recall, or even proper scoring rules. This is perhaps even more true now that statistical literacy among managers is increasing -- the notion that you can't use training data to estimate performance is becoming widely accepted. Now here comes a magic formula, calculated on the training set, that supposedly tells me how well the model will perform... that's not gonna fly in a lot of settings.

There's also the question of whether it even tells us what we want to know. AIC is an estimate of the relative Kullback-Leibler information lost by a probability model, and it only holds under somewhat restrictive conditions [0]. The question of "why don't we use AIC?" might be the same as the question of "why don't we use proper scoring rules?" -- people want to know accuracy, so they just go ahead and estimate accuracy by brute-force resampling. I'm not saying it's right, but until people see tangible value in thinking about their models from a probabilistic perspective, they won't be motivated to do so.
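To make the accuracy vs. proper scoring rule distinction concrete, here's a toy sketch. The labels and predicted probabilities are invented for illustration, and the log score stands in for proper scoring rules in general:

```python
import math


def accuracy(probs, labels, threshold=0.5):
    """Fraction of correct hard predictions after thresholding P(y = 1)."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def mean_log_loss(probs, labels):
    """Mean negative log-likelihood (the log score, a proper scoring rule)."""
    return -sum(math.log(p if y else 1.0 - p)
                for p, y in zip(probs, labels)) / len(labels)


# Made-up binary labels and two made-up sets of predicted P(y = 1).
labels = [1, 1, 0, 0]
confident = [0.95, 0.90, 0.10, 0.05]
hedging = [0.55, 0.60, 0.45, 0.40]

for name, probs in [("confident", confident), ("hedging", hedging)]:
    print(name, accuracy(probs, labels), mean_log_loss(probs, labels))
```

Both sets of predictions classify every example correctly, so accuracy can't tell them apart, while the log score is lower (better) for the confident one. If all you report is accuracy, that difference never shows up, which is roughly why likelihood-based criteria like AIC feel like they answer a question nobody asked.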

[0]: http://www4.ncsu.edu/~shu3/Presentation/AIC.pdf