Hacker News
by njohnson41 3390 days ago
Still, in any consistent way of assigning probabilities to events, if A implies B, then P(A) <= P(B).
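To make that concrete: on a finite sample space, "A implies B" means the outcomes where A holds are a subset of those where B holds, so P(B) = P(A) + P(B \ A) >= P(A). A minimal Python sketch, using a made-up three-outcome sample space (the outcome names and weights are illustrative assumptions), checks the rule for every pair of events:

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical sample space: each outcome has a non-negative weight, summing to 1.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def prob(event):
    """P(event) for an event given as a set of outcomes."""
    return sum((weights[o] for o in event), Fraction(0))

def all_events(outcomes):
    """Every subset of the sample space (the power set)."""
    items = list(outcomes)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [set(s) for s in subsets]

def is_monotone(outcomes):
    """Check that A <= B as sets (A implies B) always gives P(A) <= P(B)."""
    events = all_events(outcomes)
    return all(prob(a) <= prob(b) for a in events for b in events if a <= b)

print(is_monotone(weights))
```

Exact fractions are used so the comparison is not muddied by floating-point rounding.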

Neural network outputs are not probabilities. I think that's the main lesson here.
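As an illustration of that point (the labels and logit values here are hypothetical), a multi-label network with one independent sigmoid per label can score "cat" above "animal", even though every cat is an animal:

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits from a network with one independent sigmoid head per label.
# Nothing in this architecture ties the heads together, so the implication
# "cat implies animal" is not enforced.
logits = {"cat": 2.0, "animal": 0.5}
scores = {label: sigmoid(z) for label, z in logits.items()}

# A consistent probability assignment would need P(cat) <= P(animal);
# since sigmoid is increasing and 2.0 > 0.5, this check is True here.
violates = scores["cat"] > scores["animal"]
print(scores, violates)
```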

Given that the last layer of a NN is a logistic regression, they are in fact well-calibrated probabilities under the assumption of disjoint classes.
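A small sketch of the consistency half of that claim (it says nothing about calibration): a softmax output layer, i.e. multinomial logistic regression, gives non-negative scores that sum to 1 over mutually exclusive classes, so any event built as a union of classes automatically satisfies "A implies B gives P(A) <= P(B)". The logits below are made up for illustration:

```python
import math
from itertools import chain, combinations

def softmax(logits):
    """Numerically stable softmax: a distribution over mutually exclusive classes."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three disjoint classes.
probs = softmax([1.2, -0.3, 0.4])

def prob(event):
    """P(event), where an event is a set of class indices (a union of disjoint classes)."""
    return sum(probs[i] for i in event)

subsets = chain.from_iterable(combinations(range(3), r) for r in range(4))
events = [set(s) for s in subsets]

# Non-negative, sums to 1, and adding classes to an event can only add mass,
# so A <= B as sets implies P(A) <= P(B) (small tolerance for float rounding).
assert all(p >= 0 for p in probs)
assert abs(sum(probs) - 1.0) < 1e-12
monotone = all(prob(a) <= prob(b) + 1e-12 for a in events for b in events if a <= b)
print(probs, monotone)
```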

The issue at hand is training them on overlapping classes :-)
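A toy sketch of why overlapping classes cause trouble, under a deliberately simplified assumption: a single input whose three logits are trained directly by gradient descent on cross-entropy, with hypothetical classes "cat", "animal" and "other" sharing one softmax. If the same cat image is labelled "cat" once and "animal" once, the optimum splits the mass roughly evenly, so the model reports P(animal) near 0.5 for something that is certainly an animal:

```python
import math

def softmax(logits):
    """Numerically stable softmax over mutually exclusive classes."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical classes: 0 = "cat", 1 = "animal", 2 = "other".
# The same image appears twice in training, labelled "cat" once and
# "animal" once, i.e. the classes overlap.
targets = [0, 1]
num_classes = 3

logits = [0.0, 0.0, 0.0]  # free logits for this one input (no features, for simplicity)
lr = 0.5
for _ in range(2000):
    p = softmax(logits)
    # Gradient of the mean cross-entropy w.r.t. the logits:
    # predicted probabilities minus the average one-hot target.
    avg_target = [sum(1.0 for t in targets if t == k) / len(targets) for k in range(num_classes)]
    grad = [p[k] - avg_target[k] for k in range(num_classes)]
    logits = [z - lr * g for z, g in zip(logits, grad)]

p = softmax(logits)
print([round(x, 3) for x in p])
```

The "other" probability keeps shrinking toward 0, while "cat" and "animal" settle near 0.5 each, because a softmax cannot give both high scores at once.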

I will shut up now, sorry for nitpicking.

You're right (and nitpicking nitpicks seems appropriate to me :P).

But I'm pretty sure the assumptions of logistic regression are even stronger than just that. The inputs are assumed to be independent given the output class, and the log odds of the output are assumed to be a linear function of the inputs. The first one is essentially the naive Bayes assumption, and the second one is completely unreasonable for almost any problem ever (roughly equivalent to assuming that the inputs within each class follow a multivariate normal distribution with a shared covariance matrix). If they are both correct, though, you get a perfectly good Bayesian posterior probability of each output class.
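The last sentence can be checked numerically for one concrete model that satisfies both assumptions: Gaussian naive Bayes where each feature's variance is shared by the two classes. The means, variances and prior below are made up for illustration. Bayes' rule then gives exactly sigmoid(w . x + b), with w_j = (mu1_j - mu0_j) / var_j and b = sum_j (mu0_j^2 - mu1_j^2) / (2 var_j) + log(prior1 / (1 - prior1)):

```python
import math

# Hypothetical two-class problem with two features, chosen so that:
#  - features are independent given the class (naive Bayes), and
#  - each feature is Gaussian with a class-specific mean and a variance
#    shared by both classes, which makes the log odds linear in x.
mu0 = [0.0, 1.0]   # class-0 means
mu1 = [2.0, -1.0]  # class-1 means
var = [1.0, 4.0]   # per-feature variance, shared across classes
prior1 = 0.3       # P(y = 1)

def gaussian_pdf(x, mu, v):
    return math.exp(-(x - mu) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def bayes_posterior(x):
    """P(y=1 | x) computed directly from Bayes' rule."""
    joint1 = prior1 * math.prod(gaussian_pdf(xj, m, v) for xj, m, v in zip(x, mu1, var))
    joint0 = (1 - prior1) * math.prod(gaussian_pdf(xj, m, v) for xj, m, v in zip(x, mu0, var))
    return joint1 / (joint1 + joint0)

# Logistic-regression weights implied by the generative model.
w = [(m1 - m0) / v for m0, m1, v in zip(mu0, mu1, var)]
b = sum((m0 ** 2 - m1 ** 2) / (2 * v) for m0, m1, v in zip(mu0, mu1, var))
b += math.log(prior1 / (1 - prior1))

def logistic_posterior(x):
    """sigmoid(w . x + b): the same posterior written as a logistic regression."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

points = ([0.0, 0.0], [1.5, -2.0], [-1.0, 3.0])
for x in points:
    print(x, bayes_posterior(x), logistic_posterior(x))
```

If either assumption is dropped (correlated features, or unequal variances per class), the log odds pick up extra terms and the linear form no longer matches Bayes' rule exactly.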

I think the lesson is that gradient descent will build a decent function approximation out of pretty much anything powerful enough, which is why neural networks still work even when probability theory has been thrown completely out the window.