| HN Mirror

You're right (and nitpicking nitpicks seems appropriate to me :P).

But, I'm pretty sure the assumptions of logistic regression are even stronger than just that. The inputs are assumed to be independent given the output class, and the log odds of the output vary as a linear function of each input. The first one is essentially the naive Bayes assumption, and the second one is completely unreasonable for almost any problem ever (roughly equivalent to assuming every dataset has a multivariate normal distribution). If they are both correct, though, you get a perfectly good Bayesian posterior probability of each output class.

I think the lesson is that gradient descent will build a decent function approximation out of pretty much anything powerful enough, which is why neural networks still work even when probability theory has been thrown completely out the window.