by crack-the-code 3396 days ago
It's probably more likely that they want each output to be independent of the others. Certain features may be predominantly associated with a tiger, but not necessarily indicative of a terrestrial animal. Even if the 9.89% chance that they were wrong about the tiger turned out to be the case, that should not influence whether or not it was a terrestrial animal. In my opinion, the consumer of the output values should be able to rely on these fields independently, and make these associations themselves. That said, I totally agree a second pass could be useful as a separate data set.
2 comments

Still, in any consistent way of assigning probabilities to events, if A implies B, then P(A) <= P(B).
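
The reason is that if A implies B, then B happens in every case where A does, plus possibly some more, so P(B) = P(A) + P(B and not A) >= P(A). A minimal sketch of checking a set of scores against that rule; the scores and the `violations` helper are made up for illustration, not taken from the article:

```python
def violations(probs, implies):
    """Return the (A, B) pairs where A implies B but P(A) > P(B)."""
    return [(a, b) for a, b in implies if probs[a] > probs[b]]

# Hypothetical classifier scores, chosen only to show a violation.
probs = {"tiger": 0.90, "terrestrial animal": 0.80}

# Each pair (A, B) reads "A implies B".
implies = [("tiger", "terrestrial animal")]

print(violations(probs, implies))
```

Any pair that comes back means the scores cannot all be probabilities of those events at the same time.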

Neural network outputs are not probabilities. I think that's the main lesson here.

Given that the last layer of a NN is a logistic regression, they are in fact well-calibrated probabilities under the assumption of disjoint classes.

The issue at hand is training them on overlapping classes :-)
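
To make the disjoint vs. overlapping point concrete, here is a rough sketch comparing the two usual output layers on the same logits. The logits and label set are invented, and I'm not claiming this is what the classifier in question does:

```python
import math

def sigmoid(z):
    """Independent per-label score: each label is its own yes/no question."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Scores for mutually exclusive classes: they always sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["tiger", "cat", "terrestrial animal"]
logits = [2.0, -1.0, 1.0]  # hypothetical final-layer activations

independent = {name: sigmoid(z) for name, z in zip(labels, logits)}
exclusive = dict(zip(labels, softmax(logits)))

print(independent)  # nothing ties "tiger" to "terrestrial animal"
print(exclusive)    # "tiger" and "terrestrial animal" compete for mass
```

With independent sigmoids nothing stops "tiger" from scoring above "terrestrial animal". With a softmax the two labels are treated as alternatives, which is just as wrong when one implies the other.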

I will shut up now, sorry for nitpicking.

You're right (and nitpicking nitpicks seems appropriate to me :P).

But, I'm pretty sure the assumptions of logistic regression are even stronger than just that. The inputs are assumed to be independent given the output class, and the log odds of the output vary as a linear function of each input. The first one is essentially the naive Bayes assumption, and the second one is completely unreasonable for almost any problem ever (roughly equivalent to assuming every dataset has a multivariate normal distribution). If they are both correct, though, you get a perfectly good Bayesian posterior probability of each output class.
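
One setting where both assumptions hold exactly is conditionally independent Gaussian features whose variance is the same in both classes. A small sketch, with made-up parameters, that computes the posterior once by Bayes' rule and once as a sigmoid of a linear function, to check that they agree:

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_posterior(x, means0, means1, variances, prior1):
    """P(class 1 | x) by Bayes' rule, features independent given the class."""
    joint0 = 1.0 - prior1
    joint1 = prior1
    for xi, m0, m1, v in zip(x, means0, means1, variances):
        joint0 *= gaussian_pdf(xi, m0, v)
        joint1 *= gaussian_pdf(xi, m1, v)
    return joint1 / (joint0 + joint1)

def logistic_posterior(x, means0, means1, variances, prior1):
    """The same posterior written as sigmoid(w . x + b)."""
    w = [(m1 - m0) / v for m0, m1, v in zip(means0, means1, variances)]
    b = math.log(prior1 / (1.0 - prior1)) + sum(
        (m0 ** 2 - m1 ** 2) / (2 * v)
        for m0, m1, v in zip(means0, means1, variances)
    )
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters: two features, per-feature variance shared by both classes.
means0, means1 = [0.0, 1.0], [2.0, -1.0]
variances = [1.5, 0.5]
prior1 = 0.3
x = [0.7, 0.2]

p_bayes = bayes_posterior(x, means0, means1, variances, prior1)
p_logit = logistic_posterior(x, means0, means1, variances, prior1)
print(p_bayes, p_logit)
```

The two numbers should match up to floating-point error. Give the classes different variances and the log odds pick up a quadratic term in x, so the linear form no longer holds.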

I think the lesson is that gradient descent will build a decent function approximation out of pretty much anything powerful enough, which is why neural networks still work even when probability theory has been thrown completely out the window.

It's really tough to take the outputs seriously if they are not even on the same scale. That is, tiger, cat, and bengal tiger all imply terrestrial animal. That means that they should all be scaled around that. That is to say that terrestrial animal would need to be at least max(tiger, cat, bengal tiger).
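
A quick sketch of that max rule as a post-processing step. The scores and the `lift_parents` helper are made up for illustration; this only patches the numbers after the fact and does not make them calibrated:

```python
def lift_parents(probs, implied_by):
    """Raise each implied label to at least the max score of the labels implying it."""
    fixed = dict(probs)
    for parent, children in implied_by.items():
        fixed[parent] = max([fixed[parent]] + [fixed[c] for c in children])
    return fixed

# Hypothetical raw scores from the classifier.
probs = {
    "tiger": 0.90,
    "cat": 0.10,
    "bengal tiger": 0.85,
    "terrestrial animal": 0.60,
}

# "terrestrial animal" is implied by each of these labels.
implied_by = {"terrestrial animal": ["tiger", "cat", "bengal tiger"]}

fixed = lift_parents(probs, implied_by)
print(fixed["terrestrial animal"])
```
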
Maybe. Or maybe this is a human-centric view. Imagine that the classifier worked on sound. A low growl could be a cat, a tiger or a submarine engine. Then the probabilities might be flipped - if it's a land animal, it might be 40/60 that it's a tiger or a cat.

A visual classifier that identifies "4 moving things" might indicate some kind of land animal, or slow motion video of a dragonfly in flight.

Sample/"evidence"-based reasoning will always have these kinds of odd inconsistencies - I'm not sure if mapping such output to a logic model is an improvement. It might be: one could take output from a classifier like this and plug it into an expert system like a Prolog/Datalog database or something. Or it might just end up being just as limited as those systems already are.

But when one says "tiger implies terrestrial mammal (or animal)", one is really talking about ontologies -- perhaps training the classifier to come up with things like "90% sure four legs, 60% sure fur" and plugging that into a logic based system would yield good hybrid systems?
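
Very roughly, such a hybrid could look like the sketch below: threshold the attribute scores into facts, then run simple forward chaining over hand-written rules. The attribute scores, the 0.5 threshold and the rules are all invented for illustration, and a real system would more likely use Prolog or Datalog than this toy loop:

```python
def forward_chain(facts, rules):
    """Apply (premises -> conclusion) rules until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived

# Hypothetical attribute scores from a learned classifier.
attribute_scores = {"four legs": 0.90, "fur": 0.60, "wings": 0.05}

# Assumed cut-off for turning a score into a hard fact.
threshold = 0.5
facts = {name for name, p in attribute_scores.items() if p >= threshold}

# Hand-written toy ontology, not a real taxonomy.
rules = [
    (frozenset({"four legs", "fur"}), "terrestrial mammal"),
    (frozenset({"terrestrial mammal"}), "terrestrial animal"),
]

conclusions = forward_chain(facts, rules)
print(sorted(conclusions))
```

Note that the hard threshold throws away the uncertainty, which is one way the "magic" could get lost.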

I do think one would then lose the "magic" effectiveness of these pure(ish) learning systems though? Perhaps someone more familiar with the domains might shed some light?