| > a low confidence score Neural nets should return a low confidence score, but the popular approach (described below) ignores that. Neural nets ignore confidence because of a technique called softmax [1]. It is applied as the final operation of a neural net and is required for training. Softmax is a tool to make an array of positive numbers look like a probability distribution: out = x / x.sum()
x[i] is a class prediction, but x.sum() != 1. Say the network was uncertain and x[cat, dog] = [0.03, 0.01]. These are small values that do not imply great confidence (the network was trained on vectors with out.sum() = 1). The network would predict “cat” using softmax because out[cat] = 0.75 > 0.25 = out[dog]. But then in inference/prediction, the confidence is ignored. What if x.sum() is small? That would imply that the network is uncertain. [1]: https://en.m.wikipedia.org/wiki/Softmax_function |
In other words, if you only have two object classes, the magnitude of the outputs does not matter once they are normalized; the uncertainty is measured only by the relative difference between the outputs.
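A quick way to see this is the sketch below. It uses the simplified normalization from the quoted comment (out = x / x.sum(), not a true softmax with exponentials), and the helper name `normalize` is made up for illustration:

```python
def normalize(x):
    """Divide each output by the total so the results sum to 1."""
    total = sum(x)
    return [v / total for v in x]

# Two classes, [cat, dog]: a "quiet" output and a "loud" one with the same ratio.
quiet = normalize([0.03, 0.01])
loud = normalize([0.75, 0.25])

print(quiet)
print(loud)
```

Both lists come out at roughly [0.75, 0.25], so after normalization there is no way to tell the quiet case from the loud one.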
The only way to measure the model's confidence that the output is "cat OR dog" is to have another class (e.g. "chair"). Only then, looking at all three outputs, can you estimate the model's confidence in "cat OR dog" predictions (vs. "NOT (cat OR dog)"). For example, if the [cat, dog, chair] outputs are [0.03, 0.01, 0.05], then we know the model is not confident that it's either a cat or a dog, but if the outputs are [0.75, 0.25, 0.05], then it clearly is.
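As a rough illustration of the three-class case, one could compute the share of the total output that goes to cat or dog. This is only a sketch: it assumes plain normalization (dividing by the sum) rather than a real softmax, and `cat_or_dog_confidence` is a hypothetical helper, not something from a library:

```python
def cat_or_dog_confidence(outputs):
    """Share of the total output assigned to cat or dog.

    `outputs` is assumed to be ordered as [cat, dog, chair].
    """
    cat, dog, chair = outputs
    return (cat + dog) / (cat + dog + chair)

unsure = cat_or_dog_confidence([0.03, 0.01, 0.05])
sure = cat_or_dog_confidence([0.75, 0.25, 0.05])

print(round(unsure, 2), round(sure, 2))  # prints: 0.44 0.95
```

With the third class present, the first set of outputs puts less than half of its weight on "cat OR dog", while the second puts about 95% there, even though the cat:dog ratio is 3:1 in both.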