That's kind of true, but (regularization aside) a standard loss function such as cross-entropy is minimized, in expectation, exactly where the scores are well calibrated, right? Given the scores in the image (97% animal, 90% tiger, etc.), they look like binary classifiers, e.g. "is this a tiger?" So of all the predictions scored in the neighborhood of 90%, about 90% should turn out to be "yes, it is," which makes the score a measure of confidence compatible with probability.
Someone please correct me if I'm wrong, but I'm pretty sure that's how it works, the same way logistic regression gives you a probability.
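To make the calibration argument concrete, here is a rough sketch. It assumes the loss is binary cross-entropy (log loss) and that, among images like this one, the true "yes" rate is 90%; both are assumptions for illustration, not something the image tells us. The helper names are made up. It searches for the score that gives the lowest expected loss.

```python
import math

def expected_log_loss(q, p):
    """Expected cross-entropy when the true 'yes' rate is p and the model outputs q."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

def best_prediction(p, steps=999):
    """Grid-search the score q in (0, 1) that minimizes the expected log loss."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda q: expected_log_loss(q, p))

true_rate = 0.9  # assumed: 90% of such images really are tigers
best_q = best_prediction(true_rate)
print(best_q)
```

On this grid the minimizer coincides with the assumed true rate, which is the sense in which the loss rewards calibrated scores. A real network trained on finite data with regularization can still drift away from this ideal.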
What you need to do is take the top prediction for each example and check how often it is correct on a test set. The scores in the picture represent confidence, not accuracy.
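A minimal sketch of that check, assuming each test image comes with a dict of per-label scores and one true label. The data below is made up for illustration and is not taken from the picture, and `top1_report` is a hypothetical helper name.

```python
def top1_report(score_rows, true_labels):
    """Return (accuracy, mean confidence) of the top-scoring label per example."""
    correct = 0
    confidence_sum = 0.0
    for scores, truth in zip(score_rows, true_labels):
        # Pick the label with the highest score for this example.
        label, score = max(scores.items(), key=lambda kv: kv[1])
        correct += (label == truth)
        confidence_sum += score
    n = len(true_labels)
    return correct / n, confidence_sum / n

# Hypothetical scores for four test images (independent per-label scores).
score_rows = [
    {"tiger": 0.90, "cat": 0.40},
    {"tiger": 0.85, "cat": 0.30},
    {"tiger": 0.20, "cat": 0.95},
    {"tiger": 0.90, "cat": 0.10},
]
true_labels = ["tiger", "cat", "cat", "tiger"]

accuracy, mean_confidence = top1_report(score_rows, true_labels)
print(accuracy, mean_confidence)
```

If the mean confidence sits well above the measured accuracy, the scores are overconfident; if the two roughly agree on a large enough test set, the scores can be read more like probabilities.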