| HN Mirror

Indeed. Note though that proper scoring rules form a large class and it can matter which one you choose.

For example, for logistic regression things become a lot simpler if one chooses log loss (equivalently, KL divergence), because the training objective is then a convex minimization problem in the model weights. Had one chosen the Brier score instead, the problem is no longer convex, and where one starts the training iterations can determine where the updates converge to. Sometimes this indeterminacy is a problem: am I getting poor results because the data has changed, or because my initial seed has changed and the updates have converged to a worse solution?
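A minimal Python sketch of the difference, built on assumptions of my own rather than anything above: a one-weight logistic model p = sigmoid(w * x) with no bias, and a made-up three-point dataset (all x = 1, labels 1, 1, 0). Both rules are proper, so both are minimized at p = 2/3, i.e. w = ln 2. A finite-difference check shows the log loss curve is convex in w while the Brier curve is not, and a fixed budget of plain gradient descent started at w = -8 reaches ln 2 under log loss but barely moves under Brier, because the p(1 - p) factor in the Brier gradient vanishes where the sigmoid saturates. This toy problem has a single minimum, so it shows start-dependence through a flat plateau rather than through several distinct local minima; exhibiting those would need more features or data. The names (XS, YS, grad_brier, gradient_descent, etc.) are all hypothetical.

```python
import math

# Hypothetical toy setup (not from the comment): one weight w, no bias,
# model p = sigmoid(w * x), three observations sharing x = 1 with labels 1, 1, 0.
XS = [1.0, 1.0, 1.0]
YS = [1, 1, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w):
    """Summed negative log-likelihood (log loss) of the toy data at weight w."""
    total = 0.0
    for x, y in zip(XS, YS):
        p = sigmoid(w * x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def brier(w):
    """Summed Brier score (squared error of the predicted probability)."""
    return sum((sigmoid(w * x) - y) ** 2 for x, y in zip(XS, YS))


def grad_log_loss(w):
    # d/dw of log loss: sum of (p - y) * x
    return sum((sigmoid(w * x) - y) * x for x, y in zip(XS, YS))


def grad_brier(w):
    # d/dw of Brier: sum of 2 (p - y) * p (1 - p) * x; the p(1 - p) factor
    # goes to zero when the sigmoid saturates, producing a flat plateau.
    total = 0.0
    for x, y in zip(XS, YS):
        p = sigmoid(w * x)
        total += 2.0 * (p - y) * p * (1.0 - p) * x
    return total


def second_difference(f, w, h=1e-2):
    """Central finite-difference estimate of f''(w); negative means locally concave."""
    return (f(w + h) - 2.0 * f(w) + f(w - h)) / (h * h)


def gradient_descent(grad, w0, lr=0.5, steps=200):
    """Plain fixed-step gradient descent with a fixed iteration budget."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


GRID = [i / 2 for i in range(-16, 17)]  # w from -8 to 8 in steps of 0.5

if __name__ == "__main__":
    print("min f'' on grid, log loss:", min(second_difference(log_loss, w) for w in GRID))
    print("min f'' on grid, Brier:   ", min(second_difference(brier, w) for w in GRID))
    print("optimum ln 2 =", math.log(2))
    for w0 in (0.0, -8.0):
        print(f"start {w0:+.0f}: log loss -> {gradient_descent(grad_log_loss, w0):.4f}, "
              f"Brier -> {gradient_descent(grad_brier, w0):.4f}")
```

Running it should report a positive minimum second difference for log loss and a negative one for Brier. Starting from w = 0, both rules reach ln 2; starting from w = -8, only log loss does within the 200-step budget, while the Brier run stays close to where it started.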