In my experience I've never found a case where you would use Brier scores over cross-entropy / Bernoulli / binomial log-likelihoods. Does anybody know a concrete example where you would prefer Brier?
Both the Brier score and log loss are proper scoring rules (i.e. optimized when the predicted probabilities are the true outcome probabilities), and the choice between the two seems to have minimal impact on the conclusions that can be drawn (https://pubsonline.informs.org/doi/abs/10.1287/deca.2013.028...). I covered the Brier score in the post as I thought it would be easier to digest for a general audience.
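To make "proper" concrete, here is a minimal sketch that checks it numerically. The true probability q = 0.7 and the grid of candidate forecasts are assumptions picked for illustration; the function names are hypothetical.

```python
import math

def expected_brier(p, q):
    """Expected Brier score of forecast p when the event occurs with probability q."""
    return q * (1 - p) ** 2 + (1 - q) * p ** 2

def expected_log_loss(p, q):
    """Expected log loss of forecast p when the event occurs with probability q."""
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))

q = 0.7  # assumed true outcome probability
grid = [i / 100 for i in range(1, 100)]  # candidate forecasts 0.01 .. 0.99

# Forecast on the grid that minimizes each expected score.
best_brier = min(grid, key=lambda p: expected_brier(p, q))
best_log = min(grid, key=lambda p: expected_log_loss(p, q))
print(best_brier, best_log)
```

Under both rules the minimizing forecast is the true probability itself, which is what makes them proper; they differ in how steeply they punish departures from it.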
As Frank Harrell wrote on his blog (https://www.fharrell.com/post/class-damage/), one advantage of the Brier score could be its interpretability and the ability to decompose it into discrimination and calibration components.
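One standard way to do such a split is the Murphy decomposition, Brier = reliability - resolution + uncertainty, where reliability measures calibration and resolution is related to discrimination. Whether this is exactly the decomposition Harrell has in mind is an assumption on my part. A rough sketch, with made-up forecasts and outcomes and a hypothetical function name:

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition, grouping by distinct forecast values."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the observed outcomes for each distinct forecast value.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0  # calibration term: lower is better
    resolution = 0.0   # how far group frequencies sit from the base rate: higher is better
    for f, obs in groups.items():
        freq = sum(obs) / len(obs)
        reliability += len(obs) * (f - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty

# Made-up data, for illustration only.
forecasts = [0.1, 0.1, 0.8, 0.8, 0.8, 0.5]
outcomes = [0, 0, 1, 1, 0, 1]

brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
rel, res, unc = brier_decomposition(forecasts, outcomes)
print(brier, rel - res + unc)
```

The identity is exact only when forecasts are grouped by their distinct values, as here; binning continuous forecasts makes it approximate.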
Indeed. Note though that proper scoring rules form a large class and it can matter which one you choose.
For example, for logistic regression, things become a lot simpler if one chooses log loss (equivalently, KL divergence) because one ends up with a convex minimization problem. Had one chosen the Brier score here, the problem would no longer be convex, and where one starts the training iteration can determine where the updates converge. Sometimes this indeterminacy is a problem: I am getting poor results; is it because the data has changed, or because my initial seed has changed and the updates have converged to a worse solution?
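A quick numerical check of the convexity point, as a sketch: for a single example with label 1, look at the curvature of each loss as a function of the logit z. The test point z = -2 and the helper names are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def brier_loss(z, y):
    """Squared error between label y and the predicted probability sigmoid(z)."""
    return (y - sigmoid(z)) ** 2

def log_loss(z, y):
    """Bernoulli negative log-likelihood of label y at logit z."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def second_difference(f, z, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at z."""
    return (f(z + h) - 2 * f(z) + f(z - h)) / h ** 2

z, y = -2.0, 1  # a confidently wrong prediction
brier_curvature = second_difference(lambda t: brier_loss(t, y), z)
log_curvature = second_difference(lambda t: log_loss(t, y), z)
print(brier_curvature, log_curvature)
```

The squared-error term has negative curvature at this point, so it is not convex in the logit, while the log loss term has positive curvature. This only shows that an individual term is non-convex; it does not by itself show that a given dataset produces several distinct local minima.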
More generally, if one views probability as separate from the utility of the outcome it's attached to, one is bound to make bad decisions.
Real decision problems contain a lot of nonlinearities if decomposed the wrong way. The only way to decompose it is as a linear combination of probability and utility (because the utility swallows the nonlinearities). But for each component both probability and utility matter in determining the overall value of the decision.
The article mentions Brier score is just mean squared error, so it's connected to binomial through that (e.g. where correct prediction is 1, incorrect is 0, it is the mean of the binomial).
For folks who want to try the kind of forecasting being discussed here, Metaculus is a pretty great community: https://www.metaculus.com/
Their FAQ has a great explanation of how they 'score' user forecasts --- including a summary of Brier scores for binary yes/no questions, and the log score used for both binary and continuous questions: https://www.metaculus.com/help/faq/#howscore
I'm not (yet) using a scoring rule for my work-in-progress uncertainty test[1] of calibration, but only Beta posteriors, which are also a neat way of presenting the result of many predictions.
I am slightly more fond of log scoring than the Brier score, though, for the reason mentioned in another comment: being confidently wrong is often much worse than being slightly wrong, and should be penalised harder numerically.
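To put numbers on how the two rules treat a missed forecast, here is a small sketch. The forecast values are arbitrary and the function names are hypothetical.

```python
import math

def brier_penalty(p, outcome):
    """Squared error for a forecast p of a binary outcome (0 or 1)."""
    return (p - outcome) ** 2

def log_penalty(p, outcome):
    """Negative log of the probability assigned to what actually happened."""
    return -math.log(p if outcome == 1 else 1 - p)

# The event does NOT happen (outcome = 0); compare increasingly confident misses.
for p in (0.6, 0.9, 0.99):
    print(p, round(brier_penalty(p, 0), 4), round(log_penalty(p, 0), 4))
```

The Brier penalty for a single binary forecast can never exceed 1, while the log penalty grows without bound as the forecast approaches certainty in the wrong direction.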
(By the way, I built this to practise myself -- but I ran into a problem: I know the answers to all propositions, having written them myself... if anyone wants to contribute propositions, please contact me and I'll ask for them in a specific format so I can blindly paste them without knowing the true ones.)
No difference; this is exactly the same as the Brier score. MSE is, up to constants, the KL divergence between the ground truth and the prediction, assuming a Gaussian error distribution. We use MSE as a loss because we are trying to minimize KL divergence (again assuming a Gaussian error distribution). The article is very shallow; I am surprised it made the HN front page.
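For reference, the usual argument behind the Gaussian claim can be sketched as follows. The symbols (observations y_i, predictions \hat{y}_i, a fixed noise scale \sigma, and N data points) are notation assumed here, not taken from the article.

```latex
% Gaussian negative log-likelihood of one observation
-\log p(y_i \mid \hat{y}_i)
  = \frac{(y_i - \hat{y}_i)^2}{2\sigma^2} + \frac{1}{2}\log\!\left(2\pi\sigma^2\right)

% Summed over N observations, with \sigma held fixed
-\sum_{i=1}^{N} \log p(y_i \mid \hat{y}_i)
  = \frac{N}{2\sigma^2}\,\mathrm{MSE} + \frac{N}{2}\log\!\left(2\pi\sigma^2\right),
\qquad
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2
```

With \sigma fixed, minimizing MSE is therefore the same as maximizing the Gaussian likelihood, and the expected negative log-likelihood differs from the KL divergence to the data distribution only by that distribution's entropy, which does not depend on the predictions. For binary outcomes the Gaussian error model is a modelling choice rather than a description of the data, which is one reason the Bernoulli log loss is often seen as the more natural fit.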
I agree that the post lacks depth, but it was intended to be a gentle article accessible to a general audience, so they can start applying it in practice in their day to day lives. I would, however, really love to hear your views on what might be a more rigorous treatment of similar topics that can be introduced in an accessible way - would you be able to drop me a line at datarecipes@pm.me? Thanks!
Great point about KL divergence and assumptions about error distribution. This kind of thing is what I think is missing from a lot of data science education.
Agreed. It would be great to hear your views on some of the key gaps in modern data science curricula that could be covered in the blog - would you be able to drop me a line at datarecipes@pm.me? Thanks!