Making better decisions with the Brier score

	Making better decisions with the Brier score (datarecipes.io)
	50 points by ergodiclife 1839 days ago

5 comments

astrophysician 1839 days ago

In my experience, I’ve never found an instance where you would use Brier scores over cross entropy/Bernoulli/Binomial log likelihoods. Does anybody know a concrete example of when you would prefer Brier?

datarecipes 1838 days ago

Both the Brier score and log loss are proper scoring rules (i.e. optimized when the predicted probabilities are the true outcome probabilities), and the choice between the two seems to have minimal impact on the conclusions that can be drawn (https://pubsonline.informs.org/doi/abs/10.1287/deca.2013.028...). I covered the Brier score in the post as I thought it would be easier to digest for a general audience.
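
A minimal sketch of what "proper" means here, assuming a single binary event with a hypothetical true probability of 0.3 (the function names are illustrative, not from the post): for both rules, the expected score is best when the forecast equals the true probability.

```python
import math

def expected_brier(q, p):
    """Expected Brier score of forecast q when the event occurs with probability p."""
    return p * (q - 1) ** 2 + (1 - p) * q ** 2

def expected_log_loss(q, p):
    """Expected log loss of forecast q when the event occurs with probability p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

true_p = 0.3  # hypothetical true event probability (an assumption for illustration)
grid = [i / 100 for i in range(1, 100)]  # candidate forecasts 0.01 .. 0.99

# Lower is better for both rules, so pick the forecast with the smallest expected score.
best_brier = min(grid, key=lambda q: expected_brier(q, true_p))
best_log = min(grid, key=lambda q: expected_log_loss(q, true_p))
print(best_brier, best_log)
```

On this grid both minimisers land on the true probability, which is the property the two rules share.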

As Frank Harrell wrote on his blog (https://www.fharrell.com/post/class-damage/), one advantage of the Brier score could be its interpretability and the ability to decompose it into discrimination and calibration components.
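
One common way to split the score is Murphy's decomposition into reliability (calibration), resolution (discrimination) and uncertainty. Whether this is the exact decomposition Harrell has in mind is an assumption on my part. A rough sketch with made-up forecasts, grouping by distinct forecast value:

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) so that
    Brier = reliability - resolution + uncertainty."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Group observed outcomes by the forecast value that was issued.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0  # calibration: forecast vs. observed frequency in each group
    resolution = 0.0   # discrimination: group frequency vs. overall base rate
    for f, obs in groups.items():
        freq = sum(obs) / len(obs)
        reliability += len(obs) * (f - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty

# Made-up data, for illustration only.
forecasts = [0.1, 0.1, 0.9, 0.9, 0.9, 0.5]
outcomes = [0, 0, 1, 1, 0, 1]

rel, res, unc = brier_decomposition(forecasts, outcomes)
brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
print(brier, rel - res + unc)
```

The two printed numbers should agree up to floating-point error, since the identity is exact when forecasts are grouped by their distinct values.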

srean 1838 days ago

Indeed. Note though that proper scoring rules form a large class and it can matter which one you choose.

For example, for logistic regression, things become a lot simpler if one chooses log loss (equivalently KL divergence) because one ends up with a convex minimization problem. Had one chosen Brier score here, the problem is no longer convex and where one starts the training iteration will determine where the updates converge to. Sometimes this indeterminacy is a problem: am I getting poor results because the data has changed, or because my initial seed has changed and the updates have converged to a worse solution?
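
A small numerical check of the curvature claim, assuming the simplest possible setup (one training example with feature x = 1 and label y = 1, and a single weight w; all names are illustrative). It only shows that the squared-error objective has negative curvature somewhere while log loss does not; it does not by itself demonstrate multiple local minima.

```python
import math

def sigmoid(w):
    return 1.0 / (1.0 + math.exp(-w))

def brier_loss(w):
    # Squared error for a single example with x = 1, y = 1.
    return (sigmoid(w) - 1.0) ** 2

def log_loss(w):
    # Negative log likelihood for the same example.
    return -math.log(sigmoid(w))

def second_derivative(f, w, h=1e-3):
    # Central finite-difference estimate of f''(w).
    return (f(w + h) - 2.0 * f(w) + f(w - h)) / (h * h)

for w in (-3.0, 0.0, 3.0):
    print(w, second_derivative(brier_loss, w), second_derivative(log_loss, w))
```

A convex function cannot have a negative second derivative anywhere, so the sign of the Brier column at w = -3 is the thing to look at.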

tlb 1839 days ago

It's appropriate when the costs of false-positive and false-negative errors are the same, which isn't common in the real world.

kqr 1839 days ago

More generally, if one views probability as separate from the utility of the outcome it's attached to, one is bound to make bad decisions.

Real decision problems contain a lot of nonlinearities if decomposed the wrong way. The only way to decompose them is as a linear combination of probability and utility (because the utility swallows the nonlinearities). But for each component, both probability and utility matter in determining the overall value of the decision.
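
As a toy illustration of that linear combination (the umbrella scenario and all numbers are invented for the example), the value of each action is the probability-weighted sum of its utilities, and asymmetric costs show up in the utilities rather than in the probability:

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

p_rain = 0.3  # hypothetical forecast probability of rain

# Hypothetical utilities: carrying an umbrella is a small fixed nuisance,
# getting soaked without one is much worse.
actions = {
    "take umbrella": [(p_rain, -1.0), (1 - p_rain, -1.0)],
    "leave umbrella": [(p_rain, -10.0), (1 - p_rain, 0.0)],
}

for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))

best = max(actions, key=lambda name: expected_utility(actions[name]))
print("best action:", best)
```

Even with rain being the less likely outcome, the lopsided utilities make taking the umbrella the better choice here.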

closed 1839 days ago

The article mentions Brier score is just mean squared error, so it's connected to binomial through that (e.g. where correct prediction is 1, incorrect is 0, it is the mean of the binomial).

gcuth 1839 days ago

For folks who want to try the kind of forecasting being discussed here, Metaculus is a pretty great community: https://www.metaculus.com/

Their FAQ has a great explanation of how they 'score' user forecasts --- including a summary of Brier scores for binary yes/no questions, and the log score used for both binary and continuous questions: https://www.metaculus.com/help/faq/#howscore

ergodiclife 1838 days ago

This looks really interesting - thanks!

kqr 1839 days ago

I'm not (yet) using a scoring rule for my work-in-progress uncertainty test[1] of calibration, but only Beta posteriors, which are also a neat way of presenting the result of many predictions.
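
For readers unfamiliar with the idea, here is a rough sketch of a Beta posterior over one confidence bucket. It assumes a uniform Beta(1, 1) prior and made-up counts; it is not a description of how the linked test is implemented.

```python
def beta_posterior(hits, misses, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) over the true hit rate, with its mean and variance."""
    a = prior_a + hits
    b = prior_b + misses
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, variance

# Hypothetical: 20 statements rated "70% sure", of which 14 turned out true.
a, b, mean, variance = beta_posterior(hits=14, misses=6)
print(a, b, mean, variance ** 0.5)
```

If the stated confidence (0.7 here) sits well inside the bulk of the posterior, the bucket looks calibrated; a posterior concentrated far from it suggests over- or underconfidence.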

I am slightly more fond of log scoring than the Brier score, though, for the reason mentioned in another comment: being very wrong is often much worse than being somewhat wrong, and should be penalised harder numerically.
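
To put rough numbers on how the two rules treat a confident miss, here is a small sketch (function names are illustrative) that scores a few forecasts for an event that did happen:

```python
import math

def brier_penalty(p, outcome):
    """Squared error of forecast p against a 0/1 outcome; bounded by 1."""
    return (p - outcome) ** 2

def log_penalty(p, outcome):
    """Negative log of the probability assigned to what happened; unbounded."""
    return -math.log(p if outcome == 1 else 1 - p)

# The event occurred (outcome = 1); the forecasts get progressively more wrong.
for p in (0.4, 0.1, 0.01):
    print(p, brier_penalty(p, 1), log_penalty(p, 1))
```

The Brier penalty can never exceed 1, while the log penalty keeps growing as the forecast for the realised outcome approaches zero.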

[1]: https://static.loop54.com/uncertainty-test.html

(By the way, I built this to practise myself -- but I ran into a problem: I know the answers to all propositions, having written them myself... if anyone wants to contribute propositions, please contact me and I'll ask for them in a specific format so I can blindly paste them without knowing the true ones.)

datarecipes 1838 days ago

This looks quite cool - would you be able to drop me a line at datarecipes@pm.me?

qinjian623 1839 days ago

What's the difference with MSE loss?

master_yoda_1 1839 days ago

No difference, it is exactly the same as the Brier score. Minimizing MSE is equivalent to minimizing the KL divergence between the ground truth and the prediction, assuming a Gaussian error distribution. We use MSE as a loss because we are trying to minimize that KL divergence (again assuming a Gaussian error distribution). The article is very shallow; I am surprised it made the HN front page.
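
For anyone who wants the Gaussian connection spelled out, a short derivation sketch, assuming independent errors with a fixed variance \sigma^2 (the symbols y_i, \hat{y}_i, \sigma and N are my notation, not the article's):

```latex
-\log p(y \mid \hat{y}, \sigma^2)
  = -\sum_{i=1}^{N} \log\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left( -\frac{(y_i - \hat{y}_i)^2}{2\sigma^2} \right) \right]
  = \frac{N}{2}\log(2\pi\sigma^2)
    + \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
  = \frac{N}{2}\log(2\pi\sigma^2) + \frac{N}{2\sigma^2}\,\mathrm{MSE}
```

With \sigma^2 held fixed, the first term is a constant, so minimizing the negative log likelihood is the same as minimizing MSE. Minimizing the expected negative log likelihood in turn matches minimizing the KL divergence from the data distribution to the model, up to an entropy term that does not depend on the model.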

datarecipes 1838 days ago

Thanks for your comment.

I agree that the post lacks depth, but it was intended to be a gentle article accessible to a general audience, so they can start applying it in practice in their day to day lives. I would, however, really love to hear your views on what might be a more rigorous treatment of similar topics that can be introduced in an accessible way - would you be able to drop me a line at datarecipes@pm.me? Thanks!

nerdponx 1838 days ago

Great point about KL divergence and assumptions about error distribution. This kind of thing is what I think is missing from a lot of data science education.

datarecipes 1838 days ago

Agreed. It would be great to hear your views on some of the key gaps in modern data science curricula that could be covered in the blog - would you be able to drop me a line at datarecipes@pm.me? Thanks!

nerdponx 1839 days ago

Brier score is MSE applied to predicted probabilities. One is a loss function for "regression" type problems, the other is a proper scoring rule.
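
In code the computation really is just a mean squared error, with predicted probabilities in place of regression outputs and 0/1 outcomes in place of targets. A minimal sketch with invented forecasts:

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Invented forecasts for three yes/no questions and what actually happened.
probs = [0.9, 0.2, 0.7]
outcomes = [1, 0, 0]

score = brier_score(probs, outcomes)
always_half = brier_score([0.5, 0.5, 0.5], outcomes)
print(score, always_half)
```

Always answering 0.5 scores 0.25 regardless of the outcomes, which makes it a handy reference point: lower than that is better than hedging on everything.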

KingOfCoders 1839 days ago

Anyone else reading Bier score?
