No difference: this is exactly the same as the Brier score. Assuming a Gaussian error distribution, MSE is (up to an additive constant and a scale factor) the KL divergence between the ground truth and the prediction. We use MSE as a loss because we are trying to minimize KL divergence (again assuming a Gaussian error distribution). The article is very shallow; I am surprised it made the HN front page.
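A minimal sketch of the two claims above, in plain Python. The function names (`brier_score`, `mse`, `gaussian_nll`) and the toy numbers are illustrative assumptions, not taken from the article. It checks that the Brier score on binary outcomes is the same computation as MSE, and that the average Gaussian negative log-likelihood with a fixed, assumed `sigma` equals a constant plus MSE scaled by `1 / (2 * sigma**2)`. Since the KL divergence from the true distribution to the model differs from the expected negative log-likelihood only by a term that does not depend on the model, minimizing one minimizes the other under that Gaussian assumption.

```python
import math

def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def brier_score(probs, outcomes):
    """Brier score: mean squared gap between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def gaussian_nll(preds, targets, sigma=1.0):
    """Average negative log-likelihood of targets under N(pred, sigma^2).

    sigma is assumed fixed and known; that assumption is what reduces
    the likelihood objective to MSE.
    """
    const = 0.5 * math.log(2 * math.pi * sigma ** 2)
    total = sum(const + (t - p) ** 2 / (2 * sigma ** 2)
                for p, t in zip(preds, targets))
    return total / len(preds)

# Toy forecasts and binary outcomes (made up for illustration).
probs = [0.9, 0.2, 0.7, 0.4]
outcomes = [1, 0, 1, 0]

# Brier score and MSE are the same number on this data.
same_as_mse = math.isclose(brier_score(probs, outcomes), mse(probs, outcomes))

# Gaussian NLL = constant + MSE / (2 * sigma^2), so both share a minimizer.
sigma = 0.5
expected = 0.5 * math.log(2 * math.pi * sigma ** 2) + mse(probs, outcomes) / (2 * sigma ** 2)
nll_matches = math.isclose(gaussian_nll(probs, outcomes, sigma), expected)

print(same_as_mse, nll_matches)
```

If the error variance is not fixed (for example, if it is learned per example), the constant term is no longer constant and the equivalence with plain MSE stops holding.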
I agree that the post lacks depth, but it was intended to be a gentle article accessible to a general audience, so that readers can start applying it in practice in their day-to-day lives. I would, however, really love to hear your views on what a more rigorous treatment of similar topics might look like while still being accessible - would you be able to drop me a line at datarecipes@pm.me? Thanks!
Great point about KL divergence and assumptions about error distribution. This kind of thing is what I think is missing from a lot of data science education.
Agreed. It would be great to hear your views on some of the key gaps in modern data science curricula that could be covered in the blog - would you be able to drop me a line at datarecipes@pm.me? Thanks!