Hacker News
by marbletimes 1622 days ago
When I was in academia, I used to fit highly sophisticated models (think many-parameter, multi-level, non-linear mixed-effects models) that gave not only point estimates but also confidence and predictive intervals ("please explain to me the difference between the two" is one of my favorite interview questions, and I have still not heard a correct answer).

When I tried to bring an "uncertainty mindset" with me to industry, I found that (1) most DS/ML scientists use ML models that typically don't provide an easy way to estimate uncertainty intervals; (2) in the industry I was in (media), the people who make decisions and use model predictions as one of the inputs to their decision-making are typically not very quantitative, and an uncertainty interval, rather than strengthen their process, would confuse them more than anything else: they want a "more or less" estimate, not a "more or less, plus something more and something less" estimate; and (3) when services are customer-facing (see ride-sharing), providing an uncertainty interval ("your car will arrive in 9 to 15 minutes") would anchor the customer to the lower estimate (they do provide one for the price of rides booked in advance, and they need to, but those estimates are often way off).

So for many ML applications, an uncertainty interval that nobody internally or externally would base their decision upon is just a nuisance.


Great answer. It prompts a bunch of follow-up questions!

> most DS/ML scientists use ML models that typically don't provide an easy way to estimate uncertainty intervals

Not a DS/ML scientist, but a data engineer. The models I've used have pretty much been "slap it into XGBoost with k-fold CV, call it done": an easy black box. Is there any model or approach you like for estimating uncertainty with similar ease?

I've seen uncertainty interval / quantile regression done using XGBoost, but it isn't out of the box. I've also been trying to learn some Bayesian modeling, but definitely don't feel handy enough to apply it to random problems needing quick answers at work.
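Quantile regression replaces squared error with the quantile ("pinball") loss; fitting one model at a low quantile and one at a high quantile gives an interval for y itself. A minimal stdlib sketch of the idea, assuming a constant-only model so the "fit" is just a scan over the data; in practice the same loss is what a quantile-capable learner optimizes. All function names here are hypothetical.

```python
from typing import Sequence, Tuple


def pinball_loss(y: float, q: float, tau: float) -> float:
    """Quantile (pinball) loss of predicting q for observation y at level tau."""
    diff = y - q
    # Under-predictions cost tau per unit, over-predictions cost (1 - tau) per unit.
    return tau * diff if diff >= 0 else (tau - 1) * diff


def fit_constant_quantile(ys: Sequence[float], tau: float) -> float:
    """Constant that minimizes total pinball loss over ys.

    The total loss is piecewise linear with kinks at the data points, so a
    minimizer can always be found among the observed values; for a constant
    model it is an empirical tau-quantile.
    """
    return min(ys, key=lambda c: sum(pinball_loss(y, c, tau) for y in ys))


def constant_interval(ys: Sequence[float], lower_tau: float = 0.05,
                      upper_tau: float = 0.95) -> Tuple[float, float]:
    # Two separate quantile "models" give the interval endpoints.
    return fit_constant_quantile(ys, lower_tau), fit_constant_quantile(ys, upper_tau)
```

With features, the constant becomes a model f(x) and the scan becomes gradient-based training, but the loss, and the two-model recipe for an interval, stay the same.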

Correct, quantile regression is one option. Another is "pure" bootstrapping (googling something like uncertainty + machine learning + bootstrapping shows that this is a very active area of current research).

The major problem with bootstrapping is the computational time for big models, since many models need to be fit to obtain a representative distribution of predictions.
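To make the cost concrete, here is a minimal pairs-bootstrap sketch for a straight-line model, assuming ordinary least squares as a stand-in for the "big model" (every resample means one full refit). The function names are hypothetical. The spread of the refitted means at x0 approximates uncertainty in E[y|x0]; adding a resampled residual to each refitted mean approximates the spread of a new y, so the same loop yields both kinds of interval.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def _quantile(values, q):
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]


def bootstrap_intervals(xs, ys, x0, n_boot=2000, level=0.90, seed=0):
    """Return (approx. confidence interval, approx. prediction interval) at x0."""
    rng = random.Random(seed)
    n = len(xs)
    means, draws = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:          # degenerate resample: slope undefined
            continue
        a, b = fit_line(bx, by)       # one refit per resample: the costly part
        mean_at_x0 = a + b * x0
        residuals = [y - (a + b * x) for x, y in zip(bx, by)]
        means.append(mean_at_x0)
        draws.append(mean_at_x0 + rng.choice(residuals))
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    ci = (_quantile(means, lo_q), _quantile(means, hi_q))
    pi = (_quantile(draws, lo_q), _quantile(draws, hi_q))
    return ci, pi
```

For a model that takes minutes to train, n_boot refits is exactly the computational burden described above.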

Now, if you want more "rigorous" quantification of uncertainty, one option is to go Bayesian using probabilistic programming (PyMC, Stan, TMB), but computational time for large models can be prohibitive. Another option is to "scale down" the complexity to models that might be (on average) a bit less accurate, but provide rigorous uncertainty intervals and good interpretability of results, for example Generalized Additive Models.

A note here: I have seen uncertainty quantification by people considered very capable in the ML community that gave me goosebumps. For example, since the lower bound of the interval was negative and the modeled response variable could not be negative, the uncertainty interval was simply "cut" at zero. (One easy way to deal with this, although it depends on the variable modeled and on the model itself, is to log-transform the response, but pay attention to the intervals when you back-transform with exp(log(y)) to the natural scale. Another useful interview question.)
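The back-transformation pitfall can be shown in a few lines. A minimal sketch, assuming y is strictly positive and roughly lognormal, so a normal-theory interval on the log scale is reasonable; the function name is hypothetical. Exponentiating interval endpoints is safe (exp is monotone, so log-scale quantiles map to quantiles of y and the interval stays positive), but exponentiating the mean of log(y) gives the median of y, not its mean.

```python
import math
import random
import statistics


def log_scale_interval(ys, level=0.90):
    """Interval for y built on the log scale, then mapped back with exp.

    Assumes log(y) is approximately normal (y approximately lognormal).
    """
    logs = [math.log(y) for y in ys]
    mu, sigma = statistics.fmean(logs), statistics.stdev(logs)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return {
        # exp of the endpoints: positive, asymmetric, and still a valid interval for y
        "interval": (math.exp(mu - z * sigma), math.exp(mu + z * sigma)),
        # exp of the mean log is the median (geometric mean), NOT the mean of y
        "naive_back_transform": math.exp(mu),
        # under lognormality the mean of y is exp(mu + sigma^2 / 2)
        "lognormal_mean": math.exp(mu + sigma ** 2 / 2),
    }
```

On skewed data, reporting "naive_back_transform" as the expected value systematically understates it, which is one way log-transformed calculations go wrong.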

That is really an effect of CS people, rather than math people, dominating ML in both applications and management. My background is in engineering, but I always hire a percentage of people with math and business backgrounds. In reality, there are very few ML applications that don't need confidence estimation and estimation of monetary costs. Otherwise, each company ends up with its own equivalent of the Google graveyard of useless applications. It really is not that hard.

I agree that statisticians appreciate the importance of uncertainty intervals better than CS people do (it is mostly cultural), but the claim that "In reality there are very few ML applications that don't need confidence estimation and estimation of monetary costs" is empirically false.

If ML applications required uncertainty attached to point estimates, we would see many more uncertainty intervals attached to point estimates; but in industry, outside of niches (e.g., banking, bio, and actuarial work, to name a few), very few people bother dealing with them.

I am currently part of a large team (we are talking hundreds) of ML specialists, and I have yet to see a single presentation in which a point estimate was associated with an uncertainty interval. It was the same at my previous company, and when I interview candidates (dozens? hundreds?) I never get a satisfactory answer to my confidence interval vs. predictive interval question.

Let me rephrase your empirical observation in probabilistic terms. If a random sample of data scientists from startups had the same distribution of mathematicians and CS people as a random sample of data scientists from banking, then we could empirically compare whether confidence intervals are equally useful in both industries.

Historically, regulators figured out that when you play with other people's assets you need to assess your confidence. Given that, the volatility of outcomes in non-banking industries that lack such oversight can largely be attributed to people Dunning-Krugering after a couple of Andrew Ng's courses.

That is my claim, based on my experience working on projects across many industries in many countries.

Although confidence and prediction intervals are slightly different, is there an example where mistaking one for the other has led to real-world consequences? I have a feeling it's rare for it to matter.

They are not slightly different; they measure (totally) different things.

Confidence intervals are relative to E[y|x]; predictive intervals are relative to y. Sometimes, for example when there is not much variation in y|x, the two intervals may be similar, but that is due to the nature of the data, not because one is just "a bit larger than the other". Think about (1) the uncertainty around the mean of an empirical symmetric distribution with a very small standard deviation ("we are 95% confident the true mean is between z and k") and (2) the 2.5%-97.5% interval of the raw data distribution. The numbers can look similar, but they represent different measures.
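The contrast between (1) and (2) can be made concrete with a small simulation. A minimal stdlib sketch, assuming normally distributed data and a sample large enough that a z quantile stands in for the t quantile; the function names and the synthetic numbers are illustrative, not from the comment. It builds a 90% confidence interval for the mean and a 90% prediction interval for one new draw from the same sample, then checks what fraction of fresh draws land inside each.

```python
import random
import statistics


def mean_ci_and_pi(sample, level=0.90):
    """Return (CI for the mean, PI for one new draw), both centred on the sample mean."""
    n = len(sample)
    m, s = statistics.fmean(sample), statistics.stdev(sample)
    # z quantile as a large-sample stand-in for the t quantile with n - 1 df
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * s / n ** 0.5              # uncertainty about E[y]: shrinks with n
    pi_half = z * s * (1 + 1 / n) ** 0.5    # spread of a single new y: does not shrink
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)


def coverage(interval, values):
    """Fraction of values that fall inside the closed interval."""
    lo, hi = interval
    return sum(lo <= v <= hi for v in values) / len(values)


if __name__ == "__main__":
    # Purely synthetic "100-m times" (hypothetical mean 11.3 s, sd 0.67 s).
    rng = random.Random(42)
    train = [rng.gauss(11.3, 0.67) for _ in range(200)]
    fresh = [rng.gauss(11.3, 0.67) for _ in range(10_000)]
    ci, pi = mean_ci_and_pi(train)
    print("CI for the mean:", ci, "covers", coverage(ci, fresh), "of new times")
    print("PI for one time:", pi, "covers", coverage(pi, fresh), "of new times")
```

With these settings the prediction interval should catch roughly 90% of the fresh draws, while the confidence interval catches only a small fraction: same data, same centre, different question.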

Below I paste an example that I gave in a later comment:

---
In the vast majority of cases, what we want is the range for y (the prediction interval), that is: given x = 3, what is the expected distribution of y? For example, say we train a model to estimate how the 100-m dash time varies with age. The uncertainty we want is, "at age 48, 90% of Master Athletes run the 100-m dash between 10.2 and 12.4 seconds" (here there would be another difference to point out between Frequentist and Bayesian intervals, but let's keep things simple).

We are generally not interested in the uncertainty of the expected value of y given x = 3 (that is, the confidence interval). In this case, the uncertainty we get (we might want it, but often we do not) is: "at age 48, we are 90% confident that the expected time to complete the 100-m dash for Master Athletes is between 11.2 and 11.6 seconds".

----

The two intervals can look similar by some metrics ("ah, come on, 11 s or 12 s, who cares"), but they measure/estimate very different things, and in many cases the difference would matter a lot.
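For the textbook case of simple linear regression with i.i.d. normal errors (an assumption for illustration; the models discussed in the thread are far more complex), the two intervals at a new input x0 (say, age 48) differ by a single term. With n training points, fitted value ŷ0, and residual standard error s:

```latex
\text{CI for } E[y \mid x_0]:\quad
\hat{y}_0 \pm t_{n-2,\,1-\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}

\text{PI for a new } y \text{ at } x_0:\quad
\hat{y}_0 \pm t_{n-2,\,1-\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
```

The extra 1 under the square root is the variance of an individual observation around its mean (in units of s²). As n grows, the confidence interval shrinks toward zero width, while the prediction interval approaches ŷ0 ± z·s: more data pins down the average athlete, not the spread between athletes.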

Why do I say it "would" matter and not that it "does"? Because many decisions in industry settings (outside some niches), the vast majority I'd say, use point estimates (so, not even uncertainty intervals) only as one of many inputs to the decision-making process, even when ML or statistical models are part of that process.

Let me give you an example. Years ago I was developing models for estimating the ROI of certain (very popular) products. The previous calculations were absurdly wrong: there were log-transformations involved and, guess what, they were using confidence intervals ("the uncertainty around the expected ROI for a similar class of products is") instead of predictive intervals ("the ROI for this class of products is expected to be between w and j").

I provided the correct intervals (i.e., predictive), but in the end the decisions changed little, because those making the decisions were not considering uncertainty in any way. That's why, in general, I don't worry too much about uncertainty on the rare occasions these days when I develop models.

I mean, who outside of academia (and even there...) evaluates a predictive model while also taking its predictive intervals into account, for example by reporting, alongside a metric like mean absolute error on the test data, the proportion of test data that falls within the uncertainty intervals estimated from the training data? The answer is "very few".
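The evaluation described above is cheap to compute once a model emits a point prediction plus lower and upper bounds for each test row. A minimal sketch, assuming those three arrays already exist; the function name and the extra mean-width output are illustrative additions, not from the comment. Width is reported because an interval can reach any coverage simply by being uselessly wide.

```python
def evaluate_intervals(y_true, y_pred, lower, upper):
    """Point accuracy plus interval calibration on held-out data.

    Returns the mean absolute error of the point predictions, the empirical
    coverage (share of test targets inside their own interval), and the
    mean interval width.
    """
    n = len(y_true)
    if n == 0 or not (n == len(y_pred) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    covered = sum(lo <= t <= hi for t, lo, hi in zip(y_true, lower, upper))
    width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return {"mae": mae, "coverage": covered / n, "mean_width": width}
```

For nominal 90% intervals, a coverage far below 0.90 on the test set signals intervals that are too narrow (for example, confidence intervals used where prediction intervals were needed).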

They measure something different, I agree, but not different enough to matter in real-life decision-making, which often involves factors outside of the model.

In real-life decision-making, many other factors that are unknown or unquantifiable come in and dominate any errors arising from using confidence intervals instead of predictive intervals.

What do "multi-level" and "mixed effects" mean? There are tons of non-linear models with lots of parameters, but I've never heard these other terms.

> the difference between the two

As far as I remember, one is bigger than the other, which means the standard error of the prediction interval is bigger?

From a good SO answer, see https://stats.stackexchange.com/questions/16493/difference-b...

"A confidence interval gives a range for E[y∣x], as you say. A prediction interval gives a range for y itself.".

In the vast majority of cases, what we want is the range for y (the prediction interval), that is: given x = 3, what is the expected distribution of y? For example, say we train a model to estimate how the 100-m dash time varies with age. The uncertainty we want is, "at age 48, 90% of Master Athletes run the 100-m dash between 10.2 and 12.4 seconds" (here there would be another difference to point out between Frequentist and Bayesian intervals, but let's keep things simple).

We are generally not interested in the uncertainty of the expected value of y given x = 3 (that is, the confidence interval). In this case, the uncertainty we get (we might want it, but often we do not) is: "at age 48, we are 90% confident that the expected time to complete the 100-m dash for Master Athletes is between 11.2 and 11.6 seconds".