Hacker News
by bunderbunder 2785 days ago
Logistic regression isn't sexy, but it can still achieve near state-of-the-art results, is reasonably resistant to bias^H^H^H^H variance, and generates parameters that you can easily explain to someone with no background in math.

There's a lot of value in all that. Especially if your deliverable is something that a business is going to use, and not just a Kaggle entry.


> generates parameters that you can easily explain to someone with no background in math

I know it _seems_ that way, but there's a surprising amount of nuance there and I think we're both fooling and limiting ourselves by letting this idea fester.

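For one, unlike linear regression coefficients, logistic regression estimates (odds ratios) aren't collapsible: adding or dropping a covariate that predicts the outcome changes the other estimates even when that covariate is independent of the other inputs. So you can NOT interpret them as "changing this input by X changes the output by Y". That's only true if your set of covariates is perfect, which it never is, though in practice this interpretation might not be _that_ far off.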
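
A minimal sketch of non-collapsibility with made-up numbers: the true model below is logit P(y=1 | x, z) = -1 + x + 2z with binary x and z, and z is assumed independent of x with P(z=1) = 0.5, so there is no confounding at all. The odds ratio for x is exp(1) ≈ 2.72 within each level of z, yet after averaging z out it shrinks to roughly 2.23, which is what a model that simply omits z would be estimating.

```python
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def odds(p: float) -> float:
    return p / (1.0 - p)

# Hypothetical true model: logit P(y=1 | x, z) = A + B*x + C*z, binary x and z.
A, B, C = -1.0, 1.0, 2.0
P_Z1 = 0.5  # assumption: z is independent of x, so z is NOT a confounder

def p_given_xz(x: int, z: int) -> float:
    return sigmoid(A + B * x + C * z)

# Conditional odds ratio for x, holding z fixed: equals exp(B) in both strata.
conditional_or = [odds(p_given_xz(1, z)) / odds(p_given_xz(0, z)) for z in (0, 1)]

# Marginal P(y=1 | x) after averaging over z (what a model without z targets).
def p_given_x(x: int) -> float:
    return (1 - P_Z1) * p_given_xz(x, 0) + P_Z1 * p_given_xz(x, 1)

marginal_or = odds(p_given_x(1)) / odds(p_given_x(0))

print([round(v, 3) for v in conditional_or], round(marginal_or, 3))  # [2.718, 2.718] 2.23
```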

Another issue I see is practitioners not being aware of scaled vs. unscaled estimates. I've seen real papers from AI groups use logistic regression estimates as feature importance rankings while the estimates were still on the scale of the original features, with the authors not understanding the distinction when confronted about it.
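
To illustrate the scaled/unscaled trap, here is a small sketch with invented coefficients and standard deviations (the feature names are hypothetical). Ranking the raw coefficients and ranking the coefficients multiplied by each feature's standard deviation, which matches what an unregularized refit on z-scored features would give, produce opposite orderings.

```python
# Hypothetical fitted coefficients on the ORIGINAL feature scales, plus each
# feature's standard deviation in the training data (all numbers invented).
coefs = {"income_usd": 0.00003, "age_years": 0.05, "num_logins": 0.2}
stds = {"income_usd": 30000.0, "age_years": 12.0, "num_logins": 1.5}

def rank(scores: dict) -> list:
    """Feature names sorted by absolute score, largest first."""
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)

# Raw coefficients are "per dollar", "per year", "per login": not comparable.
raw_ranking = rank(coefs)

# Coefficient times SD = change in log-odds per one-SD change in the feature.
std_coefs = {name: coefs[name] * stds[name] for name in coefs}
std_ranking = rank(std_coefs)

print(raw_ranking)  # ['num_logins', 'age_years', 'income_usd']
print(std_ranking)  # ['income_usd', 'age_years', 'num_logins']
```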

From a practical sense, I think practitioners are much better served using random forests as their initial exploratory models. Less effort for results that are in practice at least as good as a well-prepped logit. Plenty of issues with feature importance there, but not any worse than with logistic regression.

I don't think that's such a big deal in practice. See http://jakewestfall.org/blog/index.php/2018/03/12/logistic-r..., for example.

tl;dr: The upshot is that non-collapsibility means that I can't use LR coefficients for things that I don't really need to use them for, anyway. That doesn't feel like a crippling limitation to me.

(Well, also, I have to occasionally pause to cross my fingers and say, "ceteris paribus," under my breath, which does admittedly make people think I'm some sort of weird Harry Potter nut. Which is OK. They're not wrong, they're just right for the wrong reason.)

Nor does it render its coefficients less interpretable than those of most other models. "Less interpretable than OLS" can still be pretty darn interpretable.

I had exactly that post in mind, it really raised my awareness of these issues.

I agree with Jake's conditional interpretation of the estimates, but the practical issue is that virtually nobody without a solid grounding in statistics will do that correctly. In particular, people tend to do exactly what Jake concedes rarely makes any sense: comparing estimates across different model specifications.

You and I might interpret these betas just fine, but if we show them to a less stats-y audience, will they?

I guess it depends. I have the luxury of working in a very "this is machine learning, which is not to be confused with statistical inference" problem domain. It doesn't really even make sense to interpret most of the models I build as describing any sort of causal relationship, and when people are looking at the parameter estimates, they're really just trying to figure out, "What does this model think is important?"

That sounds nice!

Feature ranking seems like a clearly safe interpretation of betas, though I've been bitten too often by letting glm (in R) scale my predictors and give me back estimates on the original scales, which are thus incomparable, and I've seen it happen to others even more often. It's easy to miss when your original scales aren't all that different.

It's not that difficult to compute true marginal effects from logistic regression using something like the bootstrap (if you have a distribution for your coefficients) or explicit differentiation. Every traditional stats app (Stata, SAS, etc.) has this.
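
For the explicit-differentiation route, a sketch of the average marginal effect under assumed coefficients and a tiny invented sample: for the logit link, dP/dx_j = beta_j * p * (1 - p), averaged over the observations. The finite-difference function is only a sanity check; the bootstrap route is not shown.

```python
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical fitted coefficients: intercept plus one weight per feature.
intercept = -0.5
beta = [0.8, -1.2]

# Hypothetical observations (rows of feature values).
X = [[0.1, 1.0], [1.5, 0.2], [-0.7, 0.4], [2.0, -1.0], [0.0, 0.0]]

def prob(row):
    return sigmoid(intercept + sum(b * x for b, x in zip(beta, row)))

def average_marginal_effect(j: int) -> float:
    """Sample average of dP/dx_j; for the logit link dP/dx_j = beta_j * p * (1 - p)."""
    return sum(beta[j] * prob(r) * (1 - prob(r)) for r in X) / len(X)

def finite_difference_ame(j: int, h: float = 1e-6) -> float:
    """Numerical check: central difference on feature j in every row, then average."""
    total = 0.0
    for r in X:
        up = list(r)
        up[j] += h
        down = list(r)
        down[j] -= h
        total += (prob(up) - prob(down)) / (2 * h)
    return total / len(X)

for j in range(len(beta)):
    print(j, round(average_marginal_effect(j), 4), round(finite_difference_ame(j), 4))
```
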
What do you mean by "true" marginal effect? Are you suggesting a post-hoc procedure can correct estimates such that they are close enough to the estimate that would have been produced with a more complete model specification?

As far as I can tell, the GP raised no objection to logistic regression; they simply noted that the illustration didn't actually illustrate logistic regression but something else.

Off-topic-ish, but what do you mean by ^H^H^H^H? I feel like it's a joke I don't get.
Yeah, it is a good first benchmark. But view interpretability as separate from accuracy. You can explain black box algorithms just fine these days.

Logistic regression is high-bias, low-variance. If you were talking about fairness bias, then its resistance to bias comes from logreg being too dumb to recognize complex nonlinear patterns. Not necessarily a pro.

Sorry, I misspoke; will edit. I was talking about resistance to overfitting, which largely comes from logistic regression's assumption of a linear decision boundary. That assumption holds surprisingly often in classification tasks, and when it doesn't, you can usually model the nonlinearity just fine with interaction variables.

With an ANN, your easiest defense against overfitting is to have great big heaping piles of training data. That's something that's hard to come by in many interesting situations.
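
To make the interaction-variable point concrete, here is a minimal pure-Python sketch on the XOR toy problem, where no linear boundary in (x1, x2) works but adding the product x1*x2 as a feature makes the classes separable. The fitter is a bare-bones stand-in (full-batch gradient ascent, no regularization, fixed step size), not any particular library's solver.

```python
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def fit_logreg(rows, ys, lr=0.1, steps=5000):
    """Full-batch gradient ascent on the logistic log-likelihood, starting from zero."""
    w = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(rows, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj
        w = [wi + lr * gj for wi, gj in zip(w, grad)]
    return w

def accuracy(w, rows, ys):
    preds = [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0 for x in rows]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]  # XOR

linear = [(1.0, a, b) for a, b in points]             # intercept, x1, x2
interact = [(1.0, a, b, a * b) for a, b in points]    # ... plus the x1*x2 interaction

acc_linear = accuracy(fit_logreg(linear, labels), linear, labels)
acc_interact = accuracy(fit_logreg(interact, labels), interact, labels)
print(acc_linear, acc_interact)  # 0.5 1.0
```

Without the interaction, the gradient at zero is exactly zero by symmetry, so the model never moves off "50% for everyone"; with it, the same linear machinery separates the classes.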

Agreed. Logistic regression with a polynomial kernel or well-engineered interaction terms can equal or beat more complex models for a fraction of the budget.

More power to you if a solid, simple logreg model (or even no ML at all) is your first deliverable.
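
The polynomial-kernel remark can be made concrete: the degree-2 kernel (1 + x·z)^2 is an inner product over an explicit feature vector containing the original features, their squares, and all pairwise interactions (with √2 scalings). A minimal sketch with arbitrary example vectors that checks the two agree:

```python
import itertools
import math

def poly2_features(x):
    """Explicit feature map for the degree-2 polynomial kernel k(x, z) = (1 + x.z)**2."""
    feats = [1.0]
    feats += [math.sqrt(2) * xi for xi in x]              # linear terms
    feats += [xi * xi for xi in x]                         # squared terms
    feats += [math.sqrt(2) * x[i] * x[j]                   # pairwise interactions
              for i, j in itertools.combinations(range(len(x)), 2)]
    return feats

def poly2_kernel(x, z):
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2

x = [0.5, -1.0, 2.0]
z = [1.5, 0.25, -0.5]
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))
print(round(explicit, 6), round(poly2_kernel(x, z), 6))  # 0.25 0.25
```

So a plain logistic regression on these expanded features works in the same feature space as the kernelized version; which route is cheaper depends mostly on how many raw features you start with.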

Would you mind talking about how black-box interpretability is becoming well known? I've seen Shapley values used for feature interpretation, but I'm not sure what else is being done.
For an accessible recent overview see: https://christophm.github.io/interpretable-ml-book/
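
For reference, Shapley values can be computed exactly by enumerating coalitions when there are only a few features. This sketch assumes one simple convention (features outside a coalition are set to a single baseline point) and uses a hypothetical model with one interaction term; practical tools approximate this and usually average over a background sample instead.

```python
import itertools
import math

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x, O(2^n) model calls.

    Assumed value function: features in the coalition come from x, all
    others from a single baseline point.
    """
    n = len(x)

    def value(subset):
        point = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(point)

    phis = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        phi = 0.0
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {j}) - value(s))
        phis.append(phi)
    return phis

def model(p):
    # Hypothetical model with an interaction between features 0 and 1.
    return 2.0 * p[0] - 1.0 * p[1] + 0.5 * p[0] * p[1] + 3.0 * p[2]

x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(model, x, baseline)
print([round(v, 3) for v in phis], round(sum(phis), 3))  # [2.5, -1.5, 1.5] 2.5
```

The interaction's contribution (0.5 * 1.0 * 2.0 = 1.0) is split evenly between features 0 and 1, and the values sum to f(x) - f(baseline).
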
Logistic regression is almost interpretable. It certainly looks interpretable, and it's certainly good enough if you're just trying to make a PM feel better or hang some explanatory chrome in a UI, but it's not truly interpretable.

The coefficients may be directionally interpretable, but that's about it.

Knowing that there is a probabilistic relationship expressed by the coefficients isn't the same thing as saying "do x and y will happen."
Logistic regression essentially gives a conditional probability function, much like linear regression gives a conditional expectation function. You can compute odds ratios from logistic regression -- say, conditional on all other factors, being left-handed doubles your odds of some binary outcome. People were complaining that this isn't trivially done by staring at the coefficients, but people who can't think in partial derivatives shouldn't be in this business.
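
To make the odds-versus-probability distinction concrete, a sketch with a hypothetical coefficient of ln 2 (an odds ratio of exactly 2) applied at a few assumed baseline probabilities:

```python
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

beta_left_handed = math.log(2.0)  # hypothetical coefficient: odds ratio exp(beta) = 2

results = []
for p0 in (0.05, 0.3, 0.9):                       # assumed baseline probability
    p1 = sigmoid(logit(p0) + beta_left_handed)    # same covariates, but left-handed
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    results.append((p0, p1, odds_ratio, p1 / p0))
    print(p0, round(p1, 3), round(odds_ratio, 3), round(p1 / p0, 3))
```

The odds ratio is 2.0 every time, but the probability ratio p1/p0 comes out around 1.9, 1.54 and 1.05, which is why "twice as likely" is only a fair reading of an odds ratio when the outcome is rare.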

OTOH, if you assume an iid framework, the probabilistic marginal effects aren't even needed to go from something like "non-bottle blondes have probability p of being hired, bottle blondes have probability q" to "painting the hair of 1000 women will generate 1000*(q-p) jobs on average". Or you can parameterize a Poisson process for rare events and report exponential/Erlang waiting times. And so on.
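
A sketch of both calculations with invented numbers (p, q, and the event rate below are assumptions): the expected number of extra hires follows from linearity of expectation, and a Poisson process with rate λ gives exponential waiting times to the first event and Erlang(k, λ) waiting times to the k-th.

```python
import math

# Hypothetical hiring probabilities (the comment's p and q).
p_natural, q_bottle = 0.20, 0.26
n_women = 1000
expected_extra_hires = n_women * (q_bottle - p_natural)  # n * (q - p) by linearity of expectation

def exponential_survival(t: float, rate: float) -> float:
    """P(wait for the first event > t) in a Poisson process with this rate."""
    return math.exp(-rate * t)

def erlang_cdf(t: float, k: int, rate: float) -> float:
    """P(k-th event has occurred by time t) = 1 - sum_{n<k} e^(-rate*t) (rate*t)^n / n!."""
    lt = rate * t
    return 1.0 - sum(math.exp(-lt) * lt ** n / math.factorial(n) for n in range(k))

rate_per_year = 0.5  # hypothetical rare-event rate
print(round(expected_extra_hires, 6))
print(round(1 - exponential_survival(2.0, rate_per_year), 4), round(erlang_cdf(2.0, 1, rate_per_year), 4))
print("mean wait for the 3rd event (years):", 3 / rate_per_year)
```

With k = 1 the Erlang CDF reduces to the exponential one, which makes a handy sanity check; the mean wait for the k-th event is k/λ.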

That's the problem. Log odds are not intuitive. Hell, even probabilities aren't intuitive, and they're much easier to think about. Look at all the people who start crying that "it was wrong" when the less likely event happens after the prediction said 90% probability.

This isn't a crazy minority position; "logistic regression is not interpretable" is a truism from basic ML courses and blog posts all over the internet.
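
On the 90% point: a perfectly calibrated forecaster who says "90%" should still be "wrong" about one time in ten. A small simulation sketch (the seed and the number of forecasts are arbitrary choices):

```python
import random

rng = random.Random(0)   # fixed seed so the run is reproducible
n_forecasts = 10_000
forecast = 0.9           # every forecast says "90% chance the event happens"

# Perfectly calibrated forecaster: the event really does occur with probability 0.9.
outcomes = [rng.random() < forecast for _ in range(n_forecasts)]
miss_rate = 1 - sum(outcomes) / n_forecasts

print(f"forecast said 90%, event failed to happen {miss_rate:.1%} of the time")
```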

It's odd that "ML theory" (really data science theory) as proposed by blog posts would supersede established statistics.

Something is rotten in the state of Denmark.