by mjw 3682 days ago
Their answer is pretty much 'because it's based on the log-odds', which to me is still only very mild motivation.

There are other non-linearities which people use to map onto (0, 1); for example, probit regression [0] uses the normal CDF. In fact you can use the CDF of any distribution supported on the whole real line, and the sigmoid is an example of this: it's the CDF of a standard logistic distribution [1].
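A quick sketch of that claim, using only the Python standard library. The function names are mine, purely for illustration. It checks numerically that the slope of the sigmoid matches the standard logistic density, and prints the sigmoid next to the normal CDF that probit uses.

```python
import math
from statistics import NormalDist

def sigmoid(x):
    """Logistic sigmoid, claimed to be the standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_density(x):
    """Density of the standard logistic distribution (location 0, scale 1)."""
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

def probit_inverse_link(x):
    """Standard normal CDF, the map onto (0, 1) used by probit regression."""
    return NormalDist().cdf(x)

def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# If the sigmoid is the logistic CDF, its slope must equal the logistic density.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    slope = numerical_derivative(sigmoid, x)
    print(x, sigmoid(x), probit_inverse_link(x), slope, logistic_density(x))
```

Both maps are increasing, hit 0.5 at zero and tend to 0 and 1 at the extremes; they differ in how fast they get there.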

There's a nice interpretation for this using an extra latent variable: for probit regression, you take your linear predictor, add a standard normal noise term, and the response is determined by the sign of the result. For logistic regression, same thing except make it a standard logistic instead.
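The latent-variable reading can be checked by simulation. This is a minimal sketch, assuming a fixed linear predictor `eta = 0.8` and a fixed seed (both arbitrary choices of mine): add noise, take the sign, and compare the fraction of positive responses with the corresponding CDF.

```python
import math
import random
from statistics import NormalDist

def sample_logistic(rng):
    """Draw from the standard logistic by inverting its CDF (the logit)."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0, which would break the log
        u = rng.random()
    return math.log(u / (1.0 - u))

def sample_normal(rng):
    """Draw from the standard normal."""
    return rng.gauss(0.0, 1.0)

def simulate_positive_rate(eta, noise, rng, n=200_000):
    """Fraction of draws where eta + noise is positive, i.e. the response is 1."""
    return sum(eta + noise(rng) > 0 for _ in range(n)) / n

rng = random.Random(0)
eta = 0.8  # hypothetical value of the linear predictor

logit_rate = simulate_positive_rate(eta, sample_logistic, rng)
probit_rate = simulate_positive_rate(eta, sample_normal, rng)

print("logistic noise:", logit_rate, "vs sigmoid:", 1.0 / (1.0 + math.exp(-eta)))
print("normal noise:  ", probit_rate, "vs normal CDF:", NormalDist().cdf(eta))
```

Both noise distributions are symmetric about zero, so P(eta + noise > 0) = F(eta), which is why the simulated rates line up with the two CDFs.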

This then extends nicely to ordinal regression too.
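One way to read the ordinal extension is the usual cumulative-link setup; that is my assumption about what is meant here. Instead of taking the sign of the noisy linear predictor, you bin it with a sorted list of cutpoints. The cutpoint values below are made up for illustration.

```python
import math

def sigmoid(x):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))

def ordinal_probs(eta, cutpoints, cdf=sigmoid):
    """Category probabilities when eta + noise is binned by sorted cutpoints.

    P(eta + noise <= c) = cdf(c - eta), so each category's probability is the
    difference between consecutive cumulative probabilities.
    """
    cumulative = [0.0] + [cdf(c - eta) for c in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]

# Three hypothetical cutpoints give four ordered categories.
probs = ordinal_probs(eta=0.5, cutpoints=[-1.0, 0.0, 2.0])
print(probs, sum(probs))
```

With a single cutpoint at zero this collapses back to the binary model: the top category has probability `sigmoid(eta)`. Passing the normal CDF as `cdf` would give the ordered probit version.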

[0] https://en.wikipedia.org/wiki/Probit_model
[1] https://en.wikipedia.org/wiki/Logistic_distribution

2 comments

There are other nice properties. For example, because the logit link is canonical for the binomial GLM, inference about unknown parameters using it is based on sufficient statistics.

It's certainly not the only option though, and not always the best fit.
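To make the sufficient-statistics point concrete, here is a small sketch with toy data I made up. With the logit link, the log-likelihood for one covariate depends on the responses only through sum(y) and sum(x*y). Two response vectors that agree on those two sums therefore get identical logistic log-likelihoods at any coefficients, while the probit log-likelihoods generally differ.

```python
import math
from statistics import NormalDist

def logit_loglik(beta0, beta1, xs, ys):
    """Bernoulli log-likelihood with logit link: sum of y*eta - log(1 + exp(eta))."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta0 + beta1 * x
        total += y * eta - math.log1p(math.exp(eta))
    return total

def probit_loglik(beta0, beta1, xs, ys):
    """Bernoulli log-likelihood with probit link."""
    normal_cdf = NormalDist().cdf
    total = 0.0
    for x, y in zip(xs, ys):
        p = normal_cdf(beta0 + beta1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Toy data: both response vectors have sum(y) = 2 and sum(x*y) = 3.
xs = [0, 1, 2, 3]
ys_a = [1, 0, 0, 1]
ys_b = [0, 1, 1, 0]

beta0, beta1 = -1.0, 0.8  # arbitrary coefficients for the comparison
logit_gap = logit_loglik(beta0, beta1, xs, ys_a) - logit_loglik(beta0, beta1, xs, ys_b)
probit_gap = probit_loglik(beta0, beta1, xs, ys_a) - probit_loglik(beta0, beta1, xs, ys_b)
print("logit gap: ", logit_gap)
print("probit gap:", probit_gap)
```

The logit gap is zero up to floating-point rounding whatever coefficients you pick, because the only data-dependent terms are beta0*sum(y) + beta1*sum(x*y).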

Ah yep, I forgot it's the canonical link. That's more of a small computational convenience though, right, at least when fitting a straightforward GLM -- it should be very cheap to fit regardless.

I suppose the logistic having heavier tails than the normal is probably the main consideration in motivating one or the other as the better model for a given situation.

Logistic, being heavier-tailed, is potentially more robust to outliers. In terms of binary data, that means it might be a better choice when an unexpected outcome is possible even in the most clear-cut cases. Probit regression, with its lighter normal tails, might be a better fit when the response is expected to be pretty much deterministic in clear-cut cases, and when quite severe inferences can be drawn from unexpected outcomes in those cases. Sound fair?
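A rough way to see the "severe inferences" point: compare the log-likelihood penalty each model assigns when the linear predictor is large and positive but the response comes out 0 anyway. This is a sketch with function names of my own, and it uses the standard (unscaled) versions of both distributions.

```python
import math
from statistics import NormalDist

def logit_surprise(eta):
    """-log P(y = 0 | eta) under the logistic model, i.e. log(1 + exp(eta))."""
    return math.log1p(math.exp(eta))

def probit_surprise(eta):
    """-log P(y = 0 | eta) under the probit model, i.e. -log Phi(-eta)."""
    return -math.log(NormalDist().cdf(-eta))

# Penalty for an unexpected 0 as the case becomes more and more clear-cut.
for eta in (1.0, 2.0, 4.0, 6.0):
    print(eta, logit_surprise(eta), probit_surprise(eta))
```

The logistic penalty grows roughly linearly in eta, while the probit penalty grows roughly like eta squared over two, so a single surprising observation pulls a probit fit much harder. Rescaling the logistic to unit variance changes the slope but not the linear-versus-quadratic contrast.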

Is there a natural justification for the logistic distribution though?

See the other replies above, but: the logistic has heavier tails than the normal, so it might do better where we need robustness, i.e. where unexpected outcomes remain possible even when the linear predictor is relatively big, and we want to avoid drawing extreme inferences from them.

Probit might lead to more efficient inferences in cases where the mechanism is known to become deterministic relatively quickly as the linear predictor gets big.

You could go further in either direction too (more or less robust) by using other link functions.
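As one example in the more-robust direction (my choice, not one named above), the Cauchy CDF gives the so-called cauchit link, whose tails are much heavier than the logistic's. The sketch below compares the probability each inverse link leaves for an unexpected outcome when the linear predictor is at -6.

```python
import math
from statistics import NormalDist

def cauchit_inverse_link(eta):
    """Standard Cauchy CDF."""
    return 0.5 + math.atan(eta) / math.pi

def logit_inverse_link(eta):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_inverse_link(eta):
    """Standard normal CDF."""
    return NormalDist().cdf(eta)

inverse_links = {
    "cauchit": cauchit_inverse_link,
    "logit": logit_inverse_link,
    "probit": probit_inverse_link,
}

eta = -6.0  # a clear-cut case: the model strongly expects y = 0
tail_probs = {name: link(eta) for name, link in inverse_links.items()}
for name, p in tail_probs.items():
    print(name, p)
```

The ordering of these tail probabilities (cauchit largest, probit smallest) mirrors the ordering from most to least tolerant of surprising outcomes.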