by doomrobo 2557 days ago
This was a really neat exposition! I have a few questions:

1. Is D a binary random variable? If so, what exactly does it mean to say beta*D + beta_0 is an approximation for log odds? Doesn't this formula only take on 2 possible values?

2. Could you provide intuition for why a linear function of D would be a good approximation for the log odds mentioned?

2 comments

D can be binary, in which case the model predicts just two values for the log-odds: beta_0 when D=0 and beta + beta_0 when D=1. In other words, the log-odds are shifted by beta whenever D=1.
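To make the binary case concrete, here is a minimal sketch. The coefficient values are made up purely for illustration (they are not from the article), and `sigmoid` is just a helper that turns log-odds back into a probability.

```python
import math

def sigmoid(z: float) -> float:
    """Convert log-odds z into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
beta_0 = -0.5  # log-odds when D = 0
beta = 1.2     # shift in log-odds when D = 1

# With binary D the linear predictor takes exactly two values.
log_odds = {d: beta * d + beta_0 for d in (0, 1)}
probs = {d: sigmoid(z) for d, z in log_odds.items()}

for d in (0, 1):
    print(f"D={d}: log-odds={log_odds[d]:+.2f}, P(good)={probs[d]:.3f}")
```

So with a binary D the model is not really "approximating a curve"; it just estimates one probability per group, expressed on the log-odds scale.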

As for your second point, there is no a priori reason in this case why a linear function of D would be a good approximation. Indeed, in the current case we would probably prefer to write at least beta_1*D + beta_2*D^2 + beta_0, which is still linear in the transformed inputs (D, D^2).
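A rough sketch of what the quadratic version buys you, assuming D is something continuous like brewing temperature. All numbers below are hypothetical and picked only so that the log-odds peak at 93 degrees; nothing here is fitted to real data.

```python
# Hypothetical coefficients: beta_2 < 0 makes the log-odds an inverted parabola.
beta_0, beta_1, beta_2 = -430.45, 9.3, -0.05

def features(d: float) -> tuple:
    """Transform the raw input D into (1, D, D^2)."""
    return (1.0, d, d * d)

def log_odds(d: float) -> float:
    """Linear in the transformed features, quadratic in D itself."""
    coeffs = (beta_0, beta_1, beta_2)
    return sum(c * x for c, x in zip(coeffs, features(d)))

# A parabola a*D^2 + b*D + c has its vertex at D = -b / (2a).
peak = -beta_1 / (2.0 * beta_2)

print(f"log-odds peak at D = {peak:.1f}")
for d in (85.0, 93.0, 100.0):
    print(f"D={d:5.1f}: log-odds={log_odds(d):+.2f}")
```

The model is still an ordinary dot product of coefficients with features, so it can be fitted exactly like the plain linear one; only the feature vector changed.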

That being said, there is a sense in which a linear function may be a good approach. If you are interested in the direction of change around the average values of the variables involved, then a linear function gives you a "linear approximation" of the slope there. This of course quickly breaks down if the function is not really linear, and in particular it breaks down if you want to predict an observation that is not "average". But often one is interested in very qualitative statements such as: on average, is the coffee improved by fresher beans, yes or no? In that case, such a linear model may give an answer.

Note that the above is absolutely not formally correct.

Finally, logistic regression can also be motivated differently.

- It arises from minimizing the cross-entropy (log) loss in machine learning

- One may assume that the binary variable we observe is really just based on a "latent" variable (here something like coffee quality), which is determined by such a linear model

- Finally, in economics and reinforcement learning, we assume agents make a decision (here, whether the coffee is good or bad) by judging the inputs plus some random "error" or "taste" term which has an extreme value distribution. Since only the differences between these utilities matter (cf. odds ratios), and the actual values are meaningless, logistic regression arises here as well.
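The last motivation can be checked numerically. Below is a small simulation sketch: each option gets a deterministic utility plus independent standard Gumbel (type-I extreme value) noise, and the share of times "good" wins is compared with the logistic function of the utility difference. The utilities `v_good` and `v_bad`, the sample size, and the seed are all arbitrary choices for illustration.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def gumbel(rng: random.Random) -> float:
    """Draw from the standard Gumbel distribution via its inverse CDF."""
    u = rng.random()
    while u == 0.0:  # rng.random() can return 0.0, and log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_choice_rate(v_good: float, v_bad: float, n: int, seed: int = 0) -> float:
    """Fraction of n trials in which the noisy utility of 'good' is larger."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        if v_good + gumbel(rng) > v_bad + gumbel(rng):
            wins += 1
    return wins / n

# Hypothetical deterministic utilities; only their difference matters.
v_good, v_bad = 0.8, 0.1
empirical = simulate_choice_rate(v_good, v_bad, n=200_000)
predicted = sigmoid(v_good - v_bad)

print(f"simulated P(good) = {empirical:.4f}")
print(f"logistic  P(good) = {predicted:.4f}")
```

The two numbers should agree up to sampling noise, because the difference of two independent Gumbel variables follows a logistic distribution. Adding the same constant to both utilities leaves the result unchanged, which is the "only differences matter" point.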

(Not the author.)

D is a vector of input data. If it's more than a single number, then beta D needs to be interpreted as a dot product.

There's no guarantee that in any specific case a linear function of D will be a good approximation to the log odds. (In the present instance, where D is the temperature, it won't be -- there'll be a narrow range of good temperatures and the further away you get from that range, the worse the coffee is likely to be.)

But a linear approximation is at least simple and log odds (unlike e.g. probability) at least can take any value from -oo to +oo. Sometimes you get lucky.
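A quick sketch of the range argument, using the standard definitions of log-odds (`logit`) and its inverse (`sigmoid`); the sample values are arbitrary.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(z: float) -> float:
    """Inverse of logit: maps any real z back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities near 0 or 1 map to large negative or positive log-odds.
for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    print(f"p={p:<5}  log-odds={logit(p):+.3f}")

# Any value a linear function produces maps back to a valid probability.
for z in (-20.0, -1.0, 0.0, 3.0, 20.0):
    print(f"z={z:+6.1f}  p={sigmoid(z):.6f}")
```

A linear function of D is unbounded, so modelling the probability directly could give values below 0 or above 1, whereas modelling the log-odds cannot.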