
D can be binary, in which case the model predicts two possible values for the log-odds: the linear part is shifted by beta whenever D=1.
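To make the shift concrete, here is a minimal sketch. The coefficient values are made up for illustration; only the relationship between them matters.

```python
import math

def sigmoid(z):
    # Map a log-odds value to a probability.
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    # Map a probability back to log-odds.
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta = -0.5, 1.2

p_d0 = sigmoid(beta0)         # predicted probability when D = 0
p_d1 = sigmoid(beta0 + beta)  # predicted probability when D = 1

# The two predictions differ by exactly beta on the log-odds scale,
# which is the same as an odds ratio of exp(beta).
shift = log_odds(p_d1) - log_odds(p_d0)
odds_ratio = (p_d1 / (1.0 - p_d1)) / (p_d0 / (1.0 - p_d0))
```

So with a binary D the model has only two predictions, and beta is the gap between them in log-odds.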

As for your second point, there is no a priori reason in this case why a linear function of D would be a good approximation. Indeed, in the current case we would probably prefer to at least write beta0 + beta1*D + beta2*D^2, which is still linear in the coefficients, just applied to transformations of D.
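A rough sketch of what the quadratic version looks like, assuming D is a continuous input. The coefficients are invented so that the log-odds first rise and then fall with D.

```python
def features(d):
    # Transform D into (1, D, D^2); the model stays linear in the betas.
    return (1.0, d, d * d)

def log_odds(d, betas):
    # Plain linear combination of the transformed inputs.
    return sum(b * x for b, x in zip(betas, features(d)))

# Hypothetical coefficients (beta0, beta1, beta2), for illustration only.
betas = (-1.0, 2.0, -0.5)

# Log-odds at a few values of D: they peak at D = 2 and drop off on both sides.
values = [log_odds(d, betas) for d in (0.0, 1.0, 2.0, 3.0, 4.0)]
```

A model with only the beta1*D term could never produce that rise-and-fall shape, while the fitting procedure itself does not change at all.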

That being said, there is a sense in which a linear function may still be a good approach. If you are interested in the direction of change around the average values of the variables involved, then a linear function gives you a "linear approximation" of the slope there. This of course quickly breaks down if the function is not really linear, and in particular it breaks down if you are interested in predicting an observation that is not "average". But often one is interested in very qualitative statements such as: on average, is the coffee improved by fresher beans, yes or no? In that case, such a linear model may give an answer.
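As an informal illustration of that point, the sketch below takes a made-up nonlinear "quality" curve and its tangent line at an assumed average freshness score. The function, the score scale and the numbers are all hypothetical.

```python
import math

def quality(freshness):
    # Hypothetical nonlinear relationship: quality saturates as freshness grows.
    return 10.0 * (1.0 - math.exp(-freshness / 3.0))

def tangent_line(f, x0, h=1e-5):
    # Slope at x0 by central difference, plus the resulting linear approximation.
    slope = (f(x0 + h) - f(x0 - h)) / (2.0 * h)
    return slope, (lambda x: f(x0) + slope * (x - x0))

average_freshness = 3.0  # assumed "average" observation
slope, linear = tangent_line(quality, average_freshness)

# Close to the average the line tracks the curve well...
error_near = abs(linear(3.5) - quality(3.5))
# ...but far from the average it is badly off.
error_far = abs(linear(12.0) - quality(12.0))
```

The sign of `slope` answers the qualitative yes/no question, while `error_far` shows why the same line should not be used to predict unusual observations.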

Note that the above is absolutely not formally correct.

Finally, logistic regression can also be motivated differently.

- It arises from minimizing the cross-entropy (log) loss in machine learning

- One may assume that the binary variable we observe is really just based on a "latent" variable (here something like coffee quality), which is determined by such a linear model

- Finally, in economics and reinforcement learning, we assume agents make a decision (here whether the coffee is good or bad) by judging the inputs plus some random "error" or "taste" parameter which has an extreme value distribution. Since only the differences between these utilities matter (cf. odds ratios), and the actual values are meaningless, the logistic regression also arises.
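The last motivation can be checked with a small simulation. This is only a sketch: the utilities `v_good` and `v_bad` are hypothetical, and the "taste" terms are drawn from a standard Gumbel (type I extreme value) distribution.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gumbel(rng):
    # Standard Gumbel draw via inverse transform; skip u == 0 to avoid log(0).
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))

def share_choosing_good(v_good, v_bad, n, seed):
    # Each agent picks the option with the higher utility plus taste shock.
    rng = random.Random(seed)
    wins = sum(
        1 for _ in range(n)
        if v_good + gumbel(rng) > v_bad + gumbel(rng)
    )
    return wins / n

# Hypothetical deterministic utilities, for illustration only.
v_good, v_bad = 1.0, 0.3

simulated = share_choosing_good(v_good, v_bad, n=100_000, seed=0)
# Adding the same constant to both utilities should change nothing.
shifted = share_choosing_good(v_good + 5.0, v_bad + 5.0, n=100_000, seed=1)
# The logistic model's prediction depends only on the utility difference.
predicted = sigmoid(v_good - v_bad)
```

Both simulated shares should land close to `predicted`, which is the sense in which only utility differences matter and the logistic form falls out of the extreme value assumption.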