by itissid 408 days ago
When you frame it as an optimization problem, e.g. by minimizing the squared loss or the cross entropy, you have already decided what your data generating process (DGP) is, i.e. that Y is:

- A Binomial/Multinomial random variable, which gives you the cross-entropy loss function.

- A Normal random variable, which gives you the squared loss.
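
To make the link concrete, here is a minimal sketch (plain Python, made-up numbers) checking that the negative log-likelihood under each assumed DGP is the familiar loss. The function names and the fixed sigma = 1 are my own choices for illustration, not anything standard.

```python
import math

def gaussian_nll(y, mu, sigma=1.0):
    """Negative log-likelihood of y under Normal(mu_i, sigma^2), summed over points."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma ** 2) + (yi - mi) ** 2 / (2 * sigma ** 2)
        for yi, mi in zip(y, mu)
    )

def squared_loss(y, mu):
    """Plain sum of squared errors."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

def bernoulli_nll(y, p):
    """Negative log-likelihood of 0/1 labels y under Bernoulli(p_i)."""
    return -sum(math.log(q) if yi == 1 else math.log(1 - q) for yi, q in zip(y, p))

def cross_entropy(y, p):
    """Binary cross entropy written the way ML texts usually write it."""
    return -sum(yi * math.log(q) + (1 - yi) * math.log(1 - q) for yi, q in zip(y, p))

# Made-up data: two candidate sets of predictions for the same targets.
y_real = [1.0, 2.5, 0.3]
mu_a = [0.8, 2.0, 0.5]
mu_b = [1.5, 2.4, 0.0]

# With sigma fixed, the Gaussian NLL differs from the squared loss only by a
# constant and a factor of 1/(2*sigma^2), so both rank the candidates identically.
nll_gap = gaussian_nll(y_real, mu_a) - gaussian_nll(y_real, mu_b)
sse_gap = squared_loss(y_real, mu_a) - squared_loss(y_real, mu_b)

y_bin = [1, 0, 1, 1]
p_hat = [0.9, 0.2, 0.6, 0.7]
```

Here `nll_gap` equals `0.5 * sse_gap`, and `bernoulli_nll(y_bin, p_hat)` equals `cross_entropy(y_bin, p_hat)`: picking the loss and picking the distribution are the same decision.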

Many ML textbooks skip directly to this point. It's not wrong to do so, but it gives a much narrower intuition of how regression works!

But there is no reason Y needs to follow one of those two DGPs (the process could be Poisson, or mean-reverting)! There is no reason to believe, prima facie and a priori, that Y|X satisfies those assumptions. This also motivates using other kinds of models.
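
As a rough illustration of how a different DGP changes the loss, here is a sketch of the Poisson negative log-likelihood next to the squared loss. The function names and the numbers are made up for the example.

```python
import math

def poisson_nll(y, mu):
    """Negative log-likelihood of counts y under Poisson(mu_i), summed over points."""
    return sum(mi - yi * math.log(mi) + math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def squared_loss(y, mu):
    """Plain sum of squared errors, for comparison."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

# One observed count of 4, predicted as 2 (too low) or 6 (too high).
observed = [4]
under = [2.0]
over = [6.0]

# Squared loss treats both misses the same; the Poisson loss does not.
same_under_squared = squared_loss(observed, under) == squared_loss(observed, over)
poisson_prefers_over = poisson_nll(observed, over) < poisson_nll(observed, under)
```

Both flags come out `True`: squared loss is symmetric around the observed value, while under a Poisson assumption missing low by 2 costs more than missing high by 2. So the assumed DGP changes which fit counts as "best".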

That's why you first test carefully whether those statistical assumptions hold, using a bit of EDA, and from that comes some appreciation and understanding of how linear regression actually works.
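
One very rough example of that kind of check, assuming count data: a Poisson variable has variance equal to its mean, so the ratio of the two among observations with similar X is a quick sanity test. The helper name and the counts below are hypothetical, and this is only one of many possible EDA checks.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; a value near 1 is consistent with Poisson."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical counts observed at similar values of X.
counts = [2, 3, 1, 4, 2, 3, 2, 5, 3, 2]
ratio = dispersion_ratio(counts)
```

A ratio far from 1 (in either direction) is a hint that a plain Poisson assumption for Y|X deserves a second look; with this few points it is only a hint.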