by itissid
408 days ago
When you frame it as an optimization problem, e.g. by minimizing the squared loss or the cross-entropy, you have already decided what your data generating process (DGP) is, i.e. that Y is:
- a Binomial/Multinomial random variable, which gives you the cross-entropy loss, or
- a Normal random variable, which gives you the squared loss.
This is the point many ML textbooks skip straight to. It's not wrong to do this, but it gives a much narrower intuition of how regression works! There is no reason Y needs to follow either of those two DGPs (the process could be Poisson, or a mean-reverting process)!
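To make the loss/DGP correspondence concrete, here is a small sketch in Python (my own illustration, not from any particular library). It writes out the negative log-likelihood (NLL) of a single observation under each assumed DGP. All the function names are hypothetical. With a Normal DGP of fixed sigma = 1, the NLL differs from half the squared error only by the constant 0.5*log(2*pi), so both have the same minimizer. A Bernoulli DGP gives exactly the binary cross-entropy. A Poisson DGP gives a different loss again.

```python
import math

# Per-observation negative log-likelihoods (NLLs), written out by hand.
# Minimizing an NLL over the parameters is maximum likelihood under that DGP.

def squared_loss(y, mu):
    # The usual "half squared error" used in least squares.
    return 0.5 * (y - mu) ** 2

def gaussian_nll(y, mu, sigma=1.0):
    # -log N(y | mu, sigma^2)
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def cross_entropy(y, p):
    # Binary cross-entropy for a label y in {0, 1} and predicted probability p.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bernoulli_nll(y, p):
    # -log P(Y = y) when P(Y = 1) = p.
    return -math.log(p if y == 1 else 1 - p)

def poisson_nll(y, lam):
    # -log P(Y = y) for a Poisson count with rate lam: lam - y*log(lam) + log(y!)
    return lam - y * math.log(lam) + math.lgamma(y + 1)

if __name__ == "__main__":
    # Gaussian NLL (sigma = 1) minus squared loss is the constant 0.5*log(2*pi).
    for y, mu in [(3.0, 1.0), (-2.0, 0.5)]:
        print(gaussian_nll(y, mu) - squared_loss(y, mu), 0.5 * math.log(2 * math.pi))
    # Bernoulli NLL and binary cross-entropy agree exactly.
    print(bernoulli_nll(1, 0.7), cross_entropy(1, 0.7))
    # A Poisson DGP gives a loss that is not a constant shift of the squared loss.
    print(poisson_nll(3, 1.0), squared_loss(3, 1.0))
```

Here is one way to see why the Poisson case behaves differently. Its gradient with respect to lam is 1 - y/lam = (lam - y)/lam. That is the residual divided by the rate, whereas the squared loss gives the raw residual mu - y. So observations with a large expected count get down-weighted, which matches the Poisson property that the variance equals the mean.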
There is no reason to believe, prima facie and a priori, that Y|X follows those assumptions. This also motivates using other kinds of models. It's why you should carefully test those statistical assumptions first with a bit of EDA, and from that comes an appreciation and understanding of how linear regression actually works.
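Here is one small piece of that EDA, as a rough sketch. It assumes X is discrete (or has already been binned) and Y is a count. The idea is to look at the mean and variance of Y within each X group, which targets Y|X rather than Y alone. The helper `conditional_mean_variance` and the toy data are hypothetical illustrations, not a standard API.

```python
from collections import defaultdict
import statistics

def conditional_mean_variance(xs, ys):
    """Group y by each distinct x value and return {x: (mean, variance)}.

    Groups with fewer than two observations are skipped because the
    sample variance (n - 1 denominator) needs at least two points.
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    summary = {}
    for x, vals in sorted(groups.items()):
        if len(vals) >= 2:
            summary[x] = (statistics.mean(vals), statistics.variance(vals))
    return summary

if __name__ == "__main__":
    # Hypothetical toy data: a discrete feature x and a count-valued response y.
    xs = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    ys = [1, 0, 2, 1, 4, 6, 3, 5, 9, 13, 8, 11]
    for x, (m, v) in conditional_mean_variance(xs, ys).items():
        # Normal DGP with constant variance: v should stay roughly flat across x.
        # Poisson DGP: v should track m (var/mean near 1).
        print(f"x={x}: mean={m:.2f} var={v:.2f} var/mean={v / m:.2f}")
```

This is only a quick diagnostic. A variance that grows with the mean argues against the constant-variance Normal assumption behind squared loss. It does not by itself confirm a Poisson DGP: a var/mean ratio well above 1 would instead point toward overdispersion.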