by mdbco
4129 days ago
"Statistical modeling is a lot like engineering."

I can see why this is a good comparison: both engineering methods and statistical methods rely on sets of given assumptions. But it's important not to take the analogy too far. Engineering is ultimately done in a mechanistic world with primarily deterministic outcomes, whereas statistical modeling is conducted in a stochastic world with probabilistic outcomes, so it wouldn't be good to think of machine learning as predominantly mechanistic in nature (in spite of its name). Of course, many of the seven points in the post emphasize the importance of stochastic factors (e.g. outliers, variance issues, collinearity), so the author is clearly not making this mistake, but it might be good to clarify for anyone else reading.

"6. Use linear model without considering multi-collinear predictors"

This is a great point, and to expand on it a bit: you can also have simultaneity, i.e. two or more of your features or predictors are functions of each other, of some third variable, or both. This type of problem is more difficult to detect but can cause serious problems with interpreting the regression coefficients, because it's ultimately a type of endogeneity, which means that common estimators like OLS will not be consistent.
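To make the consistency point concrete, here is a rough simulation sketch of one textbook form of simultaneity, where the outcome and a predictor determine each other. Everything in it is an assumption for illustration rather than anything from the post: the two-equation system, the coefficients `b` and `a`, the unit-variance normal errors, and the helper names.

```python
import numpy as np


def simulate_simultaneous(n, b=0.5, a=-1.0, seed=0):
    """Draw n observations from a hypothetical simultaneous system:

        y = b * x + u
        x = a * y + v

    with independent standard-normal errors u and v. Solving the two
    equations jointly gives the reduced form used below.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    v = rng.standard_normal(n)
    denom = 1.0 - a * b  # must be nonzero for the system to have a solution
    x = (a * u + v) / denom
    y = (b * v + u) / denom
    return x, y


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def ols_probability_limit(b, a):
    """Large-sample limit of the OLS slope, cov(x, y) / var(x),
    under the unit-variance errors assumed above."""
    return (a + b) / (a**2 + 1.0)


if __name__ == "__main__":
    b, a = 0.5, -1.0
    for n in (1_000, 100_000):
        x, y = simulate_simultaneous(n, b=b, a=a)
        print(n, ols_slope(x, y))
    print("true b:", b, "OLS limit:", ols_probability_limit(b, a))
```

Under these assumed values the OLS slope settles near (a + b) / (a^2 + 1) = -0.25 rather than the structural coefficient of 0.5, and more data does not help, because x is correlated with the error term u. That is what a lack of consistency means in practice.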