by hervature
1549 days ago
But you can literally see the spread of the data points. I think you are complaining about nothing and maybe not understanding the article. They show plots with low R^2 and claim that these variables, in a vacuum, do not predict homelessness (which agrees with what you are saying). Then they present two plots with much higher (but still low) R^2 and make the innocuous claim that median rent is the single greatest predictor they found. Of course, this analysis is very simple, and any model trying to explain homelessness should contain many variables, including things like the climate of the city. But complaining that the confidence interval of the slope is uninterpretable is silly. Any data scientist worth their salt understands that this is simply a visual representation of the confidence interval output by the regression.
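For readers wondering where such a slope interval comes from, here is a minimal sketch, assuming the plotted band reflects the standard t-based confidence interval for an ordinary least-squares slope, b ± t(n-2) · SE(b). The helper name `slope_confidence_interval` is hypothetical, and the data are synthetic (a weak linear signal buried in noise), not the article's. It uses SciPy's `stats.linregress` and `stats.t.ppf`.

```python
import numpy as np
from scipy import stats


def slope_confidence_interval(x, y, level=0.95):
    """Fit y = a + b*x by OLS; return (b, (lo, hi), r_squared).

    The interval is the textbook t-based CI for the slope:
    b +/- t_{(1+level)/2, n-2} * SE(b).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = stats.linregress(x, y)
    n = len(x)
    # Two-sided critical value with n - 2 degrees of freedom.
    t_crit = stats.t.ppf((1 + level) / 2, df=n - 2)
    half_width = t_crit * fit.stderr
    return fit.slope, (fit.slope - half_width, fit.slope + half_width), fit.rvalue ** 2


# Hypothetical data: weak linear signal plus lots of noise (not the article's data).
rng = np.random.default_rng(0)
rent = rng.uniform(800, 2500, size=50)
rate = 0.002 * rent + rng.normal(0, 3, size=50)
slope, (lo, hi), r2 = slope_confidence_interval(rent, rate)
print(f"slope={slope:.4f}  95% CI=({lo:.4f}, {hi:.4f})  R^2={r2:.2f}")
```

A low R^2 and a well-defined slope interval can coexist: the interval describes uncertainty in the fitted line, not how tightly the points cluster around it.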
Random looking balls of data points don't have slopes. It is invalid to perform a linear fit on data that does not derive in large part from a linear generative process. And presenting a fit from a model that is facially absurd to apply is bad data science. Whether or not an informed reader would discount the absurd model fit is not material to whether it is appropriate to present such a fit.
They could have binned the data and plotted percentile bands. They could have used a non-parametric density estimator. There are lots of things they could have done to summarize the data and make some sense of the ball of points. But linear regression with slope error bars is not an appropriate choice. That linear fits are easy to compute, and that one helped them make their point, is not a justification.
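A rough sketch of the binned-percentile alternative, assuming equal-count (quantile) bins on x and the 10th/50th/90th percentiles of y as the bands. Both are arbitrary choices for illustration. `binned_percentile_bands` is a hypothetical helper, and the data are a synthetic "ball of points", not the article's.

```python
import numpy as np


def binned_percentile_bands(x, y, n_bins=5, percentiles=(10, 50, 90)):
    """Split x into equal-count bins and report empirical percentiles of y per bin.

    Returns (bin_centers, bands); bands has shape (n_bins, len(percentiles)).
    No functional form (linear or otherwise) is assumed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Quantile edges so each bin holds roughly the same number of points.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Bin index 0..n_bins-1; the maximum x is clipped into the last bin.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    centers = np.full(n_bins, np.nan)
    bands = np.full((n_bins, len(percentiles)), np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if not in_bin.any():  # possible with heavily tied x values
            continue
        centers[b] = np.median(x[in_bin])
        bands[b] = np.percentile(y[in_bin], percentiles)
    return centers, bands


# Hypothetical "ball of points": y is pure noise, unrelated to x.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = rng.normal(0, 1, size=200)
centers, bands = binned_percentile_bands(x, y)
for c, (p10, p50, p90) in zip(centers, bands):
    print(f"x~{c:5.2f}: p10={p10:+.2f} median={p50:+.2f} p90={p90:+.2f}")
```

Plotting the three columns of `bands` against `centers` shows the spread directly instead of summarizing it with a single slope. For the density-estimator route, `scipy.stats.gaussian_kde` is one off-the-shelf option.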