| HN Mirror



	by hervature 1596 days ago
	But you can literally see the spread of the data points. I think you are complaining about nothing and maybe not understanding the article. They are showing plots with low R^2 and claiming these variables in a vacuum do not predict homelessness (which is in agreement with what you are saying). Then they present two plots with much higher (but still low) R^2 and make the innocuous claim that median rent is the single greatest predictor that they found. Of course, this analysis is very simple, and any model trying to explain homelessness should contain many variables, including things like the climate of the city. But to complain that the confidence interval of the slope is uninterpretable is silly. Any data scientist worth their salt understands this is simply a visual representation of the confidence interval output by the regression.
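To make the R^2 and slope-interval point concrete, here is a minimal sketch in plain Python. It is not the article's analysis: the data are synthetic, the function name `fit_line` is made up for illustration, and the interval uses a normal approximation in place of the exact t quantile, which is only reasonable for a fairly large sample.

```python
import math
import random
from statistics import NormalDist, mean

def fit_line(xs, ys, level=0.95):
    """Ordinary least squares for y = a + b*x.

    Returns (slope, intercept, r_squared, (ci_low, ci_high)) where the
    interval is for the slope. Assumes len(xs) > 2 and non-constant xs, ys.
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - sse / syy
    # Standard error of the slope from the residual variance.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    # ASSUMPTION: normal quantile used as a stand-in for the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, intercept, r_squared, (slope - z * se_slope, slope + z * se_slope)

# Synthetic, illustrative data only (not the article's numbers).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
noise_only = [random.gauss(5, 2) for _ in xs]                 # y ignores x
with_trend = [1.0 + 0.6 * x + random.gauss(0, 1.5) for x in xs]  # y depends on x

for name, ys in [("no relationship", noise_only), ("some relationship", with_trend)]:
    slope, _, r2, (lo, hi) = fit_line(xs, ys)
    print(f"{name}: slope={slope:.2f} CI=({lo:.2f}, {hi:.2f}) R^2={r2:.2f}")
```

On data like the first series, the fit should report an R^2 near zero, which is the sense in which a fitted line on a shapeless cloud tells the reader "this variable explains almost nothing."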

1 comments

jrd79 1596 days ago

You are avoiding the question of whether it is appropriate to present the results of a linear regression on data that is so poorly explained by a linear relationship.

Random-looking balls of data points don't have slopes. It is invalid to perform a linear fit on data that does not derive in large part from a linear generative process. And presenting a fit from a model that is facially absurd to apply is bad data science. Whether or not an informed reader would discount the absurd model fit is not material to whether it is appropriate to present such a fit.

They could have binned the data and plotted percentile bands. They could have used a non-parametric density estimator. There are lots of things they could have done to summarize the data and make some sense of the ball of points. But linear regression with slope error bars is not an appropriate choice. That it is easy to compute linear fits, and that it helped them make their point, are not justification.
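As a rough illustration of the binned percentile bands idea, here is a minimal sketch in plain Python. The function name `percentile_bands`, the equal-width binning, and the choice of 25th/50th/75th percentiles are all assumptions made for the example, not something taken from the article.

```python
from statistics import quantiles

def percentile_bands(xs, ys, n_bins=5):
    """Summarize y by equal-width bins of x.

    Returns a list of (bin_center, q25, q50, q75) tuples, one per bin that
    holds at least two points. Assumes max(xs) > min(xs).
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        # Clamp so that x == hi falls into the last bin instead of overflowing.
        idx = min(int((x - lo) / width), n_bins - 1)
        bins[idx].append(y)
    bands = []
    for k, vals in enumerate(bins):
        if len(vals) < 2:  # statistics.quantiles needs at least two values
            continue
        q25, q50, q75 = quantiles(vals, n=4, method="inclusive")
        bands.append((lo + (k + 0.5) * width, q25, q50, q75))
    return bands

# Tiny hand-checkable example: ten points on the line y = x, two bins.
bands = percentile_bands(list(range(10)), list(range(10)), n_bins=2)
for center, q25, q50, q75 in bands:
    print(f"x~{center:.2f}: 25%={q25} median={q50} 75%={q75}")
```

Plotting the median against the bin centers, with the 25% to 75% range shaded, summarizes the cloud without assuming any functional form for the relationship.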

hervature 1596 days ago

> linear regression on data that is so poorly explained by a linear relationship.

That is exactly what they are saying. This is from TFA:

> The graphics above demonstrate that variation in rates of homelessness cannot be explained by variation in rates of individual factors such as poverty and mental illness.

They are, in my words, saying "Look at this plot, the x-axis has no bearing on the y-axis. To give you a sense of how bad it is, we fit a line to it and it is exactly 0 useful." I don't know why you are focusing so hard on the plots without reading their words. You are in agreement with TFA. Now, for the plot with R^2 of 0.55, that clearly has some positive relationship to it.

As for your last paragraph, I disagree 100%. They are trying to find an explanatory variable, not "summarize the data". By showing all the points, it is evident there is no relationship. As you have repeatedly pointed out, the plot achieved its goal. In my opinion, the line is a nice touch that lets statisticians know the scaling of the axes is not playing tricks on them.