| HN Mirror

	by khr 2175 days ago
	Agreed. I was curious enough to run the model myself so I used a tool to extract the data. The slope estimate (b=17.24) is not significantly different from zero, p=.437. The data are here: https://pastebin.com/HhWTKZRb

4 comments

bluenose69 2175 days ago

In case anyone is interested, below is R code to read these data and compute the regression. The summary() reveals the p value for the slope to be 0.437, and that for the intercept to be 0.32.

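    # Read the whitespace-delimited table (first row holds the column names)
    d <- read.table("https://pastebin.com/raw/HhWTKZRb", header=TRUE)
    # Fit cumulative cases per 100,000 as a linear function of the binge-drinking proportion
    m <- lm(cumulative_covid19_per100000 ~ proportion_binge_drinkers, data=d)
    # Print coefficient estimates, standard errors, t statistics and p values
    summary(m)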

SubiculumCode 2175 days ago

The problem is that the author is essentially claiming that running the regression for data not passing his eyeball test is, in itself, a misuse of regression...which is nonsense.

gleenn 2175 days ago

I'm not sure I understand your point. Did you actually look at the regression line through the data? It looks crazy off. I'm not a statistician, but that line doesn't look like it represents the data very well at all. People are also making nuanced comments above, but the underlying fact seems to be that this is not a good use of linear regression, and there is no strong correlation between the two axes.

SubiculumCode 2175 days ago

Without access to the residuals, I'd still venture to guess that the assumptions of the regression are not severely violated in this data set.

When this regression is conducted, the null hypothesis is not rejected (the regression slope is not significantly different from zero). If someone is somehow arguing that this regression rejects the null hypothesis, then they are incorrect. But there is nothing wrong with using regression here. It's kind of the whole point. This is basic regression, statistics 101.

Error bands on the regression slope would help people understand the uncertainty of the apparent slope.
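
A rough sketch of what such error bands could look like, in Python on made-up numbers (these are not the pastebin values). It computes the least-squares slope, its standard error, and a 95% interval using the tabulated Student's t quantile for 4 degrees of freedom (2.776). The function name `slope_with_se` and the data are assumptions for illustration only.

```python
import math

def slope_with_se(x, y):
    """Return the least-squares slope of y = a + b*x and its standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares around the fitted line
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, se

# Hypothetical data for illustration, NOT the values from the pastebin
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 1, 5, 2]

b, se = slope_with_se(x, y)

# Two-sided 95% quantile of Student's t with n - 2 = 4 degrees of freedom
T_975_DF4 = 2.776
low, high = b - T_975_DF4 * se, b + T_975_DF4 * se

print(f"slope = {b:.3f}, 95% interval = ({low:.2f}, {high:.2f})")
```

On these numbers the interval runs from about -1.09 to 1.32, so it straddles zero, which is the same conclusion as a non-significant slope. In R, `confint(m)` reports the equivalent interval for a fitted model.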

gowld 2174 days ago

Are you saying that eyeball tests are wrong? That's an extreme claim.

spekcular 2174 days ago

Eyeball tests are often misleading, or fail to detect weak correlations (or deviations from model assumptions such as heteroskedasticity). That's why we check with more formal methods.
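
As one hedged illustration of a more formal check, the Python sketch below fits a line to made-up data and correlates the absolute residuals with x. A clearly positive value suggests the spread grows with x. This is only a crude diagnostic, not a test: formal versions of the idea, such as the Breusch-Pagan or Glejser tests, add a reference distribution. The helper names and the data are assumptions.

```python
import math

def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def ols_residuals(x, y):
    """Residuals of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data whose scatter widens as x increases
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.5, 3.5, 6.0, 5.0, 9.0, 6.0]

residuals = ols_residuals(x, y)
# Correlate the size of each residual with x: near 0 means even spread
spread_trend = pearson_r(x, [abs(e) for e in residuals])

print(f"correlation of |residual| with x = {spread_trend:.2f}")
```

Here the value comes out at roughly 0.89, so the residual spread clearly increases with x in this toy data set.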

ivansavz 2174 days ago

> I used a tool to extract the data

You mean you have a tool for extracting tabular data from a scatter plot like http://www.goodmath.org/blog/wp-content/uploads/2020/07/EcCq... ? That's very cool and I would love to hear more about it.

xioxox 2174 days ago

There are quite a lot of these. Here is one I like: https://automeris.io/WebPlotDigitizer/

gowld 2174 days ago

What are some examples of data sets with high(ish) r with high p (low confidence), and low p (high confidence) with low r?

I guess it would be a very tall, "sharp cornered" parallelogram of data points (clear slope at the average, but high error variation), vs a very short, wide rectangle?

That would be a cool explorable demo.
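
One way to build such examples is through sample size rather than shape. In simple linear regression the slope's t statistic depends only on r and n: t = r * sqrt(n - 2) / sqrt(1 - r^2). The Python sketch below plugs in two hypothetical (r, n) pairs and compares each t against the tabulated two-sided 5% cutoff for n - 2 degrees of freedom. The function name `t_from_r` and the chosen pairs are assumptions for illustration.

```python
import math

def t_from_r(r, n):
    """t statistic for testing zero slope, given correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Two-sided 5% critical values of Student's t, taken from standard tables
T_CRIT_DF2 = 4.303    # n = 4, so 2 degrees of freedom
T_CRIT_DF998 = 1.962  # n = 1000, so 998 degrees of freedom

# High r, tiny sample: the slope is NOT significant
t_small = t_from_r(0.8, 4)
small_significant = abs(t_small) > T_CRIT_DF2

# Low r, large sample: the slope IS significant
t_large = t_from_r(0.1, 1000)
large_significant = abs(t_large) > T_CRIT_DF998

print(f"r=0.8, n=4:    t = {t_small:.2f}, significant: {small_significant}")
print(f"r=0.1, n=1000: t = {t_large:.2f}, significant: {large_significant}")
```

With r = 0.8 and n = 4, t is about 1.89, short of the 4.303 cutoff (two-sided p = 0.2). With r = 0.1 and n = 1000, t is about 3.18, past the 1.962 cutoff. So four points can show a high r with a high p, while a thousand points can show a low r with a low p.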
