by thaumasiotes 1738 days ago
> Personally, I suspect it is not an accidental outlier, but given that it does not produce much distortion in the overall trend, I am less inclined to see the 0.05 threshold (actual or perceived) as a problem than I did before I saw this chart.

Don't be fooled by the line someone drew on the chart. There's no particular reason to view this as a smooth nonlinear relationship except that somebody clearly wanted you to do that when they prepared the chart.

I could describe the same data, with different graphical aids, as:

- uniform distribution ("75 papers") between an eyeballed p < .02 and p < .05

- large spike ("95 papers") at exactly p = 0.04999

- sharp decline between p < .05 and p < .06

- uniform distribution ("19 papers") from p < .06 to p < .10

- bizarre, elevated sawtooth distribution between p < .01 and p < .02

And if I describe it that way, the spike at .05 is having exactly the effect you'd expect, drawing papers away from their rightful place somewhere above .05. If the p-value chart were a histogram like all the others instead of a scatterplot with a misleading visual aid, it would look pretty similar to the other charts.

1 comment

Well, you could take this mode of analysis to its conclusion and describe each datum in each dataset by its difference from its predecessor and successor, but would that help? I took it as significant that you wrote "...but it's an outlier from what is otherwise a regular pattern that clearly shows that smaller p-values are more likely to occur than larger ones are" (my emphasis), and that is what I am responding to.

I think we are both, in our own ways, making the point that there is more going on here than the spike just below 0.05 - namely, the regular pattern that you identified in your original post. If we differ, it seems to be because I think it is explicable.

WRT p-values of 0.05: I nearly said that if you fitted curves to the data above and below 0.05 independently, there would be a gap between the two fits where they meet at 0.05, perhaps even if you left out the value immediately below 0.05. No doubt splitting at other values would produce a gap too, but I am guessing that the gap would peak at 0.05. If I have time in the near future, I may try it. If you do, and find that I am wrong, I will be happy to recant.
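
For what it's worth, the split-fit comparison described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the helper names `fit_line` and `gap_at` are hypothetical, a straight least-squares line stands in for whatever curve one would actually fit, and the counts are invented toy numbers with a step built in at 0.05, not the data from the chart.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gap_at(threshold, ps, counts, drop_last_below=False):
    """Fit each side of `threshold` separately and return the difference
    between the two fitted values at the threshold (below minus above).
    Assumes `ps` is sorted ascending. Returns None if either side has
    fewer than two points."""
    below = [(p, c) for p, c in zip(ps, counts) if p < threshold]
    above = [(p, c) for p, c in zip(ps, counts) if p >= threshold]
    if drop_last_below:
        # Leave out the value immediately below the threshold.
        below = below[:-1]
    if len(below) < 2 or len(above) < 2:
        return None
    a_lo, b_lo = fit_line(*zip(*below))
    a_hi, b_hi = fit_line(*zip(*above))
    return (a_lo + b_lo * threshold) - (a_hi + b_hi * threshold)


# Invented toy data (NOT the chart's data): p = 0.010, 0.015, ..., 0.095,
# with one linear trend below 0.05 and a lower, flatter one from 0.05 up.
idx = range(2, 20)
ps = [i / 200 for i in idx]
counts = [120 - 5 * i if i < 10 else 40 - i for i in idx]

# Repeat the split at several candidate thresholds and see where the gap peaks.
candidates = [i / 200 for i in (6, 8, 10, 12, 14)]  # 0.03 ... 0.07
gaps = {t: gap_at(t, ps, counts) for t in candidates}
peak = max(gaps, key=gaps.get)
```

On the toy data the gap peaks at 0.05 only because the step was put there by construction; the point of running this on the real counts would be to see whether `peak` lands at 0.05 without that help, and whether it survives `drop_last_below=True`.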