| HN Mirror


by knowledgesale 5353 days ago

The p-value [1] criterion is often used to test a proposed hypothesis in medical and social science research papers. When you stumble upon something along the lines of "studies have shown that eating broccoli makes people happy" in your everyday life, it comes down to the p-value calculated for the before and after datasets being small enough (<0.05).
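To make the "before and after" calculation concrete, here is a minimal sketch of one way such a p-value could be computed: an exact sign-flip permutation test on paired measurements. This is only an illustration, not the method any particular study uses. The function name and the happiness scores are made up for the example.

```python
from itertools import product


def paired_sign_flip_p_value(before, after):
    """Two-sided exact sign-flip permutation p-value for paired data.

    Hypothetical helper for illustration only.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs))
    # Under the null hypothesis of no effect, each difference is equally
    # likely to be positive or negative, so enumerate every sign assignment
    # and count how many give a total at least as extreme as the observed one.
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped_sum = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped_sum >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Made-up happiness scores for 8 people before and after eating broccoli.
before = [5, 6, 4, 7, 5, 6, 5, 4]
after = [6, 7, 5, 7, 6, 8, 6, 5]
print(paired_sign_flip_p_value(before, after))  # 0.015625
```

With these invented numbers the p-value is below 0.05, so the result would be reported as "significant". The enumeration grows as 2^n, so this sketch is only practical for small samples.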

The p-value method is practically a standard for scientific reporting in some fields. It also has some drastic shortcomings, including, e.g., dramatic instability for tests with only a little data [2].
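The instability with small samples can be seen in a rough simulation. The sketch below repeats the same experiment many times with a fixed true effect and looks at how much the p-value moves around. Everything here is an assumption chosen for illustration: the effect size of 0.5, normally distributed data with known unit variance (so a simple z-test applies), and the function names.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: true mean is 0, assuming sigma is known."""
    z = mean(sample) * len(sample) ** 0.5 / sigma
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_p_values(n, true_effect=0.5, repeats=2000, seed=0):
    """Repeat the same experiment many times and collect the p-values."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_effect, 1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


def summarize(p_values, alpha=0.05):
    """Return (smallest p, largest p, share of runs with p < alpha)."""
    significant = sum(p < alpha for p in p_values)
    return min(p_values), max(p_values), significant / len(p_values)


if __name__ == "__main__":
    for n in (5, 100):
        lo, hi, share = summarize(simulate_p_values(n))
        print(f"n={n:3d}  min p={lo:.4f}  max p={hi:.4f}  share p<0.05={share:.2f}")
```

Under these assumptions the effect is real in every run, yet with n=5 most runs fail to reach p<0.05 and the p-values span nearly the whole range from 0 to 1, while with n=100 almost every run is significant.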

Naturally, people realize that and try to use additional tools and criteria when available. However, scientists are pretty brutally incentivized to publish positive results and, as a result, too much weight is put on the single p-value criterion more often than it should be.

With issues like this in mind, in my opinion, it makes sense to be somewhat skeptical when seeing reports in the news that "A affects B" and definitely not to rush to conclusions. Trivial, I know.

The video [3] pretty much sums it up and is by all means worth a watch.

-------------------

[1] http://en.wikipedia.org/wiki/P-value

[2] http://en.wikipedia.org/wiki/P-value#Problems

[3] http://www.youtube.com/watch?feature=player_embedded&v=e...

2 comments

tokenadult 5353 days ago

This is an important issue. The article "Warning Signs in Experimental Design and Interpretation"

http://norvig.com/experiment-design.html

by Peter Norvig, director of research at Google, has good follow-up on what we CANNOT assume just because a study has a finding that has met the p-value criterion.


polyfractal 5353 days ago

To take it a step further, the <0.05 rule is basically an arbitrary assignment. There has been a (rather unsuccessful) push to publish complete p-values, rather than ones just under 0.05. It is then up to the reader to decide what they deem significant.

There are many cases where the p-value is 0.06 or 0.055 and the result doesn't get a shiny little star.

And then there are cases where scientists just straight up don't understand stats (normality, sample size, what you can compare against, etc.). A recent story highlighted that [1].

[1] http://news.ycombinator.com/item?id=3285742
