by pocketsand 891 days ago
I'll start by saying that if you're going to roast a paper that an econ nobel winner and one of the most famous and respected working statisticians put their names on, you probably want to turn down the volume and double check your claims a little more before hitting "post."

A z score is not at all morally equivalent to a p-value. It's just a standardized measure. Converting measures to z-scores aids in interpretation. They also can aid estimation in some cases: using non-standard parameterization in Bayesian analysis is often crucial to get MCMC to accurately sample from the posterior distribution.

Sure, you can take a z score and look at the area under the curve and come up with a p value. But you don't have to. In the referenced paper, they use z scores to be able to standardize the measures in the papers they draw from, so they're comparable.
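To make that distinction concrete, here is a minimal Python sketch. It assumes each study reports an effect estimate and a standard error and takes z as their ratio; the function names and the numbers are made up for illustration and are not from the paper.

```python
import math

def z_score(estimate: float, std_error: float) -> float:
    """Standardize an effect estimate by its standard error (assumed definition)."""
    return estimate / std_error

def two_sided_p(z: float) -> float:
    """Optional extra step: two-sided tail area under a standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

# Two hypothetical studies measured on very different scales (made-up numbers).
studies = [
    {"name": "A", "estimate": 4.0, "std_error": 2.0},    # e.g. mmHg
    {"name": "B", "estimate": 0.30, "std_error": 0.15},  # e.g. log odds ratio
]

# After standardizing, the two results sit on the same unitless scale.
z_values = [z_score(s["estimate"], s["std_error"]) for s in studies]

# Turning a z-score into a p-value is a separate, optional step.
p_values = [two_sided_p(z) for z in z_values]
```

The comparison across studies only needs `z_values`; `two_sided_p` is never called unless you specifically want a p-value.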

The author's other critiques of the paper seem reasonable. It's a problem with all meta-analyses: the amount of work it takes to correctly interpret published papers and then aggregate their results is herculean. Doing it for over 20,000 papers is inevitably going to lead to some mistakes. That said, those mistakes may not be fatal to the analysis.

Moreover, saying "no one knows what happened to those 11,285 studies" without checking in with the authors is completely unfair. The first author responded with the code showing exactly how they achieved that figure. Nothing mysterious.

Andrew Gelman responded in the comments to him, as did the first author. I find their responses convincing.


> Moreover, saying "no one knows what happened to those 11,285 studies" without checking in with the authors is completely unfair. The first author responded with the code showing exactly how they achieved that figure. Nothing mysterious.

But that is the whole point! The methodology for dropping the 11,285 studies was not even described in the original paper, and even in the comments the author doesn't explain "why", just "how". Hence, I think it's completely fair to call it "irreproducible".

The point of doing "reproducible science" is not that I write a paper, you email me asking how did I come up with my number, and I email you back an explanation. No! The important details should be in the paper already. You may do some magic on your dataset, and that's fair _as long as you detail what magic you did and why_, and given you can defend that practice in front of your peers. Otherwise what is the point of preaching "reproducible science" at all?

> The point of doing "reproducible science" is not that I write a paper, you email me asking how did I come up with my number, and I email you back an explanation. No! The important details should be in the paper already.

This is how it works: people email each other all the time. Why shouldn’t you? You can’t anticipate every bit of information that someone would want, and papers have page length limits, so you make choices about what to cut.

Don't you think that gives the people who are falsifying results rather an easy time?

If Bob falsifies data and someone e-mails him, asking him to send them rope to hang him with, he can simply delete the e-mail.

Or claim he forgot the details of the analysis. Or claim it was handled by a grad student who left. Or claim the info was lost when a hard drive broke. Or claim the data was the intellectual property of College A and he can't access it now he's at College B. Or claim privacy or copyright rules cover the key data. Or that they don't have a license to the software that can open the data files any more. Or any of a dozen other things.

Any of those responses (or non-response) by Bob would justify the asker in their skepticism of the falsified results. Giving somebody a chance to respond is not the same as relaxing your standards of evidence, it's just an acknowledgement that there might be explanations you haven't thought of and giving an opportunity for those to be brought up.

All anyone can do is try to use that research in their own work, and see if their work supports findings from prior work. Sometimes it doesn’t; I am not sure that means someone lied on purpose. It’s possible they were bad at interpreting the results, or they made bad assumptions. I think poor research standards are the main reason for the reproducibility crisis, and not people lying on purpose.

Typically bad research assumptions or implementations are rooted out during peer review, but it’s an imperfect process.

I do think there needs to be a dedicated, neutral non-profit organization solely responsible for reproducing scientific results in all fields and assigning a reproducibility score to research findings. This could become an entire field by itself, and would have its own complications, but the reproducibility crisis does exist and needs a solution.

A paper should have enough information for an independent researcher to reproduce the results.

Otherwise, years later, if the author dies, the paper would be basically worthless for reproduction.

(We're still talking about reproducibility crisis, right?)

I think your critiques are fair enough and I think the authors would likely agree they should have been more proactive with the data sharing.

Maybe it sounds like I'm parsing too much, but I nevertheless still think saying "no one knows what happened" is unfair. They know, they shared, and justified what they did when called on it with literally almost no delay. I agree they shouldn't need to be called on it.

Anyone who's advised students or asked even presenting researchers such questions knows that often people will literally not know what happened to all their data.

> Anyone who's advised students or asked even presenting researchers such questions knows that often people will literally not know what happened to all their data.

I am sorry that’s been your experience, maybe it varies by field and quality of research? Most people I’ve questioned have provided reasonable answers to their findings. I don’t understand why anything needs to be assumed in bad faith or shoddily done. It’s all a bit Dunning-Kruger to me where everyone assumes that everyone else is doing shoddy or bad work.

Things are complicated.

To be fair: Everyone I've worked closely with in research has gone above and beyond not to cut corners and produce high quality data and research.

What I have in mind here is a situation where people are actually quite careful but can still end up in a place where they don't know what happened because they don't have good systems for creating datasets and storing code.

For example, graduate students are not always taught to work in a reproducible way. It's definitely gotten better from what I can see, but it was normal for people to get source data and work that data into its final form in a lot of different steps, not all of them reproducible. E.g., data comes in from a secondary source or other provider. It gets cleaned. That file gets saved as something like "clean data 011234.csv".

More work is done, it gets saved again.

Time passes, things are revisited, and a handful of files exist that, with some care, could likely lead from point A to point B. But the exact process, to say nothing of the dozens, sometimes hundreds, of small data preparation decisions, gets lost to memory.

Code doesn't go in version control. People get new computers. USBs get lost. Universities migrate to new data systems and so on.

All the while, these students and researchers were very careful while doing the work. They were just never trained to use good version control and pipeline processes. They basically do what they did with papers they write. Save and backup while working through the paper and move on when it's done.

This is made worse when data is proprietary or not legally shareable.

So people aren't necessarily being shoddy or doing bad work, they're just not using good systems.
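As a rough illustration of what a "good system" can look like at its smallest, here is a Python sketch of a cleaning step that lives in code instead of in hand-saved files. The column names, the two cleaning rules, and the manifest layout are all hypothetical; the point is only that every dropped row is counted and the exact input is fingerprinted.

```python
import csv
import hashlib
import io
import json

def sha256_text(text: str) -> str:
    """Fingerprint the exact input so later readers can verify it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def clean(raw_csv: str):
    """Apply every cleaning rule in code and count the rows each rule drops."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    log = {"input_sha256": sha256_text(raw_csv), "rows_in": len(rows), "dropped": {}}

    kept = []
    for row in rows:
        if row["value"].strip() == "":
            reason = "missing value"
        elif float(row["value"]) < 0:
            reason = "negative value"
        else:
            kept.append(row)
            continue
        # Record why the row was dropped instead of silently discarding it.
        log["dropped"][reason] = log["dropped"].get(reason, 0) + 1

    log["rows_out"] = len(kept)
    return kept, log

# Made-up raw data standing in for a file from a data provider.
raw = "id,value\n1,3.5\n2,\n3,-1.0\n4,7.25\n"
cleaned, manifest = clean(raw)
print(json.dumps(manifest, indent=2))
```

Committing a script like this to version control, together with the manifest it prints, is what lets someone answer "what happened to those rows?" years later without relying on memory.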

> So people aren't necessarily being shoddy or doing bad work, they're just not using good systems.

Agreed. I think there isn’t an incentive to do this because reproducibility takes a back seat to so many other concerns. Unless PIs are told that their publication chances depend on reproducibility, this isn’t going to change.

> It’s all a bit Dunning-Kruger to me

Fantastic article on the reproducibility of the Dunning-Kruger effect: https://replicationindex.com/2020/09/13/the-dunning-kruger-e...

Also see: "The Dunning-Kruger Effect is Autocorrelation" https://economicsfromthetopdown.com/2022/04/08/the-dunning-k...

Lovely quote:

  “These responses to our work have also furnished us moments of delicious irony, in that each critique makes the basic claim that our account of the data displays an incompetence that we somehow were ignorant of.” (Dunning, 2011, p. 247).

Of course Dunning-Kruger is self-referential. Any mention of Dunning-Kruger automatically makes you a victim on the wrong side of the graph.

I think the term is still useful for giving words to a phenomenon people experience. I guess a less charitable and more presumptuous term for such behavior would be calling the person displaying it narcissistic.

> [T]he author doesn't explain "why", just "how".

The below seems like a "why" to me.

> The criteria for selecting the data are an attempt to get the primary efficacy outcome, and to ensure that each trial occurs only once in our dataset. The selection for |z|<20 is because such large z-values are extremely unlikely for trials that aim to test if the effect is zero.
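For readers who want to see what criteria like these look like in practice, here is a small Python sketch. It is not the authors' code: the field names, the sample records, and the "keep the first qualifying record per trial" rule are assumptions made for illustration.

```python
def select_trials(records, z_limit=20.0):
    """Keep one primary-outcome record per trial with |z| < z_limit."""
    selected = {}
    for rec in records:
        if not rec["primary"]:
            continue  # only primary efficacy outcomes are considered
        if abs(rec["z"]) >= z_limit:
            continue  # implausibly large z-values are excluded
        # setdefault keeps the first qualifying record for each trial (assumed rule)
        selected.setdefault(rec["trial_id"], rec)
    return list(selected.values())

# Made-up records; "primary" and "trial_id" are hypothetical field names.
records = [
    {"trial_id": "T1", "primary": True, "z": 1.8},
    {"trial_id": "T1", "primary": True, "z": 2.4},    # same trial again
    {"trial_id": "T2", "primary": False, "z": 0.7},   # not a primary outcome
    {"trial_id": "T3", "primary": True, "z": -35.0},  # |z| >= 20
    {"trial_id": "T4", "primary": True, "z": -0.9},
]

kept = select_trials(records)
dropped = len(records) - len(kept)
```

Even on five toy rows most of the data disappears, which is why a large drop between the raw and analysed counts is not suspicious on its own once the rules are written down.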

Ah, I am sorry, it seems a comment was posted later (Jan 9th) providing a justification of their criteria. However, the reply it was posted on wasn't there when I first read the article and the comments on Jan 8th.

Regarding your appeal to authority: high rank in today’s scientific system is an incentive to defend the system.

I'm not sure it qualifies as argumentum ab auctoritate to say that a group composed of Nobel prize winners and renowned statistics experts is far more likely to be right about statistics than a solitary compsci professor.

Even if it does qualify, it's widely accepted that argument from authority is perfectly valid and often necessary when performing inductive reasoning.

I might even say that if you are not expert enough to judge the matter yourself (or willing to expend enough effort) it should be your default presumption that the more qualified speakers are correct, even if you have legitimate questions about their potential motivations.

My issue here is that Imbens and Gelman are respected because they are good statisticians but also clear thinkers who have proved themselves dedicated to doing good, careful work. If you have major issues with work their name is on, I would think their reputation would at least lead reasonable people to contact them first before writing something which leads to hurt feelings and numerous comments and corrections and more posts on each end.

This whole affair is a case in point. Several authors ended up responding with very reasonable responses, which the author then acknowledged. Then he writes another post, which is more measured and insightful.

The original post would have in fact been a far better one if the author had first sought those responses and just addressed them all together at once. His points about metascience would not then be dragged down by back and forth in the comments that clearly got personal.

Small quibble: converting a measure to a z-score requires an assumption of normality and that you're considering a population.

For a sample, the equivalent is the t-statistic, which indeed IS very often used for p-values (with the ever-popular t-test) and has a decently strict list of assumptions (which, like those of [O|W|G]LS, are very frequently ignored).

z-scores just require knowing a standard deviation and a mean. you only need an assumption of normality if you want to do things like assign p-values. of course, that is mostly how they are used.
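A quick way to check this for yourself is the Python sketch below, which standardizes a deliberately skewed, made-up sample using only its mean and standard deviation. The helper name `standardize` is hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# A strongly right-skewed, clearly non-normal sample (made-up numbers).
skewed = [1, 1, 1, 2, 2, 3, 5, 8, 40]
z = standardize(skewed)

# The standardized values have mean 0 and standard deviation 1 by construction,
# but the shape is unchanged: the outlier is still far out on the right.
z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)
```

No distributional assumption was needed to compute `z`. Normality only enters if you go on to read tail areas off a normal curve, and for a sample like this one those areas would be misleading.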

"You Come At the King, You Best Not Miss" ;-)

> I'll start by saying that if you're going to roast a paper that an econ nobel winner

"By 2005 or so, it will become clear that the Internet's impact on the economy has been no greater than the fax machine's." -- Paul Krugman (Nobel Prize winner in economics)

If you have to pathetically appeal to authority right from the start, you probably have no argument worth considering.