by foldr 2818 days ago
As you say, reviewers won't (and most likely can't) redo the experiment. For this reason, there is no real protection in the review process against people making results up.

If you didn't trust by default, you'd never publish anything.


> For this reason, there is no real protection in the review process against people making results up.

You can still sanity-check the results, even without redoing the experiment. For example, if the average agreement to a certain question on a 1-5 scale among a cohort of 10 people is reported to be 3.26, you might want to ask for the raw data: with 10 integer answers the mean must be a multiple of 0.1, so that average is only possible with fractional answers.
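This kind of consistency check (known as the GRIM test, discussed further down the thread) is easy to automate. A minimal sketch, assuming integer-valued responses and a mean reported to a fixed number of decimals; the function name is hypothetical:

```python
import math


def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if `mean`, reported to `decimals` places, could be the
    mean of `n` integer-valued responses."""
    # A reported value stands for any true mean within half a unit of its
    # last decimal place (this tolerates either rounding convention).
    half_unit = 0.5 * 10 ** -decimals
    centre = mean * n
    # With integer answers the true mean is (integer total) / n; the integer
    # totals nearest to mean * n are the only plausible candidates.
    for total in (math.floor(centre), math.ceil(centre)):
        if abs(total / n - mean) <= half_unit + 1e-9:
            return True
    return False


print(grim_consistent(3.26, 10))  # False: 10 answers give multiples of 0.1
print(grim_consistent(3.26, 50))  # True: 163 / 50 = 3.26
```

The test only has teeth for small samples: with two reported decimals, any n of 100 or more can produce every mean, so it always returns True.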

I recall a study looking at such impossible aggregate statistics leading to several retractions of articles whose data had been made up outright.

Similarly, when someone claims that "four men watched 2,328 hours of hardcore pornography over the course of a year and took the same number of Implicit Association Tests", you might realize that 2328 hours/(4*365) ≈ 1 hour 36 minutes per day, and ask for the titles and duration of the porn allegedly watched, just to make sure that this extremely onerous experiment has actually been performed.
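The back-of-the-envelope arithmetic, spelled out. This assumes, as the comment does, that the 2,328 hours are a total shared evenly by the four participants over 365 days:

```python
HOURS = 2328        # total viewing hours claimed for the whole group
PARTICIPANTS = 4    # number of men in the study
DAYS = 365          # duration of the study in days

# Average daily viewing per participant, assuming an even split.
hours_per_day = HOURS / (PARTICIPANTS * DAYS)
whole_hours = int(hours_per_day)
minutes = (hours_per_day - whole_hours) * 60

print(f"{hours_per_day:.3f} h/day ≈ {whole_hours} h {minutes:.0f} min per participant")
```

The exact figure is about 1 h 35.7 min per day, every day for a year, per person.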

Note that the paper about that "experiment" was not accepted, but at least one reviewer actually recommended less data ("My first piece of feedback on how to make this hybrid article work is that they should remove the quantitative data."), perhaps due to a misunderstanding of sample sizes ("It makes no sense to undertake quantitative analysis for four people – when you flatten the detail out of a sample of four you’re not left with anything interesting."); the real sample size is at least 2328.

I realize that peer review mostly doesn't operate at that level of scrutiny, but maybe it should. Checking the raw data requires slightly more work from both reviewers and honest authors, but increases the workload of dishonest authors from "make up a few numbers" to "make up as many numbers as if they'd actually done the work, without introducing statistical anomalies", shrinking the gap to "actually do the work".

So even though you need to trust authors a little, it's certainly possible to trust less. There is no perfect protection against academic dishonesty, but there could be better protections.

> Similarly, when someone claims that "four men watched 2,328 hours of hardcore pornography over the course of a year and took the same number of Implicit Association Tests", you might realize that 2328 hours/(4*365) ≈ 1 hour 36 minutes per day, and ask for the titles and duration of the porn allegedly watched, just to make sure that this extremely onerous experiment has actually been performed.

I don’t see the point. The authors could easily respond with a long list of porn titles. (And as an unpaid reviewer with lots of real work to do, are you going to bother verifying every title in the long list?)

>Note that the paper about that "experiment" was not accepted

Then it's not a very good example to base your argument on.

>the real sample size is at least 2328

You’re both wrong. You can’t treat 2328 observations from 4 subjects the same way as 2328 observations from 2328 subjects (see e.g. https://en.wikiversity.org/wiki/Advanced_ANOVA/Repeated_meas...)
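One standard way to quantify the difference is Kish's design effect for clustered samples: with m observations per subject and an intraclass correlation ρ (how similar one subject's observations are to each other), N observations carry roughly as much information about a population mean as N / (1 + (m − 1)ρ) independent ones. This is a simpler relative of the repeated-measures analysis linked above, not a replacement for it. The sketch assumes an even split of 2328 / 4 = 582 sessions per person, and the ICC values are illustrative assumptions, not figures from the study:

```python
def effective_sample_size(n_subjects: int, obs_per_subject: int, icc: float) -> float:
    """Kish design effect for equally sized clusters:
    n_eff = N / (1 + (m - 1) * icc), where N = n_subjects * m."""
    total = n_subjects * obs_per_subject
    design_effect = 1 + (obs_per_subject - 1) * icc
    return total / design_effect


# Hypothetical ICC values: 0 = sessions fully independent, 1 = identical within a person.
for icc in (0.0, 0.1, 0.5, 1.0):
    print(f"ICC={icc}: effective n ≈ {effective_sample_size(4, 582, icc):.1f}")
```

With ρ = 0 all 2,328 sessions count fully; with ρ = 1 they collapse to the 4 subjects; ρ = 0.5 already brings the effective sample size down to about 8.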

More generally, virtually no-one understands statistics. Every field where statistical analysis is used routinely publishes papers that use bad statistical methods.

>I realize that peer review mostly doesn't operate at that level of scrutiny, but maybe it should.

What does the “should” even mean here? Do you think that reviewers who work for free “should” do even more work than they do already? Or that journals “should” force reviewers to do this (even though they have no mechanism for doing so)? There are practical limits to the amount of scrutiny any given paper can be subject to. It would suck if we needed to spend more time reviewing papers just because a bunch of assholes keep trying to get fake papers published.

>it's certainly possible to trust less.

Not really. You don't seem to realize that more scrutiny during the review process would require real people to give up more of their real time for free. You can't just snap your fingers and make that happen.

> are you going to bother verifying every title in the long list?

It should be enough to randomly sample a subset for verification, similar to probabilistic proof checking in cryptography.
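To make the spot-check idea concrete: if a reviewer samples entries uniformly at random without replacement, the chance of catching at least one fabricated entry follows the hypergeometric distribution. The function name and numbers below are hypothetical, for illustration only:

```python
from math import comb


def detection_probability(total: int, fabricated: int, sample: int) -> float:
    """Chance that a uniform random sample of `sample` items, drawn without
    replacement from `total` items, contains at least one of the
    `fabricated` bad items."""
    if not 0 <= sample <= total:
        raise ValueError("sample must be between 0 and total")
    # P(no bad item sampled) = C(total - fabricated, sample) / C(total, sample)
    return 1 - comb(total - fabricated, sample) / comb(total, sample)


# Hypothetical list of 1,000 titles, 100 of them made up, 30 checked.
print(f"{detection_probability(1000, 100, 30):.3f}")
```

In this hypothetical case, checking 30 titles catches at least one fake about 96% of the time. The reviewer's effort scales with the desired confidence and the fraction of fabricated entries, not with the length of the list.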

>>Note that the paper about that "experiment" was not accepted

> Then it's not a very good example to base your argument on.

You're welcome to suggest a better example ;)

> You’re both wrong. You can’t treat 2328 observations from 4 subjects the same way as 2328 observations from 2328 subjects

You're right of course, but it really depends on how you want to generalize. Observing only 4 subjects makes it hard to estimate population variance and generalize to other subjects, but having 2328 observations of the same subject should give great insights into measurement reliability and changes over time, for those subjects.

> Do you think that reviewers who work for free “should” do even more work than they do already?

I think that reviewers should be compensated adequately for their work, ...

> Or that journals “should” force reviewers to do this (even though they have no mechanism for doing so)?

... by the journals, which can use some of the revenue they make selling subscriptions to enforce a quality standard for the papers they publish.

> It would suck if we needed to spend more time reviewing papers just because a bunch of assholes keep trying to get fake papers published.

Some assholes try and succeed at publishing fake papers, some of them potentially influencing important decisions, e.g. in medicine. You can of course decide that it's not worth the effort to try and stop them, but I feel that publishing fake results should be as hard as possible.

> You can't just snap your fingers and make that happen.

But I can argue on the internet about it. Maybe that doesn't change anything, but it makes me feel better.

>You're welcome to suggest a better example ;)

You mean, I'm welcome to find supporting evidence for your argument? Shouldn't that be your responsibility?

>I feel that publishing fake results should be as hard as possible.

Making it "as hard as possible" would mean using almost all of the world's resources to try to stop fake results being published. If you really want to make the review process more rigorous, you need to present a concrete plan specifying (a) who's going to do the work and (b) who's going to pay for it.

If you can't do that, then what makes you so sure that the review process isn't already as rigorous as it can reasonably be, given the reality of human frailty and limited resources?

> You mean, I'm welcome to find supporting evidence for your argument? Shouldn't that be your responsibility?

Fair enough.

I had some trouble finding the original better example I had mentioned earlier ("I recall a study looking at such impossible aggregate statistics leading to several retractions of articles whose data had been made up outright."), but while trying to find it, I stumbled on another.

Example 1: https://andrewgelman.com/2018/08/21/scandal-isnt-whats-retra...

A study gets published finding a large effect, large enough to cause a replication attempt: "The effort to replicate the original study was successful in everything except the creation of the PSU-level structural stigma variable."

The suspected reason for this replication failure (imputation of missing data) turns out to be wrong when the original authors have someone check their code for data analysis, which is found to contain an error.

If that code had been checked during peer review (or at least afterwards, by including it with the publication), the effort would have been less than a full-blown replication attempt.

Example 2: https://medium.com/@jamesheathers/the-grim-test-a-method-for...

(Which is what I was referring to earlier.)

Simply checking reported means and sample sizes for consistency revealed mathematically impossible results in 50% of tested papers.

He goes to great lengths to stress that such inconsistencies don't necessarily imply fraud (some are honest mistakes), but the behavior of some of the contacted authors when asked for their data appears very sketchy.

Again, if there were a culture of looking at raw data and inspecting analysis code during peer review, those studies reporting obviously incorrect results would not have been published, saving everyone who relies on such studies a lot of trouble.

So, who's going to do that work and who pays for it? I'd be surprised if I managed a working plan on the first attempt, but here's my proposal:

- Authors prepare their data and code together with instructions in such a way that an expert in their field can work with them without having to ask the authors for additional information. It should be attached to the paper as supplementary material. If the data is privacy-sensitive, it should at least be made available to reviewers to check that the results follow from the data. Who pays for it: whoever pays the authors to be writing papers in the first place.

- Reviewers do that sanity-check of running the code on the data to verify that the instructions are complete and the results match what is reported in the paper. They scrutinize the code to the level they'd apply to a methodology section. Who pays for it: the readers of the published paper, since they benefit from not having to do the peer review themselves when they just want to use the results.
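As a rough illustration of the reviewer's sanity check in the second point, the sketch below recomputes group means from the supplied raw data and flags reported values that don't match to the reported precision. The data layout, group names and function name are all hypothetical. A real check would rerun the authors' full analysis code, not just means:

```python
from __future__ import annotations

import statistics


def check_reported(raw: dict[str, list[float]], reported: dict[str, float],
                   decimals: int = 2) -> list[str]:
    """Recompute each group's mean from raw data and return a description
    of every reported value that the raw data does not reproduce."""
    half_unit = 0.5 * 10 ** -decimals  # tolerance implied by the rounding
    mismatches = []
    for group, value in reported.items():
        if group not in raw:
            mismatches.append(f"{group}: no raw data supplied")
            continue
        recomputed = statistics.fmean(raw[group])
        if abs(recomputed - value) > half_unit + 1e-9:
            mismatches.append(
                f"{group}: reported {value}, recomputed {recomputed:.{decimals}f}")
    return mismatches


# Hypothetical supplementary data and the means a paper claims for it.
raw_data = {"control": [3, 4, 2, 5, 3], "treatment": [4, 4, 5, 3, 4]}
paper_means = {"control": 3.40, "treatment": 4.20}
print(check_reported(raw_data, paper_means))
```

Here the control mean reproduces, while the treatment group is flagged because its raw data average to 4.00, not 4.20.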

Maybe that's unrealistic and the review process is

> already as rigorous as it can reasonably be, given the reality of human frailty and limited resources

but that would be sad.

>Who pays for it: the readers of the published paper, since they benefit from not having to do the peer review themselves when they just want to use the results.

I can't make sense of this. Are you suggesting that journals should pay reviewers and finance this by charging people (more) to read articles?

As for reviewers checking statistical analyses, remember that this is peer review. The reviewers, on average, are going to be just as sloppy and ignorant and careless as the authors. If a field is awash with papers with bad stats, then most reviewers (being drawn from the same pool of people) will not be competent to check a paper's stats. Andrew Gelman isn't available to review every social science paper.

Help me understand how you can coherently want scientific research to be free, rather than locked up in paid journals, and at the same time believe that unpaid peer reviewers should respond to every submission with a kind of counter-research project to rebut the findings of those submissions?

Further: would you rather have peer reviewers be established experts in their fields, or new grad students? If the former, how does reconstituting peer review to trade expectations of good faith for diligent, expensive, tedious review impact who will end up willing to do peer review?

> Help me understand how you can coherently want scientific research to be free, rather than locked up in paid journals, and at the same time believe that unpaid peer reviewers should respond to every submission with a kind of counter-research project to rebut the findings of those submissions?

Because of that incoherence, it's not what I want. I support paying reviewers for the important work they do. If the research is of a kind that should be available to the public for free, then that's a job for research-funding organizations, who could explicitly pay for that cost (rather than implicitly by granting funds to scientists who then end up working as reviewers for free).