by danieltillett 4338 days ago
There are so many issues raised here that it is hard to know where to respond.

1. One area where we could stop is useless data-mined correlation studies that show statistical significance (assuming you ignore that data-mining has occurred) between action X and outcome Y - the sort where a retrospective study of 500,000 nurses finds that eating candied peanuts reduces prostate cancer by 15%. The rule of thumb in any of these studies is that unless the effect is 300% or greater (smoking and lung cancer is 1500%) then the result is certain to be garbage.

2. We need less “novel” research and more replication of past results. The whole scientific system is set up to reward novelty over accuracy. It is so bad that unless I have seen two independent groups repeat something I doubt it is real no matter how famous the group.

3. We need to reward being right over being first. Right now groups rush papers out so they don’t get scooped, and so they don’t check their results as well as they should. I would personally like to remove the date from all scientific papers to stop these silly games - after all, if something is true, does it become less true just because it was published last year rather than last week?

4. We need to reward people who put the effort into replicating work. A simple proposal would be to give every group that replicated (or could not replicate) a study the right to publish in the same journal. If some study is published in Nature and you go to the effort of replicating it then you should get an automatic Nature publication.

5. Stop scientists from holding on to raw data. In theory scientists are supposed to share their data, but in practice this doesn’t happen very often. It should be possible to report groups that don’t share data to the funding bodies, and if they are found not to be sharing (or only sharing some of the data) then the group is banned from getting any new funding. It would only take a few bans to stop this immoral data hoarding.
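
To make point 1 concrete, here is a minimal Python sketch (not part of the original comment) of why data-mined correlations can look significant. It simulates a made-up cohort in which the outcome is assigned independently of every exposure, then tests each exposure separately at p < 0.05. The function names, cohort size, number of exposures, base rate and the choice of a two-proportion z-test are all illustrative assumptions.

```python
import math
import random


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (hits_a / n_a - hits_b / n_b) / se
    # Two-sided tail probability of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(n_subjects=2000, n_exposures=200,
                          base_rate=0.1, alpha=0.05, seed=0):
    """Count exposures that look 'significant' when no real effect exists."""
    rng = random.Random(seed)
    # The outcome is generated without reference to any exposure,
    # so every "significant" association found below is pure noise.
    outcome = [rng.random() < base_rate for _ in range(n_subjects)]
    false_positives = 0
    for _ in range(n_exposures):
        exposed = [rng.random() < 0.5 for _ in range(n_subjects)]
        n_a = sum(exposed)
        n_b = n_subjects - n_a
        if n_a == 0 or n_b == 0:
            continue
        hits_a = sum(1 for e, o in zip(exposed, outcome) if e and o)
        hits_b = sum(1 for e, o in zip(exposed, outcome) if o and not e)
        if two_proportion_p_value(hits_a, n_a, hits_b, n_b) < alpha:
            false_positives += 1
    return false_positives


if __name__ == "__main__":
    print(count_false_positives())
```

With 200 unrelated exposures and a 0.05 threshold, roughly 10 spurious "findings" are expected from chance alone; the exact count depends on the seed. Reporting only those hits, without mentioning the other tests, is the data-mining problem described in point 1.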


I couldn't agree more with almost everything you just said (especially number 2). My only potential issue is with number 5. While I agree that raw data should be shared for purposes of reproducibility and progress, I can also partially sympathize with investigators who put in enormous time and effort to coordinate and run large studies / clinical trials.

If investigators were forced to immediately release their raw data from these studies, there would be armies of other investigators swooping in to scoop the original team on follow-on studies from the data. While this would certainly be great for science, it partially punishes investigators for actually conducting the large trials. I'm not sure how justifiable it would be to put in the effort to conduct a large clinical trial and then only get 1-2 papers out of it (even if they went into NEJM / JAMA / Lancet etc).

What are your thoughts?

The purpose of 5 is to improve science, not to ‘reward scientists’ [1]. If we moved to a system where the raw data was shared automatically then the number of “exclusives” any group could get from a study would decline, but the value of each paper would go up. As long as everyone was sharing, I don’t think funding bodies would stop funding groups willing to go to the effort of doing large studies. It is the funding that determines what research is done, not how many papers a group can milk out of the study. It should be quality over quantity.

[1] For those outside of science, what happens now is that groups with the data hold it back and then use access to it to establish “collaborations” - basically they will give you access to the data as long as you put their names on any resulting papers. The people with the data often don’t actually contribute anything to the new publication other than access to the data and their names - my old boss was an expert at doing this.

The ENCODE project may have had the right idea with this. The data was made public, but researchers who did not contribute to the data production could not use it for their own publications for a period of 6 months or a year.

Well, as we're already dreaming, why not stipulate that the person who gets credit for the breakthrough should be the person who gathered the data, not the person who analyzed it?