Hacker News
by Thriptic 4337 days ago
I couldn't agree more with almost everything you just said (especially number 2). My only potential issue is with number 5. While I agree that raw data should be shared for the sake of reproducibility and progress, I can also partially sympathize with investigators who put enormous time and effort into coordinating and running large studies and clinical trials.

If investigators were forced to release the raw data from these studies immediately, armies of other investigators would swoop in to scoop the original team on follow-on studies using that data. While this would certainly be great for science, it partially punishes investigators for actually conducting the large trials. I'm not sure how justifiable it would be to put in the effort to run a large clinical trial and then get only 1-2 papers out of it (even if they appeared in NEJM, JAMA, The Lancet, etc.).

What are your thoughts?


The purpose of 5 is to improve science, not to ‘reward scientists’ [1]. If we moved to a system where raw data was shared automatically, the number of “exclusives” any group could get from a study would decline, but the value of each paper would go up. As long as everyone was sharing, I don’t think funding bodies would stop funding groups willing to go to the effort of running large studies. It is the funding that determines what research gets done, not how many papers a group can milk out of a study. It should be quality over quantity.

[1] For those outside of science: what happens now is that groups with the data hold it back and then use access to it to establish “collaborations”. Basically, they will give you access to the data as long as you put their names on any resulting papers. The people with the data often contribute nothing to the new publication other than access to the data and their names. My old boss was an expert at doing this.

The ENCODE project may have had the right idea here. The data was made public, but researchers who had not contributed to producing it could not use it in their own publications for a period of 6 months or a year.
Well, since we're already dreaming, why not stipulate that the person who gets credit for a breakthrough should be the person who gathered the data, not the person who analyzed it?