Hacker News
by patall 3641 days ago
As I see it, pretty much from the inside, this is not a general problem but just the evolution of science: we are able to measure more and more things, which leads to better and better conclusions. The problem of what is cause and what is effect comes from the sheer amount of data you create (for a current study we have whole-genome methylation data for 19 cell populations in mice, all in triplicate and at medium coverage), which is obviously prone to many false positives. And as we are now approaching the single-cell level (which will dramatically improve results), this number is only going up. And of course it is hard to check all these positives extensively. But yeah, this is science and we are only getting better, so no pain, just an opportunity. The real problem is rather the amount of data we create (we are talking about petabytes), which has to be stored and kept accessible for decades so we can later recheck our conclusions. Nobody wants to pay for that.
2 comments

I don't really think there is any reason to store petabytes of data to recheck your conclusions, except where you are legally required to keep the raw data for decades. I'm not aware of any retention policy that generally requires scientists to do that, nor do I think being able to recheck a conclusion against this kind of raw data is valuable enough to justify it.

If it were required, you'd be willing to pay the (large, but economically justified) cost of archiving the data.

I'm interested in chatting about versioning for this data (and possibly, by extension, compression). Do you happen to have a way I can contact you?
Not in this case, as the data I described is under embargo. The problem is more general, though: it applies to all genomic data that has already been made public but where funding runs out after 5 years or so. I happened to chat the other day with a postdoc who is involved in the ICGC cancer consortium and who had already given the problem some thought. If you want, I can ask him whether he wants to contact you.
Would love to chat with him!