Hacker News
by latj 4888 days ago
Some labs are conservative because they worry that they will not be able to reproduce the same analysis. For example, imagine a lab that has been collecting samples and running whatever code was current as each one came in. Now imagine that, two years later, someone starts drawing conclusions from an aggregation of the results over a petabyte of that data. On one hand, you could just say: nope, we can't reproduce the original analysis, but we can use all of our computational power for a month and reanalyze all of the data with the current packages and code. On the other hand, a more conservative idea is to record the entire state of the environment when particular samples are recorded, so that in theory you could replay the analysis: fire up the VM from two years ago, install the same versions of all the packages, install the code at the same tags, and analyze that data set, then do the same thing for every other data set. Smaller labs, I think, are just hoping that no one tries to replicate their studies or asks whether they can reproduce their results.
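To make the "record the state of the environment" idea a bit more concrete, here is a rough Python sketch of what a lab might store next to each sample's results. It is only an illustration: `capture_environment`, `fingerprint`, and the `code_version` argument are hypothetical names, and it assumes the analysis code is tagged in version control and that the tag is passed in by the caller. A real setup would probably also need to pin the OS image or VM, which this does not cover.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def capture_environment(code_version):
    """Describe the environment an analysis ran in.

    code_version is a caller-supplied label (for example a VCS tag);
    how it is obtained is an assumption left outside this sketch.
    """
    # Collect "name==version" for every installed distribution,
    # skipping any broken install that reports no name.
    packages = sorted(
        {
            f"{dist.metadata.get('Name')}=={dist.version}"
            for dist in metadata.distributions()
            if dist.metadata.get("Name")
        }
    )
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "code_version": code_version,
        "packages": packages,
    }


def fingerprint(manifest):
    """Hash the manifest so two runs can be compared cheaply."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


if __name__ == "__main__":
    manifest = capture_environment("v1.0")  # hypothetical tag name
    print(fingerprint(manifest))
    print(json.dumps(manifest, indent=2)[:400])
```

Storing the manifest (or at least its fingerprint) with every result would let you tell later which samples were analyzed under identical conditions and which would need a replay or a full reanalysis.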