Hacker News
by simonster 4346 days ago
The author is right to criticize the current state of data sharing and analysis. However, the solution is not to have experimentalists collect their data and then pass it on to someone else. That would be good for the people doing data analysis (provided they know enough to understand the experimental procedures, which is not always the case), but it would remove many of the incentives to be an experimentalist. Science is much less fun if you don't actually get to make new discoveries.

The problem is that in many fields there is a weird dichotomy between people who know how to get data and people who know what to do with it. This is not a sustainable situation. Proper experimental design requires knowledge of how the data will be analyzed.

My proposed solution is to require that the leaders of research groups have expert knowledge of both experimental procedures and data analysis, because that is the expertise required to pick an appropriate hypothesis and supervise the corresponding scientific project from start to finish. Because students 1) work in a lab with diverse knowledge and 2) desire to become professors themselves, they are likely to acquire these skills as well. Aspiring professors who have substantially greater aptitude for either data collection or data analysis should form a joint lab with a researcher with the complementary skill set so that their students can learn both fields.

2 comments

I had a conversation along these very lines today. The hard question is how to fund such a thing. A lot of funding seems to be either for an individual researcher to start a group or for large consortia to work on a project, with precious little middle ground. My ideal lab would in fact be a medical biologist, a mass spectrometrist, and a bioinformatician working together. I'm not sure how to do that when each individual has a good, but not insane, track record. This is to say nothing of the difficulties in putting such a team together.
There's a lot of data that can be collected without much knowledge of experimental design; consider, e.g., all the sequencing projects.

The problem is that if the data ends up in inconsistently formatted spreadsheets or poorly conceived custom formats, the effort required to extract the data for analysis later (especially across multiple projects) can be prohibitive.