HN Mirror


by tomkinstinch 3037 days ago

A group upstairs at the Broad Institute built out a system to use Jupyter notebooks for analysis of genomic data, with backend computation happening on a Spark cluster[1]. Science on large datasets can happen via interactive notebook. In connection with a recent GWAS on a massive dataset from the UK Biobank, the researchers involved decided not to write a traditional scientific journal article (at least for now), since their analyses will continue to mature. Instead, they've been posting insights online in blog form, with associated code on GitHub[2]. It's a daring move toward publishing at the speed of research. Once their conclusions mature, traditional journal articles may follow to distill and preserve the key findings. In the meantime, those in the field can apply the same code to their own data, replicate the analyses, and get an early look at the output of the research. This works partly because the methods (univariate GWAS) are well understood in the field, so the science here is the interpretation and rendering of a particular dataset rather than a new method (which would likely still warrant a paper).

1. https://hail.is

2. http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousand...

1 comment

JorgeGT 3037 days ago

> Science on large datasets can happen via interactive notebook.

I did not claim the opposite, just that it regularly happens without interactive notebooks. This seems like an interesting project, though. Regarding the blog posts, there seems to be a bug that makes all the entries appear as published on September 20, 2017?
