|
A lot of the low-hanging fruit in heritable disease, in terms of statistical significance, has already been picked, at least for the naive search for SNP X in gene Y that causes disease Z. The more people look for single-SNP or even single-gene causal factors, the more we learn that they are of limited use, because biological interaction networks (genes, proteins, epigenetic modifications, environment, etc.) are complex and redundant.
One thing people are definitely still interested in for single genes or SNPs is biomarkers. For instance, it's pretty easy to measure RNA abundance with RNA-seq and build a model with good discriminatory power to predict who does or doesn't have (say) breast cancer. But those tests can be expensive, and profiling the appropriate tissue can be difficult. On the other hand, if we could find SNPs that correlate well with the RNA expression model that predicts disease, then we could just run a cheaper, faster, easier test for the SNPs. Even better if we can validate causation independently using the emerging, though incomplete, corpus of biological pathways.
Beyond that, some folks are excited about functional genomics, which aims to relate genome-level data to the structure and function of gene products (e.g. proteins) to get a lower-level, more causal look at things.
Biological network extraction is something I'm excited about. Here, you usually want to extract pairs of features, or better yet higher-order structures, from data to learn which factors are important in disease pathways. This puts you immediately in the regime of way more potential features than data points, even if you sequence everyone in the world. I think this is worth keeping in mind when people try to sell you on the hype that big data and machine learning will just "solve" biology or medicine. L1 regularization only gets you so far. But it gets you somewhere, and it can be hugely useful in suggesting new experiments.
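To put rough numbers on the "more features than data points" claim, here is a back-of-the-envelope sketch. The panel size of one million SNPs and the world population of eight billion are my own illustrative assumptions, not figures from any particular study, and `n_interaction_features` is just a hypothetical helper name.

```python
from math import comb

# Illustrative assumptions, not measured values:
N_SNPS = 1_000_000                 # hypothetical number of SNPs on a panel
WORLD_POPULATION = 8_000_000_000   # rough ceiling on the number of samples


def n_interaction_features(n_snps: int, order: int) -> int:
    """Count the distinct unordered SNP combinations of a given order."""
    return comb(n_snps, order)


pairs = n_interaction_features(N_SNPS, 2)
triples = n_interaction_features(N_SNPS, 3)

print(f"pairwise features:  {pairs:.3e}")
print(f"three-way features: {triples:.3e}")
print(f"pairwise features per person on Earth: {pairs / WORLD_POPULATION:.1f}")
```

Under these assumptions, pairs alone come to about 5 × 10^11 candidate features, which is dozens of features for every person alive, and the three-way count is several orders of magnitude larger still. That is why sparsity-inducing methods help narrow things down but can't settle the question on their own.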
I also think it's important to consider that it may soon be cheaper to store the actual RNA/DNA and sequence it on demand than to store the data itself. A DNA sequence would be a fine thing to sequence once and store, but for other stuff, such as RNA, maybe not so much. |