Hacker News
by d4t4 2349 days ago
While I agree that collecting more data opens up the possibility of more complex, exciting analyses, gene expression data is no magic bullet. It is often noisy and still prone to experimental error because biological systems are so heterogeneous. Also, many investigations have only a handful of samples, which means that the number of distinct data points for a specific transcript is very low. You typically end up with many columns but few rows in your dataset, which makes modeling biological systems quite challenging.
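Here is a minimal sketch that makes the "many columns, few rows" point concrete. It uses synthetic data, and the sizes and the random values are assumptions, not taken from any real experiment. With more transcripts than samples, ordinary least squares can fit pure noise exactly, yet it predicts fresh samples no better than always guessing zero.

```python
import numpy as np

# Hypothetical sizes: a handful of samples (rows), thousands of transcripts (columns)
n_samples, n_transcripts = 6, 2000
rng = np.random.default_rng(0)

# Pure noise: "expression" values and a "phenotype" with no real relationship
X_train = rng.normal(size=(n_samples, n_transcripts))
y_train = rng.normal(size=n_samples)

# Least squares; with more columns than rows, lstsq returns the minimum-norm exact fit
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
max_train_residual = np.max(np.abs(X_train @ coef - y_train))

# Fresh noise from the same distribution, to check whether the "fit" generalises
X_test = rng.normal(size=(100, n_transcripts))
y_test = rng.normal(size=100)
test_mse = np.mean((X_test @ coef - y_test) ** 2)
baseline_mse = np.mean(y_test ** 2)  # error of always predicting 0

print(f"max training residual: {max_train_residual:.1e}")
print(f"test MSE: {test_mse:.2f} vs. predict-zero baseline: {baseline_mse:.2f}")
```

The near-zero training residual comes only from having more free parameters than data points. Regularisation or more samples are the usual remedies.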

As far as I can see, the current solution to the "few rows" problem and to read errors is to collect even more data and to make sure that each read is treated as what it is: a sample from a different cell.

This way, for example, a bone marrow sample can be used to model the whole dynamic going on in the human system, including the evolution of cells and of RNA expression across cells. Of course, this means relying more on probabilistic hidden-variable models instead of assembling the reads into a genome.
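One way to read the "each cell is its own sample" idea: a single tissue sample yields a cells × genes count matrix with many rows, and a hidden-state model can then group cells by their unobserved type. The sketch below is a Poisson mixture model fitted with EM on simulated counts. The number of states, the gamma-distributed means, and all function names are hypothetical choices for illustration. It is not the method shown in the video or used by any specific tool.

```python
import itertools
import numpy as np

def simulate_cells(n_cells, n_genes, n_states, rng):
    # Hypothetical ground truth: each hidden state has its own mean count per gene
    state_means = rng.gamma(shape=2.0, scale=5.0, size=(n_states, n_genes))
    labels = rng.integers(n_states, size=n_cells)
    counts = rng.poisson(state_means[labels])  # one row per cell
    return counts, labels

def farthest_point_init(counts, n_states, rng):
    # Spread initial means out: a random cell, then repeatedly the cell farthest from those chosen
    x = np.log1p(counts)
    chosen = [int(rng.integers(len(x)))]
    for _ in range(1, n_states):
        d = ((x[:, None, :] - x[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(np.argmax(d)))
    return counts[chosen] + 0.5  # offset avoids log(0)

def poisson_mixture_em(counts, n_states, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    lam = farthest_point_init(counts, n_states, rng)
    weights = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iter):
        # E-step: log p(cell | state), dropping log(x!), which is the same for every state
        log_lik = counts @ np.log(lam).T - lam.sum(axis=1) + np.log(weights)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and per-state mean counts
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n_cells
        lam = (resp.T @ counts) / nk[:, None] + 1e-9
    return resp.argmax(axis=1), lam

def best_label_agreement(pred, truth, n_states):
    # Hidden states are unlabelled, so score the best match over all relabelings
    return max(np.mean(np.array(perm)[pred] == truth)
               for perm in itertools.permutations(range(n_states)))

rng = np.random.default_rng(42)
counts, true_states = simulate_cells(n_cells=300, n_genes=50, n_states=3, rng=rng)
pred_states, est_means = poisson_mixture_em(counts, n_states=3)
agreement = best_label_agreement(pred_states, true_states, 3)
print(f"fraction of cells assigned to the right hidden state: {agreement:.2f}")
```

Each cell contributes its own row, so even one tissue sample gives the model hundreds of observations rather than one averaged profile. Real single-cell models are considerably richer than this toy mixture.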

Have you seen this video? I think it's amazing:

https://youtu.be/G_Rhp9LWDUM

This method is also changing the way cancer is studied, though you'll need the new Illumina machine, announced a few days ago, with 30x the number of reads.