


	by janeway 1346 days ago
	Oh no. This is a major flaw that kills projects: to run a valid statistical test, you need to understand the underlying reality of the data. Otherwise you just run tests until you find “something”. How do you handle one genomic variant affecting dozens of different RNA transcripts and isoforms? How do you handle tissue-specific expression? LD haplotype blocks? Allele frequency across populations and choice of reference? Sample handling affecting read depth? Mixed directions of effect in phenotype-genotype associations? The critical (and, IMO, beautiful) feature of bioinfo is that it requires you to understand that your dataset can rarely be treated as clean, or as simple as _observation name_ and _observation value_. To succeed, it is usually critical to know a lot about the observation metadata, which is not collected in the dataset. Hopefully in the future it will be better curated and less esoteric.