Hacker News
by stochastic_monk 2957 days ago
It's worth considering that anywhere in graphical models where coefficients of any sort are learned, those coefficients can be augmented by neural networks (as in the last decade of natural language processing, where the SOTA approach to almost every problem has been successfully neuralized).

I wonder if Deep Belief Machines and their flavor of generative models, which seem closer in nature to Pearl's PGMs, have a chance to bridge the gap involved.

Edit, as an aside: Given the enormously high dimensionality of personal genomes and the incredibly small sample sizes, for over a decade I've failed to put any trust in GWAS, and I've found my suspicion supported on a number of occasions by reproducibility problems that are likely caused by exactly this mismatch. Is there any reason to think that improved statistical methods can surmount the fundamental problem of limited sample size and high dimensionality?

1 comments

Numerous important biomedical findings have resulted from GWAS. Most GWAS today are inherently reproducible, since their hits usually come from multi-stage designs with independent samples. Sample sizes are no longer "incredibly small" either; large GWAS often have on the order of hundreds of thousands of patients. Some have over a million.
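
To put rough numbers on why multi-stage designs with independent samples help, here is a back-of-the-envelope sketch. Every figure in it is an illustrative assumption (one million roughly independent tests, a hypothetical discovery threshold of 1e-4, a replication threshold of 0.05), not something taken from a particular study; the only conventional value is the Bonferroni-style 5e-8 cutoff that falls out of 0.05 divided by a million tests.

```python
# Illustrative arithmetic only: every number here is an assumption,
# not a figure taken from any particular study.


def expected_null_hits(n_tests, *alphas):
    """Expected number of truly null variants that pass every stage.

    Assumes the stages use independent samples, so a null variant's
    p-values are independent across stages and the per-stage pass
    probabilities multiply.
    """
    expected = float(n_tests)
    for alpha in alphas:
        expected *= alpha
    return expected


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


n_variants = 1_000_000  # assumed number of roughly independent tests

# One stage, naive p < 0.05 cutoff, all variants null.
single_stage_naive = expected_null_hits(n_variants, 0.05)

# Per-test cutoff after Bonferroni correction.
genome_wide = bonferroni_threshold(n_variants)

# Hypothetical two-stage design: discovery at 1e-4, replication at 0.05.
two_stage = expected_null_hits(n_variants, 1e-4, 0.05)

print(f"null hits at p < 0.05, one stage: {single_stage_naive:,.0f}")
print(f"Bonferroni per-test threshold: {genome_wide:.1e}")
print(f"null hits after discovery + replication: {two_stage:.1f}")
```

The point of the sketch is only that requiring a hit to pass in a second, independent sample multiplies the false-positive probabilities, which is why replicated hits are much harder to explain as noise than single-stage ones.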

I suppose the most important idea is that GWAS aren't really supposed to show causality. "Association" is in the name. GWAS are usually hypothesis generating (e.g., identification of associated variants) and then identified variants can be probed experimentally with all of the tools of molecular biology.

In summary, GWAS have their problems, but I think your statement is a bit too strong.

Mendelian randomization is a good technique to start thinking about causality for epidemiological studies.

This is a good paper that demonstrates the approach: Millard, Louise A. C., et al. "MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization." Scientific Reports 5 (2015): 16645. https://www.nature.com/articles/srep16645
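
For anyone who wants to see the core idea of Mendelian randomization in miniature, here is a small simulation sketch. It is not the method from the paper above; it uses the simplest single-variant estimator (the Wald ratio, i.e. the variant-outcome association divided by the variant-exposure association), and the allele frequency, effect sizes, and sample size are all made-up assumptions. It also assumes the variant affects the outcome only through the exposure, which is the key assumption real studies have to defend.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


def simulate(n, true_effect, seed=0):
    """Simulate genotype G, exposure X and outcome Y with a hidden confounder U."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Allele count 0/1/2 at an assumed allele frequency of 0.3.
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        ui = rng.gauss(0, 1)  # unobserved confounder of both X and Y
        xi = 0.5 * gi + ui + rng.gauss(0, 1)  # exposure depends on G and U
        yi = true_effect * xi + ui + rng.gauss(0, 1)  # outcome depends on X and U
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


TRUE_EFFECT = 0.3  # assumed causal effect of X on Y
g, x, y = simulate(50_000, TRUE_EFFECT)

# Naive regression slope of Y on X: biased because U drives both.
naive = covariance(x, y) / covariance(x, x)

# Wald ratio: (G-Y slope) / (G-X slope); the Var(G) terms cancel.
wald_ratio = covariance(g, y) / covariance(g, x)

print(f"true effect:  {TRUE_EFFECT}")
print(f"naive slope:  {naive:.3f}")
print(f"Wald ratio:   {wald_ratio:.3f}")
```

In this setup the naive slope is pulled well away from the true effect by the confounder, while the Wald ratio lands close to it, because the genotype is independent of the confounder by construction.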

Thousands of samples and millions of dimensions still don’t strike me as an easy problem, but it makes sense to me that downstream molecular biology can verify putative associations. Thank you for weighing in.