Hacker News
by ernst_10 1685 days ago
> I think the biggest risk of being oversold right now are the "polygenic risk scores" being used for various quantitative traits, which fail to generalize very well.

How do the hits from large scale GWAS fail to generalize within a population? Why would you get worse models with more advanced techniques? Presumably the more advanced techniques would only be used instead of the simple ones if they were proven to work.

> How do the hits from large scale GWAS fail to generalize within a population?

Great question! I haven't revisited this in quite some time, but one possible explanation is sometimes called the "winner's curse." Each site carries a kind of "measurement error": genetics is never deterministic, and you have millions of sites but only thousands of cases split into yes/no categories. If a particular site truly explains only 10% of the "yes" cases, random sampling could make it look like 15%, or 7%, or whatever. And when you have many sites, each with measurement error of this sort, and you sort them by apparent effect, the sites that look like they have the biggest effects are likely to owe much of that to random chance. Their true effects tend to be smaller, so they replicate worse in new samples, even from the same population.
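A toy simulation can make the winner's curse concrete. This is a minimal sketch, assuming a hypothetical set of sites where a small fraction have a modest true effect and the rest have none, and where each estimated effect is the true effect plus Gaussian sampling noise. The function name and all parameter values are illustrative assumptions, not taken from any real GWAS.

```python
import random
import statistics


def simulate_winners_curse(n_sites=10_000, n_causal=100, true_effect=0.1,
                           noise_sd=0.05, top_k=20, seed=0):
    """Hypothetical toy model of the winner's curse.

    The first n_causal sites have a true effect of true_effect; all others
    have zero effect. Each estimate = true effect + Gaussian noise.
    Returns (mean estimated effect, mean true effect, mean replication
    estimate) over the top_k sites ranked by *estimated* effect.
    """
    rng = random.Random(seed)
    true_effects = [true_effect if i < n_causal else 0.0 for i in range(n_sites)]
    estimates = [t + rng.gauss(0.0, noise_sd) for t in true_effects]

    # Rank sites by estimated effect, as a naive "top hits" list would.
    top = sorted(range(n_sites), key=lambda i: estimates[i], reverse=True)[:top_k]

    mean_estimated = statistics.mean(estimates[i] for i in top)
    mean_true = statistics.mean(true_effects[i] for i in top)
    # Re-measure the same sites in an independent sample (fresh noise).
    mean_replication = statistics.mean(
        true_effects[i] + rng.gauss(0.0, noise_sd) for i in top
    )
    return mean_estimated, mean_true, mean_replication


if __name__ == "__main__":
    est, true, rep = simulate_winners_curse()
    print(f"top hits: estimated={est:.3f} true={true:.3f} replication={rep:.3f}")
```

Because selection happens on the noisy estimates, the top hits are enriched for sites whose noise happened to push them upward; re-measuring them in fresh data typically gives smaller effects that sit near their true values.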

More advanced models have a greater capacity to pick up spurious signals, overfit the data, and mispredict. Correctly training a model with more parameters often requires more data, and the biggest limitation of GWAS is the amount of data. For some rare conditions, there may not be enough humans alive to provide all the data needed if we rely on GWAS alone. That's because GWAS has no model of how cells work, and the naturally occurring variation in the human population is unlikely to provide enough data to reconstruct the relationships needed to rebuild the relevant parts of that physical model and make predictions from DNA alone.
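To illustrate how fitting many parameters to limited data finds spurious signals, here is a minimal sketch of a deliberately simplified "polygenic score," not how real PRS methods are built. It assumes a hypothetical dataset in which the phenotype is pure noise, so no score built from these genotypes can truly predict it. The score picks the k SNPs most correlated with the phenotype in a training set, then is checked on a held-out set. The function names and parameter values are illustrative assumptions.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def overfit_demo(n_train=200, n_test=200, n_snps=1000, k=30, seed=1):
    """Hypothetical demo: returns (training r, held-out r) for a k-SNP score."""
    rng = random.Random(seed)

    def sample(n):
        # Genotypes coded as allele counts 0/1/2 (allele frequency 0.5);
        # the phenotype is independent Gaussian noise.
        geno = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)]
                for _ in range(n)]
        pheno = [rng.gauss(0.0, 1.0) for _ in range(n)]
        return geno, pheno

    train_g, train_y = sample(n_train)
    test_g, test_y = sample(n_test)

    # "Discover" the k SNPs most correlated with the phenotype in training data.
    corrs = [pearson([row[j] for row in train_g], train_y) for j in range(n_snps)]
    chosen = sorted(range(n_snps), key=lambda j: abs(corrs[j]), reverse=True)[:k]

    def score(geno):
        # Additive score: each chosen SNP weighted by the sign of its training correlation.
        return [sum((1 if corrs[j] > 0 else -1) * row[j] for j in chosen)
                for row in geno]

    return pearson(score(train_g), train_y), pearson(score(test_g), test_y)


if __name__ == "__main__":
    train_r, test_r = overfit_demo()
    print(f"training r={train_r:.2f}  held-out r={test_r:.2f}")
```

The score typically looks strongly predictive on the data used to select its SNPs and near zero on held-out data. Adding more selected SNPs (more parameters) without more samples makes the training fit look better, not the real predictive power.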

The prevalence of causal variants can also differ between populations, and what a variant does may depend on its interactions with other genes. In the San Francisco East Bay, many previously geographically separated human populations are mixing. Rare diseases that are most prevalent in one population are starting to present differently when there's admixture with more distantly related humans. The whole combination of minor differences can change what we thought was a well-defined disease.
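One way to see how gene-gene interactions can make an effect population-specific: under a toy two-locus model with an interaction term (an assumption for illustration, not a claim about any specific disease), y = b_a*g_a + b_b*g_b + b_ab*g_a*g_b + noise, with the two loci independent. The single-SNP slope a GWAS would estimate for locus A is then b_a + b_ab*E[g_b] = b_a + 2*p_b*b_ab, so it shifts with the frequency p_b of the partner allele. The sketch below computes that slope and checks it by simulation; the function names and numbers are hypothetical.

```python
import random


def marginal_effect(b_a, b_ab, p_b):
    """Expected single-SNP slope of y on g_a in the toy model, assuming
    g_a and g_b are independent and g_b ~ Binomial(2, p_b), so E[g_b] = 2*p_b."""
    return b_a + 2 * p_b * b_ab


def simulated_slope(b_a, b_b, b_ab, p_a, p_b, n=50_000, seed=0):
    """Estimate the same slope by simulation and ordinary least squares."""
    rng = random.Random(seed)

    def genotype(p):
        # Allele count 0/1/2 under Hardy-Weinberg equilibrium.
        return int(rng.random() < p) + int(rng.random() < p)

    xs, ys = [], []
    for _ in range(n):
        g_a, g_b = genotype(p_a), genotype(p_b)
        y = b_a * g_a + b_b * g_b + b_ab * g_a * g_b + rng.gauss(0.0, 1.0)
        xs.append(g_a)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Same underlying biology, different partner-allele frequency.
    for name, p_b in [("population 1", 0.1), ("population 2", 0.6)]:
        print(name, round(marginal_effect(0.2, 0.3, p_b), 3),
              round(simulated_slope(0.2, 0.1, 0.3, 0.3, p_b), 3))
```

Identical per-genotype biology yields different GWAS effect sizes for locus A, so a score calibrated in one population can be miscalibrated in another, or in an admixed one.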