by carbocation 1499 days ago
I don't think the problem is that polygenic scores are noisy. (You can choose to make them less noisy by restricting to significant SNPs, for example.) And noise need not introduce directional bias. But to me there are two problems:

(A) Polygenic scores for behavioral traits may be estimated in GWAS where the null assumptions (e.g., that mating is not conditioned on the trait being estimated) may not be valid[1]. That is on top of the issues that we usually face for other phenotypes (e.g., more routine population stratification due to geographic history).

(B) The authors did not describe the (genetic) ancestral background of the children being studied. For most traits, polygenic scores built with current techniques are biased across ancestries[2]. Certainly adjusting for 20 PCs in the final model, as the authors seemingly did, would not be expected to make the scores comparable unless all of the children are from a close ancestral group.

With these sources of stratification, the polygenic score represents more (and less) than the trait that you're hoping it estimates; it also encodes population stratification.

As such, I hardly think this study can be interpreted.

[1] https://www.nature.com/articles/s41467-022-28294-9

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6563838/


> You can choose to make them less noisy by restricting to significant SNPs, for example.

That makes them more noisy, not less. PGS predictive power for EDU/IQ is always maximized by using all SNPs. Restricting to the arbitrary subset of genome-wide statistically significant SNPs in Lee would drive it from the 7% or so they have down to <1%, IIRC.

Also, neither of your two problems are the problem here. The biases there would not be expected to drive a correlation between video game playing & IQ (what sort of within-ethnic interaction would you need for that, and why is it plausible?); mostly they would just mean that intelligence is not properly controlled for. And quantitatively, because the PGS here accounts for only a small fraction of the variance, even gross biases that somehow did manage to drive correlations between those two variables would still be unable to meaningfully affect the estimates.

> That makes them more noisy, not less.

Using only genome-wide significant SNPs reduces the amount of variance explained by the polygenic score, which is what you describe and I agree with. My remark about "noise" was a response to a sibling comment ("Polygenic scores are powerful, but they contain very large amounts of noise compared to the true genetic effects."); that is the "noise" I was addressing. And just as you say, the noise is essentially a cost worth paying, since it should not be directional, and so we use various approaches to include thousands or millions of SNPs in these scores.
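To make the variance-explained point concrete, here is a toy simulation. It is only a sketch, not a reanalysis of any real GWAS: it assumes independent standardized SNPs (no LD), a trait where every SNP has a small effect, h2 = 0.5, and a Bonferroni cutoff standing in for genome-wide significance. The function name and all parameter values are made up for illustration.

```python
from statistics import NormalDist

import numpy as np


def compare_pgs_r2(n_train=2000, n_test=2000, n_snps=1000, h2=0.5, seed=0):
    """Return (r2_all_snps, r2_significant_only, n_significant) for a toy trait."""
    rng = np.random.default_rng(seed)

    # Assumption: every SNP is causal with a small, normally distributed effect.
    beta = rng.normal(0.0, np.sqrt(h2 / n_snps), n_snps)

    def draw(n):
        # Assumption: independent, standardized genotypes (no LD).
        g = rng.standard_normal((n, n_snps))
        y = g @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), n)
        return g, y

    g_train, y_train = draw(n_train)
    g_test, y_test = draw(n_test)

    # Marginal (one SNP at a time) regression, as in a basic GWAS.
    sum_sq = (g_train ** 2).sum(axis=0)
    beta_hat = g_train.T @ y_train / sum_sq
    se = np.sqrt(y_train.var() / sum_sq)  # approximate standard error
    z = beta_hat / se

    # Bonferroni cutoff as a stand-in for genome-wide significance.
    z_cut = NormalDist().inv_cdf(1.0 - 0.05 / n_snps / 2.0)
    significant = np.abs(z) > z_cut

    def r2(weights):
        pgs = g_test @ weights
        if pgs.std() == 0.0:  # no SNPs selected, so the score is constant
            return 0.0
        return float(np.corrcoef(pgs, y_test)[0, 1] ** 2)

    r2_all = r2(beta_hat)
    r2_sig = r2(np.where(significant, beta_hat, 0.0))
    return r2_all, r2_sig, int(significant.sum())
```

Calling compare_pgs_r2() gives the out-of-sample R² for both scores. Under these assumptions the all-SNP score should explain far more variance than the thresholded one, even though each retained SNP is individually estimated with more confidence.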

> Also, neither of your two problems are the problem here

I don't agree. These problems occur very clearly in any mixed-ancestry analysis, and they have to be carefully accounted for or else they induce between-ancestry bias. It's not a function of the phenotype itself (i.e., I'm not making a comment about intelligence); this is true for all polygenic scores.
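A toy illustration of what I mean by between-ancestry bias, as a sketch under loud assumptions: two groups whose allele frequencies differ slightly by drift, a phenotype that differs between groups for purely environmental reasons, and no causal SNPs at all. The GWAS is run on the pooled sample with no ancestry adjustment. Every name and number here is hypothetical.

```python
import numpy as np


def simulate_stratified_pgs(n_per_group=2000, n_snps=500, env_shift=1.0, seed=0):
    """Return (corr_pgs_trait, corr_pgs_group, corr_within_group) on a test sample."""
    rng = np.random.default_rng(seed)

    # Assumption: group 2 frequencies are group 1 frequencies plus a little drift.
    p1 = rng.uniform(0.1, 0.9, n_snps)
    p2 = np.clip(p1 + rng.normal(0.0, 0.05, n_snps), 0.05, 0.95)

    def draw():
        g = np.vstack([
            rng.binomial(2, p1, (n_per_group, n_snps)),
            rng.binomial(2, p2, (n_per_group, n_snps)),
        ]).astype(float)
        group = np.repeat([0.0, 1.0], n_per_group)
        # Assumption: the trait has NO genetic component, only a group-level shift.
        y = env_shift * group + rng.standard_normal(2 * n_per_group)
        return g, group, y

    g_train, _, y_train = draw()
    g_test, group_test, y_test = draw()

    # Pooled marginal regression with no adjustment for ancestry.
    mean = g_train.mean(axis=0)
    centered = g_train - mean
    beta_hat = centered.T @ (y_train - y_train.mean()) / (centered ** 2).sum(axis=0)

    # Score an independent sample with the confounded weights.
    pgs = (g_test - mean) @ beta_hat
    in_group_2 = group_test == 1.0
    corr_trait = float(np.corrcoef(pgs, y_test)[0, 1])
    corr_group = float(np.corrcoef(pgs, group_test)[0, 1])
    corr_within = float(np.corrcoef(pgs[in_group_2], y_test[in_group_2])[0, 1])
    return corr_trait, corr_group, corr_within
```

In this setup the score should track group membership closely and "predict" the trait in the pooled test sample, while showing essentially no association within a single group. That is the sense in which the score encodes stratification rather than the trait, regardless of which phenotype it was built for.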