by Mumps
112 days ago
Yes, if OP had done a full vocabulary comparison and kept only the words that fell below the threshold, that would be p-hacking. I'm not sure that's the case here, though. Given that OP (in the post) started with the em-dash and probably didn't do repeated sampling, it seems a pretty fair hypothesis that em-dash usage is a marker. Your comment about p<0.05 feels out of place to me. The p-values here are << 0.05. Like waaaaay lower. Perhaps Fisher's exact test would be more appropriate on a per-word basis?

> One of the simplest approaches to correct for multiple testing is the Bonferroni correction. The Bonferroni correction adjusts the alpha value from α = 0.05 to α = (0.05/k) where k is the number of statistical tests conducted. For a typical GWAS using 500,000 SNPs, statistical significance of a SNP association would be set at 1e-7. This correction is the most conservative, as it assumes that each association test of the 500,000 is independent of all other tests – an assumption that is generally untrue due to linkage disequilibrium among GWAS markers.
https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...
cf: https://en.wikipedia.org/wiki/Bonferroni_correction
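
For what it's worth, here's a rough Python sketch of the correction described in the quote, just to make the arithmetic concrete. The function names (`bonferroni_alpha`, `significant`) are my own and not from the paper or any library; the only numbers used are the α = 0.05 and k = 500,000 from the quoted passage.

```python
def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test threshold after Bonferroni correction: alpha / k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return alpha / k


def significant(p_values, alpha=0.05):
    """Flag each p-value that survives the corrected threshold.

    k is taken to be the number of p-values supplied, i.e. every
    test that was run has to be passed in, not just the 'hits'.
    """
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# The GWAS case from the quote: 500,000 SNPs tested at alpha = 0.05.
gwas_threshold = bonferroni_alpha(0.05, 500_000)
print(f"{gwas_threshold:.0e}")
```

The same logic would apply to a per-word comparison: k would be the number of words actually tested, so a single pre-chosen marker has k = 1 and no adjustment is needed.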