Hacker News
by _qbxp 3152 days ago
I was doing fMRI work around the time this paper was published. It astonished me that people would simply set an uncorrected voxel-level threshold and call it a day. No FWE correction, no cluster threshold, just an uncorrected threshold of p < 0.001. It was sad that this paper had to be published to get researchers to start paying attention to that.
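
To put a rough number on why p < 0.001 uncorrected is not enough on its own, here is a minimal stdlib-only Python sketch that treats every voxel as an independent null test. The 100,000-voxel count is a hypothetical round number, not taken from any study, and real voxels are spatially correlated, so this is only a back-of-the-envelope illustration.

```python
import random

def count_null_survivors(n_voxels, alpha, seed=0):
    """Count null voxels that pass an uncorrected threshold.

    Under the null hypothesis each voxel's p-value is uniform on [0, 1),
    so each one passes with probability alpha.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_voxels) if rng.random() < alpha)

n_voxels = 100_000  # hypothetical whole-brain voxel count
alpha = 0.001       # uncorrected voxel-level threshold
print("expected false positives:", n_voxels * alpha)
print("simulated false positives:", count_null_survivors(n_voxels, alpha))
```

With no signal anywhere, you still expect on the order of a hundred "active" voxels, which is the multiple-comparisons problem in one line.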

I'll be honest: when the paper was published I was thinking "no shit, why do we need a paper to tell us what we all learned in stats 101 about multiple comparisons??" And then I realized how many fMRI papers used uncorrected thresholds.

I had a very similar feeling when the "Voodoo Correlations" paper came out. Except that this time I was admittedly guilty of having presented correlation coefficients from clusters that had already been identified by thresholding the same data. So that paper really did make me take a closer look at some of my figures and conclusions.
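
The "voodoo" problem is easy to reproduce on pure noise. The sketch below (a hypothetical setup: 16 subjects, 2,000 noise voxels, a selection cutoff of r > 0.6, and two independent noise "runs") selects voxels by their correlation with a behavioural score and then reports the mean correlation of the selected voxels, once on the same data used for selection and once on independent data. It illustrates the selection bias in general, not any specific published analysis.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def circular_vs_independent(n_subjects=16, n_voxels=2000, r_cut=0.6, seed=1):
    """Select voxels whose correlation with behaviour exceeds r_cut on run A,
    then report their mean r on run A (circular) and on run B (independent).

    Every voxel is pure noise, so the true correlation is zero everywhere.
    """
    rng = random.Random(seed)
    behaviour = [rng.gauss(0, 1) for _ in range(n_subjects)]

    def noise_run():
        return [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_voxels)]

    run_a, run_b = noise_run(), noise_run()
    selected = [v for v in range(n_voxels) if pearson_r(run_a[v], behaviour) > r_cut]
    if not selected:
        return 0, float("nan"), float("nan")
    circular = statistics.fmean(pearson_r(run_a[v], behaviour) for v in selected)
    independent = statistics.fmean(pearson_r(run_b[v], behaviour) for v in selected)
    return len(selected), circular, independent

n_sel, r_circular, r_independent = circular_vs_independent()
print(f"{n_sel} voxels selected; circular mean r = {r_circular:.2f}, "
      f"independent mean r = {r_independent:.2f}")
```

The circular estimate can never fall below the cutoff, however null the data are; only the estimate from independent data comes back near zero.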


Let us be a little bit fair to the researchers who adopted the p=.001 or p=.0001 uncorrected approaches. Their approach wasn't completely unreasoned, and was even justifiable at one time given available methods.

There were mainly two approaches to multiple-comparison correction: Bonferroni and setting an uncorrected threshold. People here might say, well yeah, use Bonferroni.

However, Bonferroni is really only well suited to comparisons that are independent. Adjacent voxels (3D pixels) are highly dependent, and indeed the brain as a whole is generally correlated. This dependency makes Bonferroni correction (very) inappropriately conservative. Given the average dependence between voxels, some researchers estimated that the effective number of independent comparisons might be on the order of hundreds to a few thousand. In practice, researchers corrected with Bonferroni and either found a really strong effect or fell back to an uncorrected threshold. Some reported results using both. Readers interpreted the results the same way: Bonferroni = reliable, uncorrected = provisional.
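
A crude way to see how conservative voxel-wise Bonferroni gets under dependence: assume (purely for illustration) that 100,000 voxels fall into 1,000 blocks of perfectly correlated voxels, so there are really only 1,000 independent tests. Both numbers are hypothetical, chosen to sit inside the "hundreds to a few thousand" range mentioned above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

def fwer_independent(per_test_alpha, n_independent):
    """Exact family-wise error rate for n_independent independent null tests."""
    return 1 - (1 - per_test_alpha) ** n_independent

alpha = 0.05
n_voxels = 100_000    # hypothetical number of voxels tested
n_effective = 1_000   # hypothetical effective number of independent tests

naive = bonferroni_threshold(alpha, n_voxels)
effective = bonferroni_threshold(alpha, n_effective)
print(f"Bonferroni over all voxels: p < {naive:.1e}")
print(f"Bonferroni over m_eff:      p < {effective:.1e}")
# Under the block model, the voxel-wise threshold gives an actual FWER far below 0.05.
print(f"actual FWER of voxel-wise threshold: {fwer_independent(naive, n_effective):.4f}")
```

Under this toy model the voxel-wise correction spends about a hundredth of the nominal error budget, which is why real effects so rarely survived it.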

The contribution of the salmon study and other research papers is that they truly demonstrated that the typical uncorrected thresholds in use were insufficient to control false positives.

You're right. I definitely don't mean to sound like I was an enlightened graduate student. Nothing ever survived FWE correction using Bonferroni, so we almost always resorted to uncorrected p-values with cluster thresholding, with the cluster and voxel thresholds set using AlphaSim (which estimates, from random datasets with the same smoothness as your actual images, the probability of getting a significant cluster of a given size by chance).
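
The Monte Carlo idea behind that kind of cluster-size threshold can be sketched as follows. This is a simplified, hypothetical 2D stdlib-only illustration, not AFNI's AlphaSim code: it generates smoothed Gaussian noise with a chosen FWHM, thresholds it at a voxel-level p, records the largest surviving cluster in each simulated image, and picks the smallest cluster size k that null data reach in at most 5% of simulations. The grid size, FWHM, iteration count, wrap-around edges, and 4-connectivity are all assumptions.

```python
import math
import random
from collections import deque
from statistics import NormalDist

def smooth_noise(size, fwhm_px, rng):
    """Unit-variance Gaussian noise on a size x size grid, smoothed with a
    separable Gaussian kernel of the given FWHM (edges wrap around)."""
    sigma = fwhm_px / (2 * math.sqrt(2 * math.log(2)))
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(i * i) / (2 * sigma ** 2)) for i in range(-radius, radius + 1)]
    norm = math.sqrt(sum(k * k for k in kernel))  # keeps each pass variance-preserving
    kernel = [k / norm for k in kernel]
    field = [[rng.gauss(0, 1) for _ in range(size)] for _ in range(size)]
    rows = [[sum(w * field[r][(c + j - radius) % size] for j, w in enumerate(kernel))
             for c in range(size)] for r in range(size)]
    return [[sum(w * rows[(r + j - radius) % size][c] for j, w in enumerate(kernel))
             for c in range(size)] for r in range(size)]

def max_cluster_size(mask):
    """Size of the largest 4-connected cluster of True pixels (0 if none)."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    best = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, count = deque([(r, c)]), 0
                while queue:
                    y, x = queue.popleft()
                    count += 1
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < n_rows and 0 <= nx < n_cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                best = max(best, count)
    return best

def cluster_size_threshold(max_sizes, alpha):
    """Smallest k such that P(largest null cluster >= k) <= alpha."""
    n = len(max_sizes)
    k = 1
    while sum(m >= k for m in max_sizes) / n > alpha:
        k += 1
    return k

def simulate_cluster_threshold(size=32, fwhm_px=3.0, voxel_p=0.001,
                               cluster_alpha=0.05, iterations=200, seed=0):
    rng = random.Random(seed)
    z_cut = NormalDist().inv_cdf(1 - voxel_p)  # one-sided voxel-level cutoff
    max_sizes = []
    for _ in range(iterations):
        field = smooth_noise(size, fwhm_px, rng)
        max_sizes.append(max_cluster_size([[v > z_cut for v in row] for row in field]))
    return cluster_size_threshold(max_sizes, cluster_alpha)

print("minimum cluster size (pixels):", simulate_cluster_threshold())
```

Smoother images produce bigger chance clusters, which is why the threshold has to be simulated from the data's own smoothness rather than fixed in advance.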

If I recall correctly, all the major neuroimaging packages (AFNI, SPM, FSL) had options for cluster-size thresholding at the time, along with tools like AlphaSim to estimate cluster-level false-positive rates (though I think its algorithm ultimately turned out to have issues, discovered only a few years later...).

I just remember thinking that if the salmon paper had used a reasonable cluster-extent threshold, none of the spurious voxels would have made it into the final analysis.
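
What an extent threshold does to a thresholded map can be shown on a toy example. The sketch below assumes a 2D binary map, 4-connectivity, and a hypothetical minimum cluster size of 3; it keeps only clusters at least that large, so isolated supra-threshold voxels are dropped.

```python
from collections import deque

def cluster_extent_filter(mask, min_size):
    """Keep only 4-connected clusters of True pixels with >= min_size pixels."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    kept = [[False] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            cluster, queue = [], deque([(r, c)])
            while queue:  # breadth-first search over one connected cluster
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < n_rows and 0 <= nx < n_cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cluster) >= min_size:
                for y, x in cluster:
                    kept[y][x] = True
    return kept

# Hypothetical thresholded map: two isolated voxels and one 4-voxel blob.
supra = [
    [True,  False, False, False],
    [False, False, True,  True],
    [False, False, True,  True],
    [False, True,  False, False],
]
for row in cluster_extent_filter(supra, min_size=3):
    print("".join("#" if v else "." for v in row))
```

Only the 2x2 blob survives; the two single-voxel hits are removed.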

Granted, several years later, a paper came out suggesting that method would inflate false positives (http://www.pnas.org/content/113/28/7900.full).

I imagine the neuroimaging field, particularly the stats part, has changed rapidly since I left.

Sorry, I was writing to HN more than responding to you particularly. It is sometimes easy for non-scientists to underestimate scientists and think of them as fools, when in fact the problems are frequently hard.

I believe you are correct that, around the time of the salmon poster, other methods were available for multiple-comparison correction. The work in the early-to-mid 2000s, however, was much more "wild west."

Indeed, cluster correction may have its own issues, re your link. I think a good approach these days is to eschew whole-brain approaches in favor of theory-driven, a priori ROIs, then supplement those analyses with a whole-brain exploratory analysis.
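
A rough sketch of why a priori ROIs shrink the multiple-comparisons problem: average each subject's contrast values within the ROI and run one one-sample t-test per ROI, so the correction is over a handful of ROIs instead of every voxel. The data layout (one dict of voxel id to contrast value per subject), the values, and the ROI count are hypothetical; turning t into a p-value needs a t-distribution CDF (for example from SciPy), omitted here to stay stdlib-only.

```python
import math
import statistics

def roi_t_statistic(subject_maps, roi):
    """One-sample t statistic for the mean contrast inside an a priori ROI.

    subject_maps: one dict per subject mapping voxel id -> contrast value.
    roi: set of voxel ids fixed before looking at the data.
    """
    roi_means = [statistics.fmean(m[v] for v in roi) for m in subject_maps]
    n = len(roi_means)
    return statistics.fmean(roi_means) / (statistics.stdev(roi_means) / math.sqrt(n))

# Hypothetical contrast values for three subjects and a two-voxel ROI.
maps = [{"v1": 1.0, "v2": 3.0}, {"v1": 2.0, "v2": 4.0}, {"v1": 3.0, "v2": 5.0}]
n_rois = 3  # hypothetical number of a priori ROIs tested
print("t =", round(roi_t_statistic(maps, {"v1", "v2"}), 3))
print("Bonferroni alpha per ROI:", 0.05 / n_rois)
```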

I was unaware of the salmon paper, but I remember being very puzzled a bit later by a study of comatose/vegetative patients that included brain-dead subjects as controls (and of course healthy subjects as well).

I suppose it was meant to convince statistically naive readers that the dead salmon thing didn't apply to their methodology.