Correction for the multiple testing problem

“One mature Atlantic Salmon (Salmo salar) participated in the fMRI study. The salmon measured approximately 18 inches long, weighed 3.8 lbs, and was not alive at the time of scanning.” Although the salmon was dead, several brain areas appeared to be processing what emotion the person in each picture displayed (Bennett, Baird, Miller & Wolford, 2011).
This false result was obtained by testing so many voxels that false positives were bound to emerge. With every added test, the risk of a type 1 error increases (Bender & Lange, 2001). A simulation study showed that when an image with two truly active regions is simulated 1,000 times, every voxel in the image is declared active at least once (Logan & Rowe, 2004).
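To get a feel for how quickly false positives accumulate, here is a minimal sketch (not the Logan & Rowe simulation itself): 100,000 noise-only “voxels” are tested at an uncorrected alpha of 0.05 and the false detections are counted. All variable names are illustrative.

```python
import numpy as np

# Minimal sketch: test 100,000 noise-only "voxels" at an uncorrected
# alpha of 0.05 and count how many are falsely declared active.
rng = np.random.default_rng(42)
n_voxels = 100_000
alpha = 0.05

# Under the null hypothesis, p-values are uniformly distributed on [0, 1].
p_values = rng.uniform(size=n_voxels)
false_positives = int(np.sum(p_values < alpha))

print(f"False positives among {n_voxels} null voxels: {false_positives}")
# Expect roughly alpha * n_voxels = 5,000 false detections.
```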
There are methods for correcting the number of false positives under multiple testing, but they are not always applied in fMRI research: between 24% and 40% of the articles published in 2008 did not correct for multiple testing (Bennett, Baird, Miller & Wolford, 2011, supplementary material).
Two often used procedures for addressing the multiple testing problem are family-wise error (FWE) correction and false discovery rate (FDR) correction. Every procedure has to strike a balance between false positives (type 1 errors) and false negatives (type 2 errors): any method that protects more strongly against one type of error necessarily increases the rate of the other (Lieberman & Cunningham, 2009).
The family-wise error rate (FWER) is the probability of making one or more type 1 errors in a family of comparisons. Controlling the FWER at 5% means there is a 95% probability that the data contain no type 1 errors at all. The simplest FWE correction is the original Bonferroni correction, which divides the alpha level, the acceptable chance of a type 1 error (conventionally 5%), by the number of voxels tested (Dunn, 1961). For example, when 100,000 voxels are tested at an FWE rate of 0.05, the threshold for an individual voxel becomes 0.05/100,000 = 0.0000005. The Bonferroni procedure has since been refined in various ways (Nichols & Hayasaka, 2003).
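A minimal sketch of the Bonferroni calculation from the example above; the variable names are illustrative only.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
n_voxels = 100_000
alpha_fwe = 0.05

bonferroni_threshold = alpha_fwe / n_voxels
print(bonferroni_threshold)  # 5e-07, i.e. 0.0000005

# A voxel is then declared active only if its p-value falls below this
# threshold, e.g.: active = p_value < bonferroni_threshold
```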
Another FWE correcting procedure is based on random field theory. The reasoning behind it is that, because the p-values of neighbouring voxels are (locally) dependent, that dependency can be exploited when correcting for multiple testing. Random field theory does not test individual voxels but larger observational units, ‘active brain clusters’, which can reduce the number of tests considerably (Brett, Penny & Kiebel, 2003).
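The sketch below is not a random field theory implementation (which would use the estimated smoothness of the statistic map to derive a cluster-level threshold); it only illustrates the underlying idea that, in a smooth map, supra-threshold clusters are far fewer than individual voxels, so cluster-level inference involves far fewer tests. The map dimensions and threshold are arbitrary.

```python
import numpy as np
from scipy import ndimage

# Illustration only: smooth a noise map, threshold it, and compare the
# number of individual voxels with the number of supra-threshold clusters.
rng = np.random.default_rng(0)
stat_map = ndimage.gaussian_filter(rng.standard_normal((64, 64, 30)), sigma=2)
stat_map /= stat_map.std()           # re-standardise after smoothing

cluster_forming_threshold = 2.0      # arbitrary, for illustration
labels, n_clusters = ndimage.label(stat_map > cluster_forming_threshold)

print("Voxels tested individually:", stat_map.size)
print("Clusters tested instead:   ", n_clusters)
```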
A different approach is the FDR correcting procedure. Instead of correcting across the whole family of tested voxels, it only controls errors among the voxels declared active. The false discovery rate (FDR) is the expected ratio of the number of erroneously rejected null hypotheses to the total number of rejected null hypotheses. The method guarantees that, among the voxels declared active, the expected proportion of false positives stays at a specified level (e.g. 5%). The FDR method is therefore adaptive: its threshold changes with the number of tests and with the data.
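The text does not name a specific FDR procedure; the most widely used one is the Benjamini-Hochberg step-up procedure, sketched below. The function name and signature are illustrative, not taken from any particular package.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of rejected (declared active) voxels.

    Benjamini-Hochberg step-up procedure: controls the FDR at level q.
    The cutoff adapts to the data - the more small p-values there are,
    the more lenient the threshold becomes.
    """
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k (1-indexed) such that p_(k) <= (k / m) * q
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # 0-indexed position of that p-value
        reject[order[: k + 1]] = True      # reject the k+1 smallest p-values
    return reject
```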
A comparison between the FWE and FDR correcting procedures showed that FDR maintained higher power in the active brain regions, meaning fewer type 2 errors, but at the cost of more falsely detected voxels (Logan & Rowe, 2004). Verhoeven, Simonsen and McIntyre (2005) concluded that FWE control is preferred only when the penalty for making a type 1 error is severe; FDR control is more powerful and often more relevant than controlling the FWER.
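A toy simulation (not the Logan & Rowe evaluation itself) can illustrate the power difference: with 500 truly active voxels among 10,000, Bonferroni-style FWE control typically recovers fewer of them than FDR control at the same nominal level. The sketch reuses the `benjamini_hochberg` function from above; the signal strength and counts are arbitrary.

```python
import numpy as np
from scipy import stats

# Toy comparison: z-scores for 10,000 voxels, of which the first 500 are
# truly active; count how many true actives each procedure detects.
rng = np.random.default_rng(1)
n_voxels, n_active = 10_000, 500

z = rng.standard_normal(n_voxels)
z[:n_active] += 4.0                   # add signal to the truly active voxels
p = stats.norm.sf(z)                  # one-sided p-values

bonferroni_hits = int(np.sum(p[:n_active] < 0.05 / n_voxels))
fdr_hits = int(np.sum(benjamini_hochberg(p, q=0.05)[:n_active]))

print(f"True positives, Bonferroni (FWE): {bonferroni_hits} / {n_active}")
print(f"True positives, Benjamini-Hochberg (FDR): {fdr_hits} / {n_active}")
# FDR control typically detects more of the truly active voxels (higher power),
# while admitting a small proportion of false positives among the detections.
```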
These two procedures are not the only ways of correcting for multiple testing; a promising newer approach combines spatial information with Bayesian testing methods (Bowman, Caffo, Bassett & Kilts, 2008).

Bender, R. & Lange, S., 2001. Adjusting for multiple testing: when and how? Journal of Clinical Epidemiology, 54, 343-349.
Bennett, C. M., Baird, A. A., Miller, M. B., & Wolford, G. L., 2011. Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: an argument for proper multiple comparisons correction. Journal of Serendipitous and Unexpected Results, 1(1), 1-5.
Bowman, D., Caffo, B., Bassett, S. S. & Kilts, C., 2008. A Bayesian hierarchical framework for spatial modeling of fMRI data. NeuroImage, 39, 146-156.
Brett, M., Penny, W. & Kiebel, S., 2003. An introduction to random field theory. In: Frackowiak, R.S.J., Friston, K.J., Frith, C., Dolan, R.J., Price, C.J., Zeki, S., Ashburner, J. & Penny, W.D. (Eds.), Human Brain Function, 2nd edition. Academic Press.
Dunn, O.J., 1961. Multiple comparisons among means. Journal of the American Statistical Association, 56, 52-64.
Lieberman, M.D. & Cunningham, W.A., 2009. Type I and Type II error concerns in fMRI research: rebalancing the scale. Social Cognitive and Affective Neuroscience, 4, 423-428.
Logan, B. R. & Rowe, D. B., 2004. An evaluation of thresholding techniques in fMRI analysis. NeuroImage, 22, 95-108.
Nichols, T. & Hayasaka, S., 2003. Controlling the familywise error rate in functional neuroimaging: a comparative review. Statistical Methods in Medical Research, 12(5), 419-446.
Verhoeven, K. J. F., Simonsen, K. & McIntyre, L. M., 2005. Implementing false discovery rate control: increasing your power. Oikos, 108, 643-647.

 

One thought on “Correction for the multiple testing problem”

  1. As another alternative, the k-FWER of Lehmann & Romano (2005, http://arxiv.org/pdf/math/0507420.pdf) allows you to increase power (and increase type I error) by increasing k. FDR is more powerful than k-FWER for k=1, but not for general k, and k-FWER makes a clearer claim about type I error: you get to pick k, rather than having a more probabilistic guarantee (about the expected value of the false discovery proportion). Admittedly not my area of expertise, though. Fun story about the emotional dead salmon; multiple testing certainly seems important with so many voxels to test in fMRI data.
