# To reject the null hypothesis correctly or incorrectly: that is the question.

Comparing multiple variables or conditions increases the chance of finding false positives: rejecting the null hypothesis even though it is true (see Table 1). Here I briefly describe four approaches to dealing with the problem of multiple comparisons.

| | Null hypothesis (H0) is true | Null hypothesis (H0) is false |
|---|---|---|
| Reject null hypothesis | **Type I error: false positive (α)** | Correct outcome: true positive (1-β) |
| Fail to reject null hypothesis | Correct outcome: true negative (1-α) | Type II error: false negative (β) |

Table 1. The four possible outcomes of null hypothesis significance testing. The outcome in bold is the main concern in multiple comparison testing.
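
To get a feel for how quickly the chance of at least one false positive grows, here is a minimal sketch. It assumes that all comparisons are independent and that every null hypothesis is true, which is a simplification; the function name `familywise_error_rate` is only illustrative.

```python
def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one Type I error over n independent tests.

    Assumes every null hypothesis is true and each test uses the same alpha.
    """
    # The chance of no false positive in one test is (1 - alpha);
    # for n independent tests it is (1 - alpha) ** n.
    return 1.0 - (1.0 - alpha) ** n_comparisons


for m in (1, 5, 10, 20):
    print(f"{m:>2} comparisons: {familywise_error_rate(0.05, m):.3f}")
```

Under these assumptions, with α = 0.05 the chance of at least one false positive is about 0.40 for 10 comparisons and about 0.64 for 20.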

1. Family-Wise Error Rate (FWER). The FWER (Hochberg & Tamhane, 1987) is the probability of making at least one Type I error among all the tested hypotheses. It can be used to control the number of false positives by adjusting α (the chance of a false positive). Normally, α is set to 0.05 or 0.01. FWER-based corrections take this overall α and derive an α for each individual comparison from the number of comparisons made. The best-known FWER-based correction is the Bonferroni correction, in which αcomparison = α / (the number of comparisons).
2. False Discovery Rate (FDR). The FDR (Benjamini & Hochberg, 1995) is the expected proportion of false positives among all positives: the number of false positives (α in Table 1) divided by the total number of rejected null hypotheses (the sum of all hypotheses falling in the categories α and 1-β in Table 1). The general procedure is to order all p-values from small to large and compare each p-value to an FDR threshold. If the p-value is smaller than or equal to this threshold, it can be interpreted as significant. How the threshold is calculated depends on the correction method used.
3. ‘Doing nothing’ approach. Since all correction methods have their flaws, advocates of this approach hold that no corrections should be made (Perneger, 1998). Instead, scientists should clearly state in their articles which comparisons they made, how they made them, and what the outcomes were.
4. Bayesian approach. The Bayesian approach discards the frequentist framework, including null hypothesis significance testing. As a result, the whole problem of multiple comparisons, along with Type I and Type II errors, no longer arises. Rather than correcting for a perceived problem, Bayesian methods build ‘the multiplicity into the model from the start’ (Gelman, Hill, & Yajima, 2012, p. 190).
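
The first two approaches can be sketched in a few lines of Python. This is an illustration rather than a reference implementation: the function names and the p-values are made up, and the FDR function implements only one possible correction, the step-up procedure of Benjamini and Hochberg (1995), in which every hypothesis up to the largest p-value that meets its rank-based threshold is rejected.

```python
def bonferroni(p_values, alpha=0.05):
    """Return one reject/keep decision per p-value, testing each at alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-comparison alpha
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Positions of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    # Reject every hypothesis ranked at or below k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Made-up p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

With these eight made-up p-values, Bonferroni (threshold 0.05 / 8 = 0.00625) rejects only the first null hypothesis, while the Benjamini-Hochberg procedure rejects the first two.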

All four approaches to dealing with multiple comparisons have their advantages and disadvantages. Overall, I believe that the Bayesian approach is by far the most favourable option. Apart from dismissing the problem of multiple comparisons (and others), it gives researchers the opportunity to collect evidence in favour of either hypothesis, instead of making probabilistic statements about rejecting or not rejecting the null hypothesis. What stands in the way of applying the Bayesian approach is its theoretical difficulty (as compared to the frequentist approach). With the increase in accessible books, articles, and workshops about the Bayesian approach, and the development of Bayesian scripts for statistical software, a revolution in how scientists practise science seems to be getting closer.

References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57, 289-300.

Gelman, A., Hill, J., & Yajima, M. (2012). Why we (usually) don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness, 5, 189-211.

Hochberg, Y., & Tamhane, A.C. (1987). Multiple Comparison Procedures. Wiley, New York.

Perneger, T. V. (1998). What’s wrong with Bonferroni adjustments. British Medical Journal, 316, 1236-1238.