No sign of right-oriented bias in goalkeepers: A ‘failed’ replication of Roskes et al. (2011)

Roskes, Sligte, Shalvi and De Dreu (2011) found that goalkeepers dive to the right more often than to the left when their team is behind in a penalty shootout, but not when it is tied or ahead. Roskes et al. (2011) argue that this happens because goalkeepers whose team is behind are more approach motivated than goalkeepers whose team is tied or ahead, and a rightward bias occurs when people are approach motivated and under time pressure, as is the case for goalkeepers whose team is behind in a penalty shootout. Unfortunately, Roskes et al. (2011) do not explain why only goalkeepers who are behind would be approach motivated. There is also a methodological concern: the original data do not support the claim they make (see my earlier blog post). Moreover, this finding could have a massive impact, because goalkeepers could train to overcome the bias and stop more penalties. For these reasons it is important to replicate the finding and see whether the rightward bias is real.

To do this in a confirmatory way, we registered in advance exactly what we would measure and which analyses we would perform (see an earlier blog post). Unfortunately we could not follow our initial plan entirely, because the quality of some videos was extremely poor. The most important part of the registration, however, is that we stick to our original analysis, and that is still the case.

The analysis showed that goalkeepers dived equally often to the right and to the left when their team was ahead, χ²(1, N = 124) = 2.613, p = .106; when their team was tied, χ²(1, N = 163) = 3.245, p = .072; and when their team was behind, χ²(1, N = 41) = 0.610, p = .435. These results show there is no rightward bias in goalkeepers.
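Each of these tests is a chi-square goodness-of-fit test of the left/right dive counts against a 50/50 split. As a minimal sketch of how such a test is computed (the counts below are made up for illustration; they are not our replication data):

```python
import math

def chisq_equal_split(left: int, right: int):
    """Chi-square goodness-of-fit test of dive counts against a 50/50 split (df = 1)."""
    n = left + right
    expected = n / 2
    chi2 = (left - expected) ** 2 / expected + (right - expected) ** 2 / expected
    # Survival function of the chi-square distribution with df = 1:
    # P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts with N = 41 (made up, not the actual replication data)
chi2, p = chisq_equal_split(25, 16)
```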

Our replication study showed there is no rightward bias in the diving direction of goalkeepers when their team is behind. Our study also showed how important it is to pre-register the analysis you intend to do. If we had analyzed all the penalties together instead of separately for behind, tied and ahead, we would have found that goalkeepers do dive more to the right, χ²(1, N = 329) = 6.71, p = .01. We might then have concluded that we extended the original findings of Roskes, Sligte, Shalvi and De Dreu to all goalkeepers, instead of only those whose team is behind. Because conclusions depend so heavily on the analysis you do, more researchers should pre-register their analyses. That way it is certain the conclusions are confirmatory; nowadays it is often unclear whether a finding is confirmatory or exploratory. This study showed in a confirmatory way that the rightward bias does not exist in goalkeepers.

Roskes, M., Sligte, D., Shalvi, S., & De Dreu, C. K. W. (2011). The right side? Under time pressure approach motivation leads to right-oriented bias. Psychological Science, 22, 1403-1407.

Are the BH and BL procedures solutions to the multiple comparisons problem?

“The gene for depression is found” and “The brain region for fear is located” are typical headlines that appear in newspapers and on news sites every now and then. In these studies many hypotheses are tested: many genes are tested for a correlation with some sort of behaviour, and many voxels (the smallest brain unit fMRI can measure) are tested for a correlation with emotions. As the number of tested hypotheses in a study increases, the chance of finding an effect by chance alone increases. Because scientists want no more than 5% of the reported effects to be false positives, they should control for multiple comparisons.

The traditional way to control for multiple comparisons is the Bonferroni correction, but the Bonferroni correction decreases power. When power is low, it is hard to find existing effects, and because of this low power many researchers do not control for multiple comparisons at all. Benjamini et al. (2001) proposed the BH procedure (for independent data) and the BL procedure (for dependent data), which control the False Discovery Rate (FDR) while maintaining power. The FDR controls the proportion of false positive findings among the significant results of a study; for example, 2 false positives out of 50 discoveries is acceptable, but 20 out of 50 is far too many.
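To make the BH procedure concrete, here is a minimal stdlib-only sketch of the step-up rule: sort the m p-values, find the largest k such that p(k) ≤ (k/m)·α, and reject the k hypotheses with the smallest p-values. The p-values below are made up for illustration (the BL procedure for dependent data is not sketched here):

```python
def bh_procedure(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: return indices of rejected hypotheses."""
    m = len(p_values)
    # Sort p-values, remembering their original positions
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank  # largest rank whose p-value is under its threshold
    return sorted(order[:k_max])

# Made-up p-values for five hypotheses; BH rejects the first four,
# while Bonferroni (p < 0.05 / 5 = 0.01) would reject none of them outright
rejected = bh_procedure([0.010, 0.020, 0.030, 0.040, 0.500], alpha=0.05)
```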

To see how much power is maintained when using the BH and BL procedures, I conducted a simulation study; the results are shown in the figure below.

[Figure: power of the Alpha, Bonferroni, BH and BL procedures across the simulation conditions]

I simulated data in which 20%, 50%, 80% or 100% of the hypotheses were true and in which 10, 50, 100 or 500 hypotheses were tested. The power of Alpha and Bonferroni is the same across the four panels of the figure, so it is easy to compare them with the power of the BH and BL procedures. As the figure shows, both the BH and the BL procedure gain power as the proportion of true effects increases, but the power of the BL procedure decreases as the number of tested hypotheses increases. This is unfortunate, because in genetic and fMRI research many hypotheses are tested and the data are dependent. The BL procedure could be a solution for these fields, but when the number of tested hypotheses is large its power is only slightly better than that of the Bonferroni procedure. So although the BH and BL procedures maintain power quite well overall, in exactly the condition that could help genetic and fMRI researchers the power hardly differs from Bonferroni. The BH and BL procedures are a step in the right direction, but they are not (yet) the solution to the multiple comparisons problem.
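A much-simplified version of such a power simulation can be sketched with the stdlib alone. The setup below (independent one-sided z-tests, 100 hypotheses of which 50 are true, effect size 2.5) is an arbitrary illustration, not the exact settings of my simulation, and it compares only Bonferroni with BH:

```python
import math
import random

def one_sided_p(z: float) -> float:
    """One-sided p-value of a z-statistic: P(Z > z) under the null."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def bh_rejections(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: set of indices of rejected hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    return set(order[:k_max])

random.seed(1)
m, n_true, delta, alpha, sims = 100, 50, 2.5, 0.05, 200
bonf_hits = bh_hits = 0
for _ in range(sims):
    # True effects get z-scores centred on delta, null effects on zero
    z = [random.gauss(delta if i < n_true else 0.0, 1.0) for i in range(m)]
    p = [one_sided_p(zi) for zi in z]
    bonf_hits += sum(p[i] < alpha / m for i in range(n_true))
    bh = bh_rejections(p, alpha)
    bh_hits += sum(i in bh for i in range(n_true))

# Power = fraction of true effects detected
power_bonf = bonf_hits / (sims * n_true)
power_bh = bh_hits / (sims * n_true)
```

In this setting BH detects clearly more of the true effects than Bonferroni; note that BH can never do worse, since any p-value below α/m is also below its BH threshold.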

Fooled by Variance?!

In the first assignment of this course we analyzed a dataset. This dataset was only half of the original dataset, and in the third assignment we redid the analysis on the other half. Some people found the same results in both halves, but most of us were not able to find all the effects found earlier; some even found effects in the opposite direction. I, for example, found a difference between men and women on two tests in the first half of the dataset. I thought these effects were strong because both p-values were .003; in the second half of the sample the same effects had p-values of .169 and .143. Because the sample was randomly split, I was very surprised to see what I thought were strong effects turn into no effects. Somebody else found ten significant effects in the first sample and was able to find only two of them in the second. This assignment shows how careful we should be when drawing conclusions.

I think the reason why I, and probably most of us, put too much trust in p-values is that we underestimate variance. That people underestimate variance is shown over and over again when they are asked to make up a sequence of coin flips that looks random: most people produce a sequence with at most five or six heads or tails in a row, while in reality it is not unlikely to find heads or tails ten or even more times in a row. I think we underestimate the variance of the effects we study in the same way we underestimate the variance of coin flips. When we find a low p-value we believe the effect is there, and we underestimate the possibility that the effect will not be found in a similar sample.
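The coin-flip intuition is easy to check with a small simulation: generate sequences of fair flips and record the longest run of identical outcomes. The sequence length of 100 and the number of simulations below are arbitrary choices for illustration:

```python
import random

def longest_run(flips):
    """Length of the longest run of identical outcomes in a sequence."""
    best = current = 1
    for prev, nxt in zip(flips, flips[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best

random.seed(42)
sims, n_flips = 10_000, 100
runs = [longest_run([random.randint(0, 1) for _ in range(n_flips)])
        for _ in range(sims)]

# Average longest run, and how often a run of seven or more appears
mean_longest = sum(runs) / sims
share_seven_plus = sum(r >= 7 for r in runs) / sims
```

In sequences of 100 fair flips the longest run averages around seven, and runs of seven or more identical outcomes occur in roughly half of the sequences, far more often than most people's made-up sequences would suggest.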

A, in my opinion, perfect example of a significant effect that does not exist is the study by Roskes et al. (2011). They found that soccer goalkeepers are more likely to dive to the right than to the left when their team is behind in a penalty shootout. To see whether this is a true effect, we are going to replicate their study. We will use the exact same methods as Roskes et al. and analyze the data in the same way they did, so that our studies can be compared. But because they analyzed the data in the wrong way (see the blog post below), we will also do the proper analysis. To this end we will document beforehand exactly which and how many data we will collect, how we will score these data, and which analyses we will use to establish whether there is an effect. Hopefully Roskes et al. can agree with us in advance that our replication is done just like the original study. That way we can decide whether Roskes et al. were fooled by variance or found a real effect.

Roskes, M., Sligte, D., Shalvi, S., & De Dreu, C. K. W. (2011). The right side? Under time pressure approach motivation leads to right-oriented bias. Psychological Science, 22, 1403-1407.

What went wrong in the original analysis of Roskes et al. (2011)?

Roskes et al. (2011) recorded whether goalkeepers dived to the left, the middle or the right when their team was behind, tied or ahead. To analyze these data you would expect a 3×3 table, which can be analyzed with a chi-square test. With their data that table looks like this:

          Left  Middle  Right
Behind       7       0     17
Tied        47       3     48
Ahead       42       2     38

Before analyzing these data with a chi-square test, the assumptions of the test must be checked: in this case, enough observations in each cell and independence of the observations. Both are problematic here. There are very few observations in the middle category, and the data are not independent because many of the same goalkeepers face multiple penalties. The assumptions are not met, so the chi-square test cannot be used.
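For what it is worth, the overall Pearson chi-square test on the 3×3 table above can be reproduced from the counts alone; a stdlib-only sketch, purely as an arithmetic check (the violated assumptions noted above still apply):

```python
import math

table = [
    [7, 0, 17],    # behind:  left, middle, right
    [47, 3, 48],   # tied
    [42, 2, 38],   # ahead
]

rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
total = sum(rows)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum(
    (table[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
    for i in range(3) for j in range(3)
)
df = (3 - 1) * (3 - 1)  # = 4

# Survival function of the chi-square distribution for even df:
# P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
x = chi2 / 2
p = math.exp(-x) * sum(x ** i / math.factorial(i) for i in range(df // 2))
# chi2 is about 4.98, p about .29
```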

Roskes et al. used the chi-square test anyway, but in the wrong way. When doing a chi-square test on a table like this, you first test all the data together. Only if there is an overall effect can you explore the data further to see what that effect is. Testing the full table gives χ²(4) = 4.98, p = .289, so there is no effect, and we should conclude that the diving direction of goalkeepers is independent of whether their team is behind, tied or ahead. That is not what Roskes et al. concluded; they went on analyzing the data. Perhaps they dropped the middle category and collapsed the tied and ahead categories, since they were interested in whether goalkeepers dive more to the right when behind, but not when tied or ahead. The data then look like this:

                Left  Right
Behind             7     17
Tied or Ahead     89     86

Testing this table gives χ²(1) = 3.16, p = .076, so again there is no effect. That is not what Roskes et al. conclude; they drop the control condition (tied or ahead) and simply test whether goalkeepers dive more to the right than to the left when behind. They find a significant effect, χ²(1) = 4.17, p = .041, and draw the wrong conclusion that goalkeepers dive more to the right than to the left when behind, but not when their team is tied or ahead. This conclusion is wrong because they dropped the control condition; at most they can conclude that goalkeepers dive more to the right than to the left when behind. And given the violated assumptions, even that conclusion is problematic.
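Both numbers can be checked from the tables with a stdlib-only sketch. The 2×2 statistic matches when Yates' continuity correction is applied, which Roskes et al. presumably used; whether they did is my assumption, not something the paper states:

```python
import math

def chi2_sf_df1(x: float) -> float:
    """P(X > x) for a chi-square distribution with df = 1."""
    return math.erfc(math.sqrt(x / 2))

# 2x2 table: behind vs tied-or-ahead, left vs right
table = [[7, 17], [89, 86]]
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
total = sum(rows)

# Pearson chi-square with Yates' continuity correction
chi2_2x2 = 0.0
for i in range(2):
    for j in range(2):
        expected = rows[i] * cols[j] / total
        chi2_2x2 += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
p_2x2 = chi2_sf_df1(chi2_2x2)  # about .076

# Final test: behind only, left (7) vs right (17) against a 50/50 split
chi2_gof = (7 - 12) ** 2 / 12 + (17 - 12) ** 2 / 12
p_gof = chi2_sf_df1(chi2_gof)  # about .041
```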

Roskes, M., Sligte, D., Shalvi, S., & De Dreu, C. K. W. (2011). The right side? Under time pressure approach motivation leads to right-oriented bias. Psychological Science, 22, 1403-1407.