In the first assignment of this course we analyzed a dataset. That dataset was only half of the original, and in the third assignment we redid the analysis on the other half. Some people found the same results in both halves, but most of us weren't able to reproduce all of the effects found earlier; some even found effects in the opposite direction. I, for example, found a difference between men and women on two tests in the first half of the dataset. I thought these effects were strong because both had p-values of .003, yet in the second half of the sample the same effects had p-values of .169 and .143. Because the sample was split randomly, I was very surprised to see what I thought were strong effects turn into no effects at all. Somebody else found ten significant effects in the first sample and was able to find only two of them in the second. This assignment shows how careful we should be when drawing conclusions.
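To convince myself of how much p-values can jump around between two random halves of one sample, I find it helpful to simulate it. The sketch below uses purely illustrative numbers, not the course data: 100 "men" and 100 "women" with a modest true difference (d = 0.4), split into two halves, with a t-test run on each half.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_group, d = 100, 0.4   # illustrative assumptions, not the real data
disagreements = 0

for _ in range(1000):
    # Freshly simulated samples, so the first/second half split is random.
    men = rng.normal(0.0, 1.0, n_per_group)
    women = rng.normal(d, 1.0, n_per_group)

    half = n_per_group // 2
    p1 = stats.ttest_ind(men[:half], women[:half]).pvalue
    p2 = stats.ttest_ind(men[half:], women[half:]).pvalue

    # Count how often one half is "significant" at .05 and the other is not.
    disagreements += (p1 < .05) != (p2 < .05)

print(f"halves disagree about significance in {disagreements / 1000:.0%} of simulations")

With a modest true effect like this, the two halves disagree about significance in a large share of the simulated splits, which is exactly the pattern we saw in the assignment.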
I think the reason why I, and probably most of us, put too much trust in p-values is that we underestimate variance. That people underestimate variance is shown over and over again when they are asked to make up a sequence of coin flips that looks random. Most people write down a sequence in which heads or tails never comes up more than five or six times in a row. In a real sequence of a few hundred flips, however, it is not unlikely to find heads or tails ten or even more times in a row. I think we underestimate the variance of the effects we study in the same way we underestimate the variance of coin flips. When we find a low p-value, we assume the effect is there and underestimate the possibility that it will not show up in a similar sample.
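Anyone can check the run-length claim with a small simulation. The sketch below uses made-up parameters (300 fair flips per sequence, 10,000 sequences) and records the longest run of identical outcomes in each sequence.

import numpy as np

rng = np.random.default_rng(0)
n_flips, n_sequences = 300, 10_000
longest_runs = []

for _ in range(n_sequences):
    flips = rng.integers(0, 2, n_flips)
    # Track the longest streak of identical flips in this sequence.
    longest = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    longest_runs.append(longest)

longest_runs = np.array(longest_runs)
print("median longest run:", np.median(longest_runs))
print("share of sequences with a run of 10 or more:", np.mean(longest_runs >= 10))

The typical longest run in a real sequence is quite a bit longer than the five or six that people tend to write down when they invent one.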
A perfect example, in my opinion, of a significant effect that doesn't exist is the study by Roskes et al. (2011). They found that soccer goalkeepers are more likely to dive to the right than to the left when their team is behind in a penalty shootout. To see whether this is a true effect, we are going to replicate their study. We will use exactly the same methods as Roskes et al. and analyze the data in the same way they did, so that the two studies can be compared. But because they analyzed the data in the wrong way (see blog post below), we will also do the proper analysis. To do this we will document beforehand exactly what data we will collect and how much, how we will score those data, and which analyses we will use to establish whether there is an effect. Hopefully Roskes et al. can agree with us in advance that our replication is done just like the original study. That way we can decide whether Roskes et al. were fooled by variance or found a real effect.
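Just to illustrate what a pre-specified analysis script could look like, here is a sketch with placeholder counts. The test it runs (comparing the share of rightward dives when the keeper's team is behind versus when it is not) is only my guess at a sensible analysis, not the plan we will agree on with Roskes et al.

from scipy import stats

# dives[state] = [dives to the left, dives to the right]  (placeholder numbers, not real data)
dives = {
    "behind":     [10, 22],
    "not_behind": [30, 28],
}

# Chi-square test of independence between game state and dive direction.
table = [dives["behind"], dives["not_behind"]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")

The point of writing this down in advance is that the data collection, scoring, and test are all fixed before we see a single penalty, so there is no room to choose the analysis that happens to give a significant result.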
Roskes, M., Sligte, D., Shalvi, S., & De Dreu, C. K. W. (2011). The right side? Under time pressure approach motivation leads to right-oriented bias. Psychological Science, 22, 1403-1407.