The problematic p-value: How large sample sizes lead to the significance of noise

The most reliable sample, the one with the highest power, is of course the complete population itself. How can it be that such a sample leads us to reject null hypotheses that we do not want to reject?

When the sample size increases, very small effect sizes can become significant. I ran an independent-samples t-test on two simulated groups whose scores on a variable were normally distributed, with only a very small true difference between the groups. This was repeated 1000 times with a sample size of 200 persons per group and 1000 times with a sample size of 20,000 persons per group (see figure 1), and I recorded what proportion of these t-tests was significant. Note that even if there is no effect, or the effect is too small to detect, you expect at least 5 percent of the t-tests to be significant because of the significance level of .05 (Type I error). With a total sample of 400 persons (200 per group) you find a significant effect in approximately 5.6% of the runs (repeating the whole procedure several times gave values between .05 and .06). The proportion of times the null hypothesis is rejected increases with sample size: with groups of 20,000 this proportion rises to approximately 73.3%.


[Figure 1: simulation results]
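
For readers who want to try this themselves, below is a minimal sketch of such a simulation in Python (using numpy and scipy). It is not the exact code behind figure 1; in particular, the true mean difference of 0.025 standard deviations is an assumption I chose so that the effect is real but far too small to matter.

    # Minimal sketch: repeated independent-samples t-tests on two normally
    # distributed groups, for a small and a large sample size. The true mean
    # difference of 0.025 SD is an illustrative assumption: a real but
    # practically negligible effect.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    def rejection_rate(n_per_group, true_diff=0.025, reps=1000, alpha=0.05):
        """Proportion of simulated t-tests with p < alpha."""
        significant = 0
        for _ in range(reps):
            group_a = rng.normal(0.0, 1.0, n_per_group)
            group_b = rng.normal(true_diff, 1.0, n_per_group)
            _, p = stats.ttest_ind(group_a, group_b)
            if p < alpha:
                significant += 1
        return significant / reps

    print(rejection_rate(200))    # stays close to the nominal 5% level
    print(rejection_rate(20000))  # the same tiny effect is now 'significant' most of the time

The exact proportions will vary from run to run, but the pattern is the point of the exercise: a rejection rate near .05 for the small samples and far above it for the large ones.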

The Crud Factor, described by Meehl (1990), is the phenomenon that ultimately everything correlates to some extent with everything else. This phenomenon was supported by a large exploratory study he conducted with Lykken in 1966, in which 15 variables were cross-tabulated for a sample of 57,000 schoolchildren. All 105 cross-tabulations (15 × 14 / 2 pairs of variables) were significant, and 101 of them were significant with a probability of less than 10^-6.

Similar findings were described by Starbuck (as cited in Andriani, 2011, p. 457), who found that “choosing two variables at random, a researcher has a 2-to-1 odds of finding a significant correlation on the first try, and 24-to-1 odds of finding a significant correlation within three tries.” Starbuck concludes from these findings (as cited in Andriani, 2011, p. 457): “The main inference I drew from these statistics was that the social sciences are drowning in statistically significant but meaningless noise.”

Taken literally, the null hypothesis is always false (as cited in Meehl, 1990, p. 205). When this phenomenon is combined with the fact that very large samples can make every small effect significant, one has to conclude that with the ideal sample (as large as possible) one has to reject every null hypothesis.
This is problematic because it turns research into a tautology. Every experiment that yields a p-value > .05 then becomes a Type II error: the sample was simply not big enough to detect the effect. A solution could be to attach more importance to effect sizes and make them decisive in whether a null hypothesis should be rejected. However, it is hard to change the interpretation of the p-value, since its use is deeply ingrained in our field. Altogether, I would suggest leaving the p-value behind us and switching over to a less problematic method.
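
To make the effect-size argument concrete: with a very large sample the p-value shrinks towards zero while the standardized effect stays trivially small. The sketch below (again assuming a tiny true difference of 0.025 SD, my illustrative choice) computes Cohen's d next to the p-value for one simulated data set.

    # Sketch: for one large simulated data set, the p-value can fall far below
    # .05 while Cohen's d (a simple standardized effect size) remains negligible.
    # The 0.025 SD true difference is again an assumption for illustration.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n = 20000
    group_a = rng.normal(0.0, 1.0, n)
    group_b = rng.normal(0.025, 1.0, n)

    _, p = stats.ttest_ind(group_a, group_b)
    pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
    cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

    print(f"p = {p:.4f}, Cohen's d = {cohens_d:.3f}")
    # In most runs p falls below .05, yet d stays around 0.02-0.03:
    # statistically 'significant', but practically meaningless.

Basing the decision on d (or on a confidence interval around it) rather than on p alone would flag such a result as negligible, no matter how many participants were tested.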


Andriani, P. (2011). Complexity and innovation. The SAGE Handbook of Complexity and Management, 454-470.
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195-244.

Crisis, what crisis? Revolution!

Barbara Spellman (introduced as Bobbie) teaches evidence and various courses on the intersection of psychology and law at UVA. She is now also the editor of Perspectives on Psychological Science.
She discussed the crisis of replicability. For her, however, this is not a crisis but rather a revolution, and she drew several analogies between the failure to replicate, and its impact, and a revolution.
I will discuss the past, the present and the future that Bobbie described: what turned the failure to replicate into a revolution, how to behave during the revolution, and what to expect for the future.

First, I would like to describe what kind of revolution Bobbie has in mind when she compares the situation in our field to a revolution. The change in our field resembles the French Revolution: our own people (researchers) are pushing the revolution, and they do not want to change or fight “the others” but rather to change something within our own field (in the French Revolution, that something was the monarchy).

The past:
Replicating other researchers' studies is not new; we have been doing it for a long time, and even then certain studies would not replicate. Why has this problem only now become a revolution? What events pushed the revolution? One of the major events of recent years is the series of fraud cases (not just in the Netherlands, hooray). These cases opened our eyes to the question of how valid our findings really are. But perhaps more important is the technological change of recent years. The use of the internet in science has accelerated research: one can easily find lots of participants, all articles are within reach (one click of a button and the paper is on your computer), and running analyses has become easier and faster. However, this fast science can also lead to sloppy science. What determines which of these findings are real effects? We need to replicate. In addition, the internet gave replication a real voice. Before so much information and research was shared online, it was hard to get attention for a replication. Now you can spread your result (did it replicate or not?) very fast, and all replications add up on the internet, making their influence stronger.

The present:
How to behave in this revolution? There are the so-called replicators and anti-replicators, and both have a lesson to learn. First of all, we should never take science personally: a failure to replicate is not a personal rejection. The anti-replicators should not think that replicators are simply evil. On the other hand, replicators should not act as if replication is the only thing that matters. Let's meet in the middle.

The future:
More young people are getting PhDs and more older people are retiring. The younger people are the ones who are used to sharing on the internet. Therefore Bobbie predicts that this revolution will take place and that the internet will play a prominent role in science. She predicts that ultimately there will be no journals anymore: you take your paper and send it into the “sky”, you categorize it (e.g., “this paper focuses on cognitive neuroscience”), and online ‘journals’ can then spot it and ‘publish’ it, awarding it a star or something else that distinguishes the paper from others on the internet.

Let’s see how it works out!

Riet van Bork