Whose Fall is the Academic Spring?


The past has seen several protests against the controversial business practices of commercial publishing houses. However, none of them has been as influential as the recent Elsevier boycott, a movement that was later dubbed the Academic Spring. Almost two years have now passed, as is also the case for its namesake, the Arab Spring. Recognizing that in many countries of the Arab world the protests have not resulted in the changes many had hoped for, it seems reasonable to take a critical look at the outcomes of the academic uprising.

Elsevier had long been criticized for practices ranging from unreasonable journal prices to the publication of fake journals that promoted the products of pharmaceutical companies []. For many, however, its support of the Research Works Act (RWA) was the final nail in the coffin of Elsevier’s integrity []. Elsevier’s opponents could quickly celebrate their first victory when the RWA was declared dead in late February 2012. Yet the RWA was only one aspect of the critique: its failure prevented the situation from getting worse, but it did not really affect the status quo. The Cost of Knowledge boycott continues to gather signatories but does not seem to threaten Elsevier’s market dominance effectively. That the problem remains was demonstrated in April 2012, when the library of Harvard University released a memorandum describing its increasing difficulty in paying the annual costs of journal subscriptions and concluding that “many large journal publishers have made the scholarly communication environment fiscally unsustainable and academically restrictive” []. One year later, Greg Martin resigned from his position on the editorial board of Elsevier’s Journal of Number Theory. In his resignation letter he concludes that there have been no observable changes in Elsevier’s business practices []. In their one-year résumé, the initiators of the boycott were a little more optimistic []. While they admit that not much has changed in Elsevier’s general business strategy (the ‘Big Deal’ price negotiations have not become more transparent, and bundling is still common practice), they report some minor price drops. More importantly, they state, the boycott has raised awareness and increased support for newer, more open business models. What unites almost all of the critics is a shared hope in open access.
The recent Access2Research petition can be seen as a further success of the open access movement: it convinced the White House to release a memorandum directing all federal agencies to make all federally funded research freely available within 12 months of initial publication [].

PLoS One, an open access journal, is now by far the largest academic journal in the world [], and new open access journals are being founded almost daily. While some of these new journals look like what one would call a ‘scam’, they, like all journals with poor quality standards, will not have a long life expectancy. The genie, however, has left the bottle, and it is unlikely that the fresh spirit of open access will disappear any time soon.


I have the power! Or do I really?



“Knowledge is power”
Francis Bacon

Have a look at the following results and try to explain for yourself why the results of the three replications appear so different.

Maxwell (2004)

Because of the headline you might already suspect that power is the issue here. Here are three additional points: (1) the sample size of all studies is n = 100, (2) all predictors share a medium correlation (r = .30) with the dependent variable and with each other, and (3) G*Power indicates a post hoc power of .87. Does this mean that you were totally wrong? The answer is no.

If you find it difficult to explain the deviant results, you are no exception. This table is taken from a paper by Maxwell (2004), in which he demonstrates how many psychology experiments lack power. How can this be a lack of power if G*Power indicates a statistical power of .87? Well, like most of us, G*Power does not distinguish between the power to find at least one significant predictor and the power to find any specific predictor. Maxwell conducted several simulations and found that the power to find any single specific effect in a multiple regression (n = 100) with five predictors is .26, and that the chance that all five predictors turn out significant is less than .01. Considering this, it is much easier to explain the unstable pattern of results. One might say that a multiple regression with five predictors is an extreme example, but even a 2 x 2 ANOVA with medium effect sizes and n = 40 per cell finds all true effects (two main effects and one interaction effect) with a chance of only 69%.
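The distinction between these kinds of power can be checked with a small Monte Carlo simulation. The following is a minimal sketch, not Maxwell's own code: it assumes multivariate normal data in which all five predictors and the outcome correlate .30 with each other, n = 100, and a two-sided test at alpha = .05. The function name simulate_power, the number of simulated studies and the hard-coded critical value (approximately the .975 quantile of t with 94 degrees of freedom) are choices made for this illustration.

```python
import numpy as np

def simulate_power(n=100, k=5, r=0.30, n_sims=2000, t_crit=1.9855, seed=0):
    """Estimate three kinds of power in a multiple regression by simulation.

    t_crit is an assumed, approximate two-sided critical value for
    alpha = .05 and n - k - 1 = 94 degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    # Correlation matrix: every pair among the k predictors and the outcome is r.
    corr = np.full((k + 1, k + 1), r)
    np.fill_diagonal(corr, 1.0)
    chol = np.linalg.cholesky(corr)

    sig = np.zeros((n_sims, k), dtype=bool)
    for s in range(n_sims):
        # Draw one simulated study with the desired correlation structure.
        data = rng.standard_normal((n, k + 1)) @ chol.T
        X = np.column_stack([np.ones(n), data[:, :k]])  # intercept + predictors
        y = data[:, k]
        # Ordinary least squares and the usual standard errors.
        xtx_inv = np.linalg.inv(X.T @ X)
        beta = xtx_inv @ X.T @ y
        resid = y - X @ beta
        se = np.sqrt(resid @ resid / (n - k - 1) * np.diag(xtx_inv))
        # Record which predictors (excluding the intercept) are significant.
        sig[s] = np.abs(beta[1:] / se[1:]) > t_crit

    return {
        "specific": sig[:, 0].mean(),          # one particular predictor
        "at_least_one": sig.any(axis=1).mean(),
        "all": sig.all(axis=1).mean(),
    }

result = simulate_power()
print(result)
```

Under these assumptions the estimate for a specific predictor should land near the .26 reported above, while the estimate for all five predictors at once should stay close to zero.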

This is just another powerful example of how misleading significance tests can be. We have to be aware that this method, which is the common test paradigm in psychological research, can be flawed and has to be evaluated with caution (Wagenmakers, 2007). Evaluating confidence intervals, for example, is one way to appreciate the uncertainty that underlies frequentist hypothesis tests. The following table shows the confidence intervals of the five predictors in the three replications and makes clear that the results do not differ as much as the p-values suggest.

figure 1

The most important lessons that can be taken from this demonstration are: (1) don’t be fooled by p-values, (2) consider the confidence intervals, (3) be aware of the uncertainty of results, (4) do not let your theory be dismissed by a Type II error, and (5) publication bias and underpowered studies might lead to a distorted body of literature, so to be safe one should assume that effect sizes are overestimated. Consider these five lessons when you plan your experiment, because a lack of power can turn the results of an otherwise excellent experiment into useless data. While it might be disturbing that a 2 x 2 ANOVA needs more than 160 participants to exceed a power of .69 for finding all medium-sized effects, one should be aware that sample size is not the only way to increase the power of an experiment. Reducing variance with covariates and increasing the effect size by strengthening the manipulation are very effective and might often be more feasible than recruiting more and more participants.
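The 69% figure for the 2 x 2 ANOVA can be roughly reproduced by hand. The sketch below rests on several assumptions that are not in the text: a "medium" effect is taken to be Cohen's f = .25, each of the three 1-df tests is approximated by a two-sided z test, and the three tests are treated as independent (they share an error term, so this is only approximately true). The helper name approx_power_1df is made up for this illustration.

```python
from statistics import NormalDist

def approx_power_1df(f, n_total, alpha=0.05):
    """Normal approximation to the power of a 1-df F test with effect size f."""
    norm = NormalDist()
    delta = f * n_total ** 0.5            # square root of the noncentrality f^2 * N
    z_crit = norm.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Probability of landing in either rejection region.
    return (1 - norm.cdf(z_crit - delta)) + norm.cdf(-z_crit - delta)

# Four cells with 40 participants each, medium effect (assumed f = .25).
power_single = approx_power_1df(f=0.25, n_total=4 * 40)

# Chance that both main effects and the interaction are all detected,
# treating the three tests as independent (an approximation).
power_all_three = power_single ** 3

print(f"{power_single:.2f} {power_all_three:.2f}")
```

Each single effect is detected with a power of nearly .9, yet the chance of detecting all three together drops to roughly .69, simply because the three probabilities multiply.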

Boris & Alex

Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: causes, consequences, and remedies. Psychological Methods, 9, 147.

Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779-804.

Humans studying Humans

We all know the prototype of the ideal scientist: smart and rational, honest, open-minded and extremely critical of others’ findings as well as his own. The world isn’t an ideal place, though, and one might wonder how far this image represents reality. The infamous Dutch psychologist Diederik Stapel reported his impression in his autobiography ‘Derailment’, p. 258 (translated from the Dutch original ‘Ontsporing’): “Researchers defend their interest and research fields, their insights and theories with tooth and claw against other researchers. They are all in competition to produce as much knowledge as possible as quickly and as cheap as possible, and they try to reach their goal with every possible method. […] science is just a business and scientists are just people too.” This seems to be in sharp contrast with the ideal described above. One might smell a little frustration and some odor of excuse in Stapel’s statement; however, there might also be some truth in it.

Bakker and Wicherts (2011), for example, recalculated statistical results and found that 18% of the statistical results in research papers were reported with incorrect p-values.
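The idea behind such a recalculation can be sketched in a few lines. This is a simplified illustration, not the procedure Bakker and Wicherts actually used: it handles only a z statistic (real checks also cover t, F and chi-square tests), the function names are made up, and the "reported" values are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """Recompute the two-sided p-value that belongs to a z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_consistent(z, reported_p, decimals=2):
    """Check whether the reported p matches the recomputed p after rounding."""
    return round(two_sided_p_from_z(z), decimals) == round(reported_p, decimals)

# Hypothetical result as it might appear in a paper: "z = 1.90, p = .04".
reported_z, reported_p = 1.90, 0.04

recomputed_p = two_sided_p_from_z(reported_z)
print(round(recomputed_p, 3), is_consistent(reported_z, reported_p))
```

In this made-up case the recomputed p-value is about .057, so the reported "p = .04" would be flagged as inconsistent, and the error would happen to move the result across the .05 threshold.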

The majority of these mistakes altered a p-value in such a way that an otherwise nonsignificant result became significant. Most of the time this changed the results only marginally, and most researchers might agree that a study which narrowly fails to reach the rather arbitrary significance level of α = .05 by no means provides less valuable information than a narrowly significant one. However, we all seem to have agreed to play this game. As Giner-Sorolla (2012) critically remarked, “Psychologists must master the art of presenting perfect-looking results just to survive in the profession”, and unfortunately it is still the policy of most journals to prefer aesthetically pleasing ‘p < .05’ results. One can of course blame the journals for acting irrationally, but it seems that a turning point may have been reached. These days an increasing number of journals are starting to move away from a sensation-seeking, result-oriented policy towards an approach that focuses on methodological rigor and pre-registration with a publication guarantee.

Nevertheless, while nobody is perfect, I believe honesty should be the most important virtue of a scientist, and I assume that most researchers, unless desperate and with their backs to the wall, hold this virtue high. Thus one should look for another source of these statistical irregularities, one on which I fully agree with Stapel’s statement quoted above: “Scientists are just people too”, and people make mistakes. Ironically, psychologists might often be victims of the very effects they are studying. Hindsight bias, or the “I knew it all along” effect, might mislead one into being too uncritical of a highly unlikely result, because the result can seem much more likely once the data have been obtained. Expectancy effects and confirmation bias might lead one to be overly critical of results that disconfirm one’s hypothesis and blind to one’s own mistakes when the hypothesis is confirmed. Effects like these can be crucial, especially when working in a field that suffers from underpowered studies. Most studies simply don’t have enough participants, and many researchers seem to be unaware of the unreliability that results from samples that are too small. If researchers in psychology start to use bigger samples, their results might be less ambiguous; misleading biases, outliers and rounding errors in the analysis will have less effect on the outcome; and if the power is high enough, even null findings can be of value.


Alexander Gierholz