The unlucky number seven: A rather painful critique of my internship project

 

For my internship project I was investigating the effects of deep brain stimulation (DBS) as a treatment for Parkinson’s disease. This included improvements in motor function and quality of life, but also cognitive decline. I ended up dealing with a rather large and complex data set, which, after all the computed variables had been created, contained 233 variables for 281 participants. To make things more complex, the data had been combined from three different sources, which meant there were inconsistencies in coding and in the tests administered across the studies. I tried to be as stringent as possible with the initial data checking and handling and the subsequent analyses. However, I was still left feeling unconfident about my findings. Unfortunately, I had good reason for this uneasy feeling. Whilst checking I found two rather large mistakes. Fortunately, I still had time to improve my project before the final version was handed in.

The Suspect P-Value

I cleaned up the final version of my data set and syntax and decided to rerun all the analyses to make sure I had consistent results. It was all going rather well until I moved on to the cognitive variables. In my write-up I had already found a wrongly reported p-value for the Mattis Dementia Rating Scale (MDRS). I had reported it as significant, when in fact the p-value was 0.017, non-significant at my alpha of 0.01. When I reran the analysis it turned out to be even worse: the p-value was actually 0.022. I knew I had double-checked the analysis, so I was rather baffled! I found the earlier version of the data set which gave me the original 0.017 p-value and began my search. The data seemed identical, until I checked my IQ covariate. I had missed a missing-value code of 777 (inability to complete) in the IQ covariate, the covariate I had used in all my cognitive analyses! I reran all my analyses and changed my report. Most of my analyses were not greatly affected by this mistake; however, I did lose significance on one comparison, which went from p = .009 to p = .025!
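To show the kind of sanity check that would have caught this earlier, here is a minimal sketch in Python with pandas (my actual analyses were done in SPSS syntax; the file and column names are hypothetical, and 777 is the “inability to complete” code from my data set):

```python
import numpy as np
import pandas as pd

# Hypothetical file and column names; 777 = "inability to complete"
df = pd.read_csv("dbs_data.csv")

# Declare the sentinel code as a true missing value *before* using IQ as a covariate
n_coded = (df["IQ"] == 777).sum()
df["IQ"] = df["IQ"].replace(777, np.nan)

# Quick audit: how many values were recoded, and does the remaining range look plausible?
print(f"Recoded {n_coded} values of 777 to missing")
print(df["IQ"].describe())
```

A simple range check like this on every covariate, rerun after each merge of the three source data sets, would have flagged the stray 777 immediately.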

The Questionable Predictor

I also had to question my inclusion of the MDRS as a significant baseline predictor of cognitive decline following deep brain stimulation. The MDRS was a very desirable predictor: it was a measure of global cognition that was already used by the neuropsychologists and doctors to set a cut-off point for undergoing DBS. Although originally significant, once converted into a T-score to correct for age and education it was pushed out of the model by years of education and IQ. As IQ and education can be seen to overlap conceptually, I had made one model excluding education. Once again the MDRS was significant and IQ was no longer in the model. I had kept this model as it was more parsimonious and practical, since the IQ measure would not be as readily available as the MDRS. However, I was feeling a little uncertain about my decision. Once I had rerun my analysis with the corrected IQ, I found that the original model now retained medication dose and education. I decided to keep this model, which I see as more consistent with good research practice.
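For readers curious what this kind of model comparison looks like in practice, here is a minimal sketch using ordinary least-squares regression in Python with statsmodels (my models were actually built in SPSS, and the variable names below are hypothetical stand-ins for the predictors described above):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and variable names for the baseline predictors discussed above
df = pd.read_csv("dbs_data.csv")

# Candidate models for cognitive decline after DBS: with and without education
m_with_edu = smf.ols("cognitive_decline ~ mdrs_tscore + education_years + iq",
                     data=df).fit()
m_no_edu = smf.ols("cognitive_decline ~ mdrs_tscore + iq", data=df).fit()

# Inspect coefficients and weigh fit against parsimony before choosing a model
print(m_with_edu.summary())
print("AIC with education:", m_with_edu.aic, "| AIC without:", m_no_edu.aic)
```

The point is not the particular fit statistic but that the competing models, and the reason for preferring one, are written down explicitly rather than decided on the fly.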

I feel my mistakes were a combination of long research hours with a large data set and prior expectations clouding my judgement. Overall I learnt a lot from doing this critical review of my own work. It has made me very aware of my own fallibility, despite having good intentions. I am glad that I have managed to locate and fix these problems, both for this project and to improve my research practice for future projects.

Fraud and Scientific Integrity: The good scientist, bad scientist continuum

There are many problems with how research is currently conducted. These range from human error through questionable research practices to full-blown fraud. Fraud includes fabrication, falsification and plagiarism. Unfortunately, it is much more prevalent than the research community likes to believe. There have been many cases demonstrating that even obvious fraud can go unchallenged for years. A famous example is Diederik Stapel, who fabricated data in around 50 papers and went undetected until some colleagues noticed and reported him. Unfortunately, even after the retraction of this work he continues to be cited. In some cases fraud is hard to detect; in others it is obvious from results that are “just too good to be true”, unrealistic numbers of publications, or results that are too consistent across studies. A good example is Ruggiero and Taylor (1997), whose findings were so consistent despite small sample sizes that the chance of them occurring was 1 out of 3.8 billion. In other cases the only way to detect fraud is for fellow researchers or students to notice and report the misconduct. However, there are large social and practical consequences for the complainant (i.e. isolation or retraction of publications) and stigmatisation of the university if the case goes public. Yet results from falsified or fabricated data are useless to science and potentially harmful.

Questionable research practices, although not as severe as full-blown fraud, still pose major challenges to the integrity of scientific research. These practices include misreporting of p-values (Bakker & Wicherts, 2011) and sequential testing with accumulative sampling until significance is reached. This is even harder to detect than fraud, mostly because researchers who use questionable practices do not wish to share their data. Unsurprisingly, willingness to share data is related to the strength of the evidence, the quality of reporting and fewer errors (Wicherts, Bakker & Molenaar, 2011). Some researchers, despite signing a document which commits them to sharing raw data, still refuse. Some give excuses such as computer malfunction; others assert they no longer have copies, or simply promise the data and do not provide it. A nice illustration is the response “I’ll send you the data within a few days” from Wicherts et al. (2006). The researchers have now been waiting 2663 days for the data, which presumably will not appear.
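Just how much sequential testing with accumulative sampling can distort results is easy to demonstrate with a small simulation. Below is a minimal sketch in Python, under assumed parameters of my own choosing (no true effect, a maximum of 100 participants per group, and a test after every 10 new participants):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def optional_stopping_significant(max_n=100, step=10, alpha=0.05):
    """Simulate one 'study' under the null: test after every `step` participants
    per group and stop as soon as p < alpha."""
    a = rng.normal(size=max_n)
    b = rng.normal(size=max_n)  # no true effect exists
    for n in range(step, max_n + 1, step):
        _, p = stats.ttest_ind(a[:n], b[:n])
        if p < alpha:
            return True
    return False

sims = 10_000
false_positives = sum(optional_stopping_significant() for _ in range(sims))
print(f"False positive rate with optional stopping: {false_positives / sims:.3f}")
```

Even though each individual test uses a nominal alpha of 0.05, repeatedly peeking at the data and stopping at the first significant result pushes the long-run false positive rate well above 5%.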

Presently, research is conducted by individuals who not only believe in their hypotheses but also have a vested interest (i.e. publication) in obtaining results. This, combined with the privacy of raw data and analysis, poses a huge problem for research integrity. Recent research on human decision making has found that secrecy, expectation and reward are a lethal combination and can lead to more fraudulent reporting (Shalvi et al., 2011). Human nature cannot be changed: researchers will still believe in their hypotheses, and the research process will still offer rewards for significant findings. However, privacy can be changed.

Therefore, the main message of today’s lecture is transparency. Sharing our data and preregistering studies and protocols not only ensures good research practice but also protects against research misconduct. Replication becomes easier to achieve and errors become less prevalent. Moreover, transparency promotes honesty in analysis and good record keeping, increasing the integrity of the scientific field.

Lecture by Jelte Wicherts