We’ve Got the Tools: Literate programming for open science

This morning, the announcement of the second round of the Many Labs collaboration crashed the website of the Center for Open Science. It is a sign that, despite all woes, at this point in time there exists a sense that something can be done to improve psychological science, and forge a better future for our discipline. There are many fascinating initiatives, but I am perhaps most excited about what features in the Center’s tag line: tools for open science.

Many such tools already exist. Among the most powerful, I believe, is the combination of R, Sweave/knitr, and Git/GitHub that allows researchers to create accessible and reproducible statistical analyses, share them with others, and collaborate on improvements.

R is a statistical programming language that, as Lukas writes on this blog, allows you to “document literally every step in your preprocessing and statistical analyses”. But merely documenting the code is not enough: it is also important to document the decisions that led to this code, and to explain it to the reader; i.e., to write what is called “literate” (human-readable) code. This looks approximately like this:

The following code implements printing of the phrase “hello world”

<<>>=
print("hello world")
@

Easy, isn’t it? Sweave and knitr are two solutions for writing literate code, both of which are supported by the splendid RStudio IDE. They allow the author to create a single source file in which text and R code sit side by side: the code can be executed, and the file can be compiled into a human-readable document that includes text, code, and R output. Knitr, which builds on Sweave, should be your preferred solution today, and supports the creation of LaTeX, HTML, and Markdown documents.
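To make the compile step a little more concrete, here is a minimal sketch in R. It assumes the knitr package is installed, and the chunk label `hello` and the variable names are my own choices for illustration. It builds the "hello world" document from above in Markdown chunk syntax (rather than the Sweave syntax shown earlier) and passes it to `knit()`.

```r
# Minimal sketch: build a tiny literate document in memory and knit it.
# Assumes the knitr package is installed; the chunk label "hello" is arbitrary.
library(knitr)

# Three backticks, built programmatically so this listing stays readable.
fence <- strrep("`", 3)

doc <- c(
  "The following code implements printing of the phrase \"hello world\".",
  "",
  paste0(fence, "{r hello}"),
  "print(\"hello world\")",
  fence,
  ""
)

# knit() executes each chunk and weaves the code and its output into the text.
# With the `text` argument, the result is returned as a character string.
out <- knit(text = doc, quiet = TRUE)
cat(out)
```

The printed result should be a Markdown document containing the explanatory sentence, the code, and the output the code produced. In day-to-day use you would more likely keep the document in a file and compile it from RStudio, but the principle is the same.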

A great example of the usefulness of R and knitr is provided by Tim Churches, who re-analysed a meta-analysis on the effectiveness of bicycle helmets (if you, like me, prefer to cycle without, the results may be disappointing). If you click the link, you will find what looks like a regular website, but is in fact a Markdown document created using knitr and shared with the world on GitHub.

That last point is the capstone of my argument: once documents have been created following the principles of literate programming, they are ready to be shared, just like a written-up paper would be. GitHub – which Lukas has introduced on this blog – enables you to do just that. Built on the version control tool Git, it is a platform on which you can archive your code files (or indeed any files). Better yet: you can ‘fork’ – copy and manipulate – the documents shared by others. If we believe that flawed analyses should be corrected, this is the easiest way to do so.

So, if you have not already learned R (or another statistical programming language such as Python or Matlab), I highly recommend you do. Once you have, nothing is easier than to start writing literate code – as I said, RStudio supports knitr out of the box. The final step is to start sharing your work on GitHub. Git is supported natively by RStudio, too – here’s a manual.

2 thoughts on “We’ve Got the Tools: Literate programming for open science”

  1. Clear and convincing! Do you know whether there are many researchers currently using this?

  2. Hi Odile! There are definitely a lot of researchers who make use of the first two tools, R and knitr (or alternatives, e.g. Python). Unfortunately, far fewer share their code publicly, although doing so, be it through GitHub or through the Open Science Framework, is increasingly seen as exemplary practice. I certainly hope that with the recent push to improve data sharing, code sharing will become a natural part of the process.
