What’s first for replication studies is what’s next for 3ie’s replication programme
Many consider pure replication, where the replication researcher starts with the original data set and writes code to recreate the published results according to the methods described in the publication, to be the second step in replication analysis. So, what is the first?
The first step is checking whether the original authors’ code can be run on the original data to reproduce the published results. At 3ie, we call this push button replication (PBR), as in ‘can you push the button and replicate the published results?’. We started working on concepts and tools for PBR last year, and this summer we are kicking off the 3ie PBR project. For this project, we will conduct PBRs of 122 recently published development impact evaluations to explore both the transparency and the verifiability of new research informing policy and programming for development.
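To make the idea concrete, the core of a PBR check can be sketched in a few lines: run the authors’ code, then compare each reproduced estimate against its published value. The result structure (a mapping from estimate name to value) and the numerical tolerance below are illustrative assumptions, not part of 3ie’s actual protocol.

```python
# Minimal sketch of the comparison step in a push button replication (PBR).
# The shape of the results and the tolerance are illustrative assumptions.

def compare_results(published, reproduced, tol=1e-6):
    """Return the names of estimates that fail to reproduce, i.e. are
    missing from the reproduced output or differ from the published
    value by more than tol."""
    failures = []
    for name, value in published.items():
        if name not in reproduced or abs(reproduced[name] - value) > tol:
            failures.append(name)
    return failures

# Example: a treatment effect reproduces exactly; a standard error does not.
published = {"treatment_effect": 0.152, "std_error": 0.048}
reproduced = {"treatment_effect": 0.152, "std_error": 0.061}
print(compare_results(published, reproduced))  # -> ['std_error']
```

In practice the hard part is everything before this comparison: obtaining the data and code at all, and getting the authors’ scripts to run.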
We didn’t focus on PBR before because we did not expect that many studies would face a replication challenge at this step. We learned, however, that others are less sanguine.
The call for PBR
In May 2015, 3ie gathered critics, supporters and other interested parties of replication research for a one-day consultation event in Washington, DC. To our surprise, one of the livelier discussions at the event centred on whether it is reasonable to expect that the vast majority of published empirical studies can be exactly reproduced. Simply put, is it fair to expect that the original data and programming code behind an article exist and can be used by a third party to easily reproduce the published results?
All present agreed that this kind of reproduction is the most basic replication question. Some argued that this expectation should be a given – that of course original authors always have the data and code to reproduce their work. Others expressed strong doubts about how frequently authors really can provide the required materials to reproduce the published findings. These doubters argued that replication research should focus, at least initially, on this very first line of verification.
Existing evidence on reproducibility
There is evidence from outside international development that supports the doubters. One well-known example is McCullough, McGeary and Harrison (2006), hereafter MMH, who explored the replicability of papers in the Journal of Money, Credit and Banking. They selected 192 papers published between 1996 and 2002 that used data and code to produce their results. MMH defined replication as the reproduction of a paper’s results using the data and code provided by the authors, explaining: “We made minor alterations to data and code to try to get the code to run with the data, but we did not attempt major alterations.” This definition is consistent with PBR. They found that only 69 of 186 articles met the data archive requirement of the journal, and of those 69, only 14 could be replicated using the information from the archive.
A more recent example is Chang and Li’s (2015) working paper, which looked at the replicability of 67 empirical articles in macroeconomics published from 2008 to 2013 (Markus Goldstein provides a good summary here). Chang and Li selected articles both from journals that require data and code for replication purposes and from journals that do not. They were able to obtain these files from the authors for 29 of the 35 papers in journals that require replication files and from 11 of 26 papers in journals that do not (six had confidential data). For those with data and code, Chang and Li conducted replication studies following the definition of MMH. They labelled a replication as successful when they could qualitatively reproduce the key results of the paper, a criterion that they admit is loose. Overall, they were able to successfully replicate only 29 of the 67 articles.
What is step zero?
As we can see from these examples, there is also a step zero. Before we can even check whether pushing the button reproduces the published results, there needs to be a button to push: the original data and programming code must be available. PBR therefore also addresses a research transparency question: will the original authors provide the data and code for their published study? Unfortunately, we know that even when original authors are required to do so, some do not.
Glandon (2010) reports the results of the American Economic Review (AER) data availability compliance project, for which he and his colleagues reviewed 39 recent AER articles to check their compliance with the journal’s data archive policy. He finds that only 80 per cent of the articles comply with the spirit of the AER policy.
Taken together, the findings from MMH, Chang and Li, and Glandon suggest that data accessibility and basic replicability are a challenge, at least for macroeconomists. Markus Goldstein muses: ‘[it] makes me wonder what the analogous statistics would look like for a set of micro- or even micro-development papers.’
That’s the question the 3ie PBR project will answer, at least for a large sample of recent development impact evaluations. These studies are central to the mission of 3ie, and they are also often highly influential on policy. We have selected as our sample all the development impact evaluations published in 2014 in the top ten journals that publish these types of studies.
What are our top ten journals?
We identified the top ten journals by looking for those that published the greatest number of development impact evaluations during the period 2010 through 2012, as catalogued in 3ie’s comprehensive Impact Evaluation Repository. These journals span several disciplines, including public health, political science and economics. The top ten journals are:
- AIDS and Behavior
- American Economic Journal
- The BMJ
- Economic Development and Cultural Change
- Journal of Development Economics
- Journal of Development Effectiveness
- PLOS ONE
- The Lancet
- Tropical Medicine and International Health
- World Development
Taking all development impact evaluations published by these ten journals in 2014 yields a sample of 122 studies. The 2014 sample allows us to examine whether the recent emphasis on research transparency across several disciplines has translated into improvements in push button replicability – both in the willingness and ability of original authors to provide data and code and in the success of replications where those files are provided.
What are the rules?
A key lesson from the 3ie replication programme has been that replication researchers and original authors, even when both have the best of intentions, often enter the exercise with conflicting preconceptions about what should and will be done as part of a replication study. Communications often go sour very quickly. A PBR could be a step towards addressing this issue as it would be the most straightforward and objective replication exercise. We have already developed a PBR protocol to set out in clear terms what the process should be, what is expected of the replication researchers and the original authors, and how the results of the PBR are rated or classified.
The protocol begins with instructions for how to reach out to original authors and then outlines specific steps for how the data and code should be checked. The protocol includes a PBR report template for documenting the process and results of a PBR. It also provides a typology for classifying the results of a PBR. Finally, it includes instructions for when and how results are made available to the original authors before they become public.
The report for the PBR for each article in our sample will be posted on the Open Science Framework as the studies are conducted. As the last step of the project, we will write a summary study with the aggregate results.
In recent years, several organisations have taken up the banner of research transparency. The Center for Open Science developed and manages the Open Science Framework to ‘increase inclusivity and transparency of research’. The Berkeley Initiative for Transparency in the Social Sciences works to ‘enhance the practices of…social scientists in ways that promote research transparency, reproducibility, and openness’. Evidence in Governance and Politics includes transparency and the publication of data among the principles to which all members must agree. The Institute for New Economic Thinking recently published a call for replication and transparency in economic research. 3ie’s PBR project will measure early progress towards these goals specifically for the field of development impact evaluation, where individual studies are often highly influential.