Reproduction and replication of research findings can improve the quality and reliability of research. The recent credibility crisis in psychology has sparked a wide discussion on the reliability of research findings in all fields, and critics have expressed strong doubts about the replicability of published research. It should be safe to assume that the original data and programming code from a published article would reproduce the results presented. However, we regularly find papers reporting failures to replicate results using the original data and code. For instance, Gertler, Galiani and Romero (2018) tried replicating 203 papers with original code and data and found that only one out of seven studies replicated perfectly. To mitigate this issue, researchers from all over the world have taken a stand in favor of practicing and promoting transparent research.
At 3ie, we use Push Button Replication (PBR) to verify the reproducibility of research findings using the code, data and methodology provided by the original authors. Brown and Wood (2018) point out that the concept of PBR is very simple: the process replicates research findings using the original data and code provided by the researchers. In a previous blog, Brown and Wood mention that a necessary condition for reproducibility of findings is that data and code are available. To facilitate verification through PBRs, we mandate submission of all data and programming code from all 3ie-funded studies. We believe that reproducing results is an important way to assure the quality of briefs, reports and other products developed from research findings. To demonstrate our commitment to openness and transparency, we publish all reports and research materials, including data and code, in the public domain.
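The comparison at the heart of a PBR can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not 3ie's actual procedure: the function name `compare_results`, the result names, the numbers and the tolerance are all hypothetical.

```python
import math

def compare_results(published, reproduced, rel_tol=1e-6):
    """Compare published values with values regenerated from the authors' code.

    Returns a dict mapping each published result name to "match",
    "differs" or "missing". The tolerance is an illustrative assumption.
    """
    report = {}
    for name, expected in published.items():
        if name not in reproduced:
            # The rerun did not produce this result at all.
            report[name] = "missing"
        elif math.isclose(expected, reproduced[name], rel_tol=rel_tol):
            report[name] = "match"
        else:
            report[name] = "differs"
    return report

# Hypothetical numbers, invented purely for illustration.
published = {"treatment_effect": 0.125, "std_error": 0.031, "n_obs": 1840}
reproduced = {"treatment_effect": 0.125, "std_error": 0.034}

report = compare_results(published, reproduced)
for name, status in report.items():
    print(f"{name}: {status}")
```

A real verification would also need to decide how much rounding difference is acceptable and how to treat results that the shared code does not generate at all.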
Reluctance to share replication materials
Several institutions do not have a policy on sharing research, and most claim intellectual property over the research they fund and do not make it public. In our interactions with researchers, we found that at the institutional level only a handful of people are aware of and willing to work towards openness in research. Although researchers agree that sharing research materials is important, they are affiliated with institutions that do not support openness, which makes them reluctant to share materials.
A study looking at 141 economics journals found that only 20 per cent of the journals have a functional data sharing policy. The Gertler study actually looked at over 400 studies, of which only 203 had both data and code available for replication purposes. A culture of sharing replication materials isn’t prevalent among journals. In fact, only a handful of them promote sharing, which further nurtures reluctance among researchers. Journals or funding institutions often allow embargoes on the sharing of data and code until a study is published, which in turn fuels the reluctance of researchers to share replication materials. By the time the replication materials are shared, the data are outdated and of no use to other researchers.
PBR from different perspectives
Why is PBR necessary from the original researcher’s perspective?
Knowing that a PBR will be conducted when results are submitted encourages researchers to write code and transform data in a way that is easily verifiable. PBRs also help correct minor errors such as typos and misreported results. They create awareness among researchers about good data and code sharing practices, decrease verification times and make the data and code more reusable. Researchers also refrain from writing poorly documented ("illiterate") programs, which is key to the long-term usability of the code.
Why is PBR necessary from a fellow researcher’s perspective?
Fellow researchers and students benefit from PBRs, which help build an in-depth understanding of the study methodology. They can also reuse the same materials either for replication purposes or as an addition to their own research.
While conducting PBRs, we have interacted with numerous researchers, and a common reason for issues with their code or data is difficulty recalling which transformations were applied to the data to get the reported results. Proper documentation helps trace the changes made to the data and code, and research logs and version control tools facilitate this process. Documentation is a part of the PBR process: it helps original researchers with recall and helps replication researchers track any issues in the research lifecycle and validate results.
Why is PBR necessary from a donor’s perspective?
Donors often seek tangible proof of the authenticity of the results of the studies they have funded. They often want a quick turnaround on the verification of reports after the completion of the studies, and if the data and code are PBR certified, verification by the donor is quicker. This third-party verification can also help the donor disseminate the findings with greater confidence.
Why is PBR necessary from a policymaker’s perspective?
Policymakers use evidence generated from studies. Replication of the research findings assures them of the quality of that evidence and builds the trust needed to use it to inform policies that affect people’s lives. Assuring the quality of research findings also gives policymakers the confidence to be accountable to the public, whose money is often used to fund these studies.
Good practices that make studies replication-ready
The four main pillars that ensure reproducibility, replicability, reusability and long-term preservation of data and code are as follows:
- Submission of raw data
- Submission of codebook and a Readme file
- Programming code for cleaning and preparing the data for analysis
- Programming code for analysis to generate final results
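As a rough illustration, a replication package can be checked for these components before submission. The sketch below is an assumption-laden example, not a 3ie tool: the folder layout (`data/raw`, `docs`, `code/cleaning`, `code/analysis`), the file names and the function `missing_components` are all hypothetical.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: these folder and file patterns are illustrative
# assumptions, not a structure prescribed by 3ie.
REQUIRED = {
    "raw data": "data/raw/*",
    "codebook": "docs/codebook*",
    "readme": "README*",
    "cleaning code": "code/cleaning/*",
    "analysis code": "code/analysis/*",
}

def missing_components(root):
    """Return the labels of required components with no matching file."""
    root = Path(root)
    return [
        label
        for label, pattern in REQUIRED.items()
        if not any(root.glob(pattern))
    ]

# Build a small, deliberately incomplete package in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp)
    for rel in ["data/raw/survey.csv", "README.md", "code/cleaning/01_clean.do"]:
        path = package / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder")
    missing = missing_components(package)

print("Missing components:", missing)
```

A check like this only confirms that files exist. It says nothing about whether the code runs or the codebook is complete, so it complements the replication itself rather than replacing it.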
Following guidelines such as commenting code, providing the code for user-written commands and using a proper folder structure while analyzing the data and writing code helps organise the data, code and findings. Furthermore, clearly documenting all the processes involved in data cleaning and analysis through comments in the programming code files helps readers understand the rationale behind every line of code. For all 3ie-funded studies, researchers are given guidance to make their studies replication-ready.
With new tools for transparent workflows, like the Open Science Framework, Git, and R and Stata dynamic documents, it has become easier and more efficient to keep track of all research processes throughout the life cycle of a study. A number of data repositories, such as figshare and Dryad, have their own data curation and submission guidelines, which help archive and version control data and code. All these instruments help replication of studies, in turn promoting transparency of research practices.
The ongoing credibility and replication crisis is a good wake-up call for researchers to take note and implement practices that assure the quality of their research and support the long-term use of their data and replication materials.
With inputs from Neeta Goel, Marie Gaarder and Radhika Menon