Not all ‘systematic’ reviews are created equal
10 March 2015
In a recent World Bank blog post based on a paper, David Evans and Anna Popova argue that systematic reviews may not be a reliable approach to synthesising empirical literature. They reach this conclusion after analysing six reviews assessing the effects of a range of education interventions on learning outcomes. The main finding of their analysis: while all of these reviews focus on effects on learning outcomes based on evidence from impact evaluations, there is a large degree of divergence in the studies included in each review and, consequently, in the conclusions they reach.
We agree with many of the points Evans and Popova make, but not with their conclusion that systematic reviews should be taken with a grain of salt. Instead, we believe their exercise actually strengthens the case for doing more high-quality systematic reviews.
Systematic reviews aim to address a number of well-known problems with academic research and with less systematic approaches to evidence synthesis. These include publication bias, unconscious reviewer bias and the variable quality of research output. They also overcome the limitations of single studies, which may be sample-, time- and context-specific or illuminate only one aspect of a policy issue.
Systematic review methodology and guidelines are based on empirical research and have been developed to address these issues. While there are different definitions and methodological guidelines, key features of systematic reviews are the use of systematic and transparent methods to identify and appraise relevant studies, and to extract and analyse data from the studies that are included in the review. Not all systematic reviews include a meta-analysis. But when it is feasible, meta-analysis is considered the most rigorous method of synthesis.
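To make concrete what meta-analysis involves, here is a minimal sketch of inverse-variance fixed-effect pooling, the simplest way of combining effect sizes across studies. The numbers are made up for illustration and are not drawn from any of the reviews discussed here:

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling of study effect sizes.

    effects: per-study effect estimates (e.g. standardised mean differences)
    variances: their sampling variances
    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / v for v in variances]          # precise studies get more weight
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))              # SE of the pooled estimate
    return pooled, se

# Three hypothetical studies (illustrative values only)
pooled, se = fixed_effect_meta([0.20, 0.35, 0.10], [0.01, 0.02, 0.015])
# pooled is roughly 0.204 with a standard error of roughly 0.068
```

Each study is weighted by the inverse of its variance, so larger and more precise studies count for more; this is why a meta-analysis can give a sharper answer than any single included study.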
By gathering data on the totality of evidence, systematic reviews are an invaluable research tool for establishing the overall balance of evidence on a particular question, separating higher-quality from lower-quality evidence and the generalisable from the context-specific. High-quality systematic reviews that identify the best available rigorous evidence on an issue can be a goldmine for policymakers.
So, the question ‘how definitive are these reviews?’ could usefully be prefaced with ‘how systematic are these reviews?’ Most of the reviews included in Evans and Popova’s analysis are not actually systematic reviews, as per the definitions commonly used by major review-producing organisations such as the Cochrane Collaboration, the Campbell Collaboration, the Collaboration for Environmental Evidence and 3ie.
Rather, the reviews analysed by Evans and Popova are a mix of literature reviews, meta-analyses and systematic reviews. Most of them do not undertake systematic and comprehensive selection, appraisal and synthesis of the evidence. For instance, while some of these reviews document their search, Conn (2014) appears to be the only review that includes a comprehensive search of the published and unpublished literature for studies of the effect of education interventions on learning outcomes in Sub-Saharan Africa.
This is not a critique of these reviews – they offer valid and useful contributions to the evidence base. But most of these studies are not systematic reviews and should not be judged as such. Nor should systematic reviews in general be judged by the limitations of these reviews.
Having said this, even if these reviews had taken a systematic approach, we would still expect them to differ in terms of the studies they include. They have different purposes, different geographic foci and focus on different levels of education and outcomes. Authors make decisions about their inclusion criteria based on review scope and purpose. The included studies will reflect this, and the important point is to be explicit about the choices made so that readers can interpret the findings of a review accordingly.
Mapping and assessing the quality of existing systematic reviews in education
So what is the quality of systematic reviews in education? As part of a large-scale systematic review of the effects of primary and secondary education interventions in low- and middle-income countries (L&MICs), we mapped existing systematic reviews in the sector to answer this question. Our recently completed education evidence gap map identified 20 completed or ongoing systematic reviews assessing different education interventions in L&MICs.
The 3ie education evidence gap map shows that there are a number of well-conducted systematic reviews in the sector. But it also highlights that many of the existing systematic reviews have weaknesses that reduce our confidence in their findings. Based on a careful appraisal of the methods applied in the reviews (using a standardised checklist), eight of these reviews have been given an overall rating of ‘low confidence in conclusions about effects’.
The main weakness identified in these reviews is a lack of critical appraisal of included studies: they do not assess the risk of bias in the studies they include, so the reliability of the findings is not clear to readers. Secondly, several of these reviews have limitations in their searches. Finally, some of the reviews use vote counting when meta-analysis is feasible, and have issues with the interpretation of findings. What we take away from all this, again, is that we need more high-quality systematic reviews.
Improving the practice of evidence synthesis
Evans and Popova raise some important points about ways that systematic review practice can be improved. Specifically, they call for more comprehensive searching, the combination of meta-analysis with narrative review and clearer thinking about intervention categorisation. We agree with these suggestions and, as it so happens, are working on addressing them in our ongoing review of the effects of a range of education programmes on enrolment, attendance and learning outcomes in L&MICs.
A related issue highlighted by Evans and Popova is the combination of highly divergent studies in a meta-analysis. In a high-quality systematic review, this would be addressed by conducting thorough heterogeneity and sensitivity analyses. Heterogeneity and sensitivity analysis can help explain some of the variation in outcomes across studies and point to potential explanatory factors that might explain this observed divergence. It can also be used to see if different ways of pooling or splitting interventions in a category can bring out more granular findings about intervention effects.
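As an illustration of what a heterogeneity check involves, the sketch below computes Cochran's Q and the I² statistic, the standard measures of between-study divergence. The function name and the effect sizes are hypothetical, chosen only to show the mechanics:

```python
def heterogeneity(effects, variances):
    """Cochran's Q and the I^2 statistic for a set of study effect sizes.

    Q tests whether the studies share a common effect; I^2 expresses the
    share of total variation attributable to genuine between-study
    differences rather than sampling error.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Three hypothetical studies with divergent effects (illustrative values only)
q, i2 = heterogeneity([0.1, 0.5, 0.3], [0.01, 0.01, 0.02])
# Here Q = 8.0 on 2 degrees of freedom and I^2 = 75%,
# i.e. substantial heterogeneity that a review should investigate
```

A high I² is exactly the signal that pooling highly divergent studies into one average may mislead; it prompts the subgroup and sensitivity analyses described above, such as re-running the analysis with different groupings of interventions or with individual studies left out.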
In our own education systematic review, we are also combining meta-analysis with narrative synthesis and causal chain analysis. To inform this analysis, we are drawing on evidence from a wide range of programme documents, process evaluations and qualitative studies associated with the interventions in the included impact evaluations. We think this is crucial for unpacking sources of heterogeneity and identifying factors associated with improved outcomes. We are trying to find answers not just to the what works question but also the important when and why questions. We expect to publish this full systematic review in late 2015.
It is clear that not all ‘systematic’ reviews are created equal. But if done well, systematic reviews provide the most reliable and comprehensive statement about what works. More effort needs to go into improving their quality so that they become useful and reliable sources of evidence for more effective policies and programmes.