“Literature reviews are like sausages… I don’t eat sausages as I don’t know what goes into them.” Dean Karlan said this to an unfortunate researcher at a conference recently. The ‘sausage problem’ captures in a nutshell why, at 3ie, we favour the scientific approach to evidence synthesis: the systematic review.
We know that systematic reviews can be a very good accountability exercise, helping answer the question “do we know whether a particular programme is beneficial or harmful?” So instead of cherry-picking favourable development stories, we collect and synthesise all the rigorous evidence. We also say how reliable we think the evidence is through quality appraisal – which should be conducted by two researchers independently, to a level at least as detailed as would be required by the peer review system of a top-quality journal. This helps to answer the question “do we know what we think we know?” and, therefore, whether policymakers can trust the evidence. As Mark Petticrew, professor of public health evaluation at the London School of Hygiene and Tropical Medicine, says, if policymakers don’t follow a review’s recommendations (which may well be for legitimate reasons), they should at least explain why.
But, outside medicine, reviews have not been very useful in helping the majority of decision makers – those practitioners involved in the design and implementation of projects, programmes and policies. Helping them usually requires reviews to draw on a wider body of evidence than impact evaluations alone.
Fortunately, we now have decades of existing development research to draw on, including research generated through surveys as well as more in-depth ethnographic and participatory studies, to help us answer many questions. As I explain below, we are experimenting with different approaches to incorporating this broader evidence into reviews at 3ie. But first, a little myth-busting. A typical systematic review goes from several thousand papers identified in the initial search, to a couple of hundred screened at full text, to a dozen or fewer included effectiveness studies. These figures give the impression that a lot of evidence is being thrown away.
Study search flow for farmer field schools literature
The figure above shows the study search flow for 3ie’s systematic review of farmer field schools evaluations. The exclusion of the first 28,000 papers is not an issue: reviews cast the net widely to ensure studies are not missed, and so pick up a lot of irrelevant studies. The real issue is at the next stage – narrowing down from the 460 full-text studies. These studies are relevant evaluations, which generally get excluded on grounds of study design.
Thus, a traditional ‘review of effects’ would have limited the farmer field schools review to just 15 included studies (and if restricted to RCTs alone, the review would have returned precisely zero results). The analysis of these 15 studies is important, since it provides the evidence we believe is most relevant to high-level policymakers in answering the ‘what works’ question.
The approach we use at 3ie does require that studies without credible designs are excluded from the synthesis of causal effects. In the case of farmer field schools, our analysis indicates that farmer field schools do have an impact on real-life outcomes like yields, revenues and empowerment, at least in the short to medium term. But it also confirms that diffusion from trained farmers to their neighbours doesn’t happen.
But what about the remaining 97 per cent of the potentially relevant literature identified for full-text assessment – the roughly 445 studies excluded from the synthesis of effects? The farmer field schools community of practice is committed to generating evidence, and we found an additional 119 impact evaluations. We don’t believe these additional evaluations are policy-actionable for outcomes relevant to farmers’ quality of life, such as yields and incomes (usually because of problems in establishing a comparable control group). But many of the studies do support the findings on process outcomes (knowledge and adoption of practices). We can also use these studies to make recommendations for improving evaluation design. They suggest that the scale of resources devoted to farmer field schools evaluations might usefully be reallocated in future to fewer but more rigorous impact evaluations, particularly ones based on a solid counterfactual that assess impacts over the medium to longer term. 3ie is itself funding one such study in China.
However, analysis of the rest of the causal chain requires other types of evidence, and such evidence is thin in impact evaluations. Hence the need to turn to the kinds of studies that are usually excluded from reviews at the final screening stage.
Thus, we included 25 qualitative evaluations in the review, which have helped us understand the reasons for the lack of diffusion found in the quantitative analysis – mainly that the message is too complex for farmers to learn outside of formal training. More generally, these studies identify some of the more common problems in implementation, notably where a top-down ‘transfer of technology’ approach has been applied to an intervention based on a participatory-transformative theory. The qualitative studies also helped us to better understand the empowerment aspect of farmer field schools.
Some of the best reviews to date use mixed methods in this way (see ‘Teenage pregnancy and social disadvantage: systematic review integrating controlled trials and qualitative studies’ by Angela Harden, Ginny Brunton, Adam Fletcher and Ann Oakley). But there are still important policy-relevant questions left unanswered, relating to the scale of implementation and how targeting occurs. So, in the final components of the review, we are providing a global portfolio review of 260 farmer field schools projects and collecting data from 130 studies reporting on targeting and participation. The latter work is ongoing, but we expect it to provide useful information about important goals of some schools, such as their ability to reach women farmers. More generally, the analysis should help us understand whether those who have taken part in the impact evaluations are ‘typical’ farmers or not, and therefore how generalisable our review findings are.
Systematic reviewing, done right, has the potential to change the culture of development policymaking and research, as it has already done in other fields. Its main strengths are rigour and transparency, and these principles can be applied to answer a wide range of policy questions. Doing the ‘full’ evidence synthesis which we have undertaken for farmer field schools does require more resources. I encourage those interested to read Birte Snilstveit’s paper on going ‘Beyond Bare Bones’ in systematic reviews for options with different resource implications for effectiveness reviews, and to look out for 3ie’s farmer field schools review for an example of broader evidence synthesis.
I wrote this during the inspirational Evaluation Conclave meeting in Kathmandu in February 2013, where Kultar Singh Siddhu, Director, Sambodhi Research and Communications, presented the following quote from the Buddha as a mantra for evaluation. I think it’s also an apt description of why we should do and use systematic reviews:
“Believe nothing, because you have been told it… do not believe merely out of respect for the teacher. But whatsoever, after due examination and analysis, you find to be kind, conducive to the good, the benefit, the welfare of all beings – that doctrine take as your guide.”