At the recently concluded What Works Global Summit (WWGS), which 3ie co-sponsored, a significant number of sessions featured presentations on new impact evaluations and systematic reviews. WWGS was a perfect opportunity to learn lessons about the demand for and supply of high-quality evidence for decision-making because it brought together a diverse set of stakeholders: donors, knowledge intermediaries, policymakers, programme managers, researchers and service providers, from both developed and developing countries. Over five days, this amazing diversity of voices shared expertise and experiences.
I was particularly interested in hearing perspectives on the demand for impact evaluations. In this blog, I focus my observations on evidence from impact evaluations that use a counterfactual and that come from developing countries, since those tended to be the sessions I attended.
3ie’s recent update of its impact evaluation repository shows that the number of impact evaluations using a counterfactual continues to rise each year. But there are worrying trends. Geographic inequalities in the production of impact evaluations persist: there are still hardly any studies in West Africa or the Middle East and North Africa region. Studies also remain concentrated in four sectors: health, education, social protection and agriculture. At the same time, we found that the growth in the number of published impact evaluations seems to have peaked in 2012. If this is a trend (and it is too early to tell), we cannot be optimistic about addressing these evidence gaps unless there is a new impetus to the demand for impact evaluations. What is the prognosis for that?
From the conversations I heard at WWGS, it seemed that at least three factors may be constraining the demand for new impact evaluations: the difficulty of understanding and applying lessons across contexts, their ‘high price’, and the differing incentives for commissioning them. Opinions were clearly divided, but I think the debate is worth capturing here.
Application of lessons from impact evaluations across contexts
A product that is easy to use usually attracts more consumers. The methods used in randomised evaluations in development are derived from clinical trials in medicine. The application of findings from these trials, which are long in duration and multi-phased, is fairly straightforward. Medicines and health technologies that get approved are presumed to be safe and to work.
But this is certainly not true for development policies and programmes. Here, context matters a great deal. 3ie’s systematic review on education effectiveness, which was launched at WWGS, demonstrates this in a remarkably clear way: what works to improve school enrolment and learning in one context may not work in another. The variability in the effects of several education programmes is significant. So there is much more to be learned from examining the factors and conditions that drive this diversity in effects than from overall average effect sizes.
WWGS also featured sessions showing that findings of impact evaluations, particularly those that delved into why programmes work or don’t, have informed improvements in programme design. Strong engagement between research teams and implementing agencies has led to interim and final findings being fed back to implementers, enabling urgent and sustained fixes to programmes and policies.
‘High price’ of randomised impact evaluations
As with commodities, the demand for randomised impact evaluations is affected by economic factors, such as the cost of the evaluation and the resources of those demanding it. Randomised impact evaluations are costly in both time and money. An impact evaluation takes a minimum of two years, from the start of implementation, when a question is posed and a baseline survey is carried out, to the completion of endline observations. The average duration of 3ie-supported studies is three to four years. There were several debates at WWGS about the usefulness of results after such a long gap: some questioned it, while others saw it simply as the nature of long-term development. Nevertheless, there was consensus that those who demand such evidence should take the ‘long view’ of development, even if it is at odds with the myopic tendencies of political economy.
What about the cost of impact evaluations in money terms? While there is large variance, the average cost of an experimental or quasi-experimental impact evaluation that addresses the counterfactual is US$400,000 for 3ie-supported studies. This is not an insignificant sum, especially if what is being tested is just a pilot intervention. Several discussions pointed to the trade-off between funds for studies and funds for actual programme implementation. It is true that most impact evaluations in developing countries, especially the poorest ones, are funded by donors. But we are still left with a larger ethical question: can we really afford to spend millions of dollars on programmes that don’t work?
WWGS featured several useful examples of impact evaluations where research teams had found ways to lower costs. For instance, technology is now being used to gather data more efficiently than traditional household surveys. So, while costs are a factor, they need not be an insurmountable barrier. We need to find innovative ways to reduce study costs without compromising quality.
Different incentives for carrying out impact evaluations
Without subsidies from international donors, there would always be fewer impact evaluations than would be socially desirable. As with most research, the benefits from the knowledge gained can accrue not only to those whose programme is being evaluated but also to other stakeholders. This happens when there is sufficient engagement between researchers and stakeholders, and when the knowledge from impact evaluations is translated for different audiences. Given the cost of impact evaluations, poorer countries would have fewer studies, even though their demand for evidence is no less than that of others. But the issue with such funding is that donors then require impact evaluations to be carried out as much for accountability as for learning. Taxpayers in donor countries are increasingly asking: what’s in it for us? A rigorous impact evaluation is an imperfect instrument for accountability, given its cost and timeline. But such evaluations do offer valuable lessons on what works and what doesn’t, which in the end can save taxpayers’ money.
While these constraints on the demand for randomised impact evaluations are of concern, I came away feeling optimistic. WWGS offered a large-scale forum (863 attendees!) for a public conversation among multiple stakeholders. There are challenges ahead, but there was a shared sense that we need to come together to address them. It is far from clear whether the donors attending WWGS were convinced that funding for impact evaluations needs to be sustained. However, the fact that they were there in full force and participated actively in the discussions is certainly a hopeful sign.
The What Works Global Summit (WWGS 2016) was held in London from 26 to 28 September 2016, with pre-conference workshops on 24-25 September. The conference was organised by the Campbell Collaboration, 3ie, Sense about Science and Queen’s University Belfast. It was sponsored by the Bill & Melinda Gates Foundation, the EPPI-Centre, the Centre for Evidence and Implementation, the Inter-American Development Bank and The Pew Charitable Trusts. 3ie staff were involved in organising or participating in 8 workshops and 25 sessions at the summit.