Fifteen years ago, the Evaluation Gap Working Group published a report noting the absence of solid evidence on the effectiveness of development programming. This report, and the intellectual community behind it, drove a wave of work on impact evaluations in the development sector. So, did we learn? To get views from two different vantage points in today's evaluation community, we asked our Executive Director Marie Gaarder, who was part of that initial intellectual community, and Research Associate Etienne Lwamba, who had not yet started high school at the time.

What was the state of the evaluation field when you first started?

Marie Gaarder: In 2006, very few people in the development field had heard about impact evaluations as a useful tool to inform development policy. When 3ie was launched, there were hardly any impact evaluations done or published and the idea was that any robust evidence would be useful. There was no need to focus on gaps, as the development field was simply a huge black hole.

Then came the impact evaluation revolution, with universities like Berkeley and organizations like 3ie and J-PAL arguing that it is good to turn the focus to outcomes the way the MDGs did, but that without impact evaluations you have no idea whether your programs and policies contributed to those outcomes or indeed hampered their achievement: for that, a robust counterfactual was needed!

While 3ie promoted a range of evaluation methods, including non-experimental and quasi-experimental approaches in addition to RCTs, and advocated for the use of theories of change and mixed methods, we were quickly and misleadingly labeled ‘randomistas’. There was a lot of controversy and push-back. At the first-ever 3ie conference in Cairo, with over 700 participants, the protests and discussions were heated: impact evaluations were said to be a northern, context-blind, naïve, unethical and simplistic approach to evaluating what were in fact complex realities.

Etienne Lwamba: I didn’t know about these controversies. I have to admit I was pretty far away from that world. I would now tend to think the challenge is a bit the reverse: there are so many impact evaluations that you can get lost in the constellations. There are so many ways to measure impact. But what I also note is that not all impact evaluations are created equal! If you take a few minutes to think about it: how do you attribute the success or failure of an intervention? How do you measure success? How do you interpret quantitative and qualitative data in a meaningful way? I think the challenge today is not so much the number of impact evaluations but their rigor and their accessibility.

What are the challenges that came with this increase in evaluation evidence (at least in some sectors and geographies)?

Marie Gaarder: As the evidence field exploded, and global decisions were (at risk of) being informed by cherry-picking the studies whose results you knew about or liked best, 3ie took the lead in creating a systematic review group within the Campbell Collaboration. Systematic reviews ensure an unbiased, transparent process for summarizing the full body of evidence on a specific topic. In 2010, the International Development Coordinating Group (IDCG) was created, with 3ie hosting the secretariat and becoming a lead producer of systematic reviews in the development field. Another challenge that came with the increasing production of evaluation evidence was getting a quick overview of the evidence in a given sector and identifying where more research was needed. I led the team that developed the first-ever Evidence Gap Map (EGM) in 2010, on agricultural interventions and nutrition, in an effort to respond to exactly those two questions. We manually entered the data in an Excel sheet – there were no fancy interactive bubbles and links at the time. Little did we know then what a hit this new tool would become!

Etienne Lwamba: As a Research Associate working on systematic reviews, I can see how weakly standardised impact evaluation practices and reporting still are. There are probably as many ways to report the results of an impact evaluation as there are evaluators. At the end of the day, what we want is for our results to make sense to a wide audience and for our work to be used by others to produce more meaningful results. An effort is needed to standardise practices and reporting methods across international development impact evaluations and knowledge products; otherwise it is difficult to summarize the body of evidence in any meaningful way.

What are some of the improvements you have seen since you first started?

Marie Gaarder: There is now much more serious attention to use and policy take-up, and more serious research is being done to understand the processes that lead from studies to policy impact. 3ie’s contribution tracing work and evidence impact portal are prime examples. Also, our policy helpdesk, developed during the COVID period, has enabled us to rapidly respond to policy questions by drawing upon relevant existing evidence and summarizing it in an action-oriented way.

What is most promising is the increase in researchers from low- and middle-income countries (L&MICs) who are leading on impact evaluations, as well as the spread of professional evaluation courses and educational offerings. There is less controversy, in part because the evaluation field has grown up and is increasingly using the best available method to respond to the crucial questions.

Etienne Lwamba: Decision-makers do not only need impact evaluations to exist; they need access to them. Impact evaluations are there, but do people always know where to find them? One thing I like about 3ie is our commitment to making impact evaluations available through platforms like the Development Evidence Portal as well as evidence products like Systematic Reviews and Evidence Gap Maps. Impact evaluations already suffer from a relative lack of consideration in international development, so we should make the effort to facilitate access to them. And beyond finding them, are impact evaluations intellectually accessible? If we don’t make our scientific products intellectually accessible to implementers, how do we expect them to take our recommendations into account? If we want to be listened to, we need to talk to them in language they understand.

What more do we have to learn? Or in other words, what work is still left to be done?

Marie Gaarder: The impact evaluation field has in many respects ‘grown up’. Most researchers are more aware of the need to rely on mixed methods and to understand context and implementation issues, as well as the need to engage with the main stakeholders during the process, but the lack of focus on cost analysis and on gender is still mind-boggling.

There has been very little improvement in terms of understanding unintended consequences, and hardly any serious efforts to combine impact evaluation estimates with larger modelling approaches. There is still very little understanding among development practitioners, and even among evaluation experts, of the importance of systematic reviews for informing larger policy decisions (we still see a lot of vote-counting approaches, which is quite surprising for people who otherwise understand statistics).

The main worrying trend is that funders are becoming less generous in their support for public goods. As Etienne said earlier, platforms like 3ie’s Development Evidence Portal are crucial if we want to ensure high-quality evidence is accessible and freely available to all.

Etienne Lwamba: Even now, all evaluation work represents a tiny fraction of development funding. When I started my career in international development three years ago, I worked on the implementation side. I saw evaluation professionals as nice guys, but I tended to think, “if we all do that, who will actually help those in need? Do reports and observations save lives?” We tend to budget for all the activities of project implementation; then all of monitoring, evaluation, and learning gets what is left, and sometimes less. From my perspective, that is a mistake. How much do our design miscalculations and interventions with no real impact cost international development? Probably way more than all the evaluation budgets combined. So, what do we prefer: investing more in learning to make fewer mistakes, or keep making costly mistakes? I have made my choice.

Comments

Sandhya Sangai 30 June 2021

Research, especially different forms of programme evaluation, has great potential to develop and implement policies that might yield much better results. I think what is important is to win the confidence of policy makers and get some opportunities to demonstrate it on a small population. This may help to gain the support of policy developers and funders.

Promise Nduku 08 July 2021

Interesting to learn of the controversies of the past and the significant milestones achieved so far.

Jindra Cekan 13 July 2021

Dear Marie & Etienne-
Yes to “what do we prefer: investing more in learning to make fewer mistakes, or keep making costly mistakes?”! We absolutely need to look at which designs and interventions are most effective. My hope is that one day we will evolve to ask about sustained impact. See our work on too-rarely-evaluated top-of-logframe impact: https://www.betterevaluation.org/en/blog/SEIE and at www.ValuingVoices.com

Ross 29 July 2021

Where could I find a list of rigorous quantitative assessments of the effects of voluntary sustainability standards on the livelihoods of smallholder farmers in the global South?
I'm specifically looking for studies that used PSM with DID, and for RCT studies. It seems these only started around 2012, and it's difficult to find cases even when combing the meta-reviews. Oya et al. (2018) details this information, but it only goes up to sometime in 2016. Do you have this sort of info, or know where I can locate it?
