Innovations in Data for Impact Evaluation

The accessibility and use of big data and data science methods such as geospatial analysis, machine learning, and generative AI are increasing rapidly in the field of international development. 3ie continuously assesses the practical value of such novel data sources and methods for evidence-informed decision making, distills the most promising ones into pragmatic approaches for evidence users, and guides the application of these sources and methods to research on the effectiveness of programs and policies.

The technology space is always evolving, and the frontier of what is possible keeps moving. Only recently, however, has it become possible for all of us to use these new technologies with relative ease and few resources. From Google hosting extensive satellite data on its servers, to powerful machine learning algorithms becoming freely available, to rapid advances in generative artificial intelligence (AI) that anyone can test and build on, our ability to incorporate innovative data sources and methods has grown quickly. At 3ie, we focus on increasing the effective, appropriate, and ethical use of such big data and data science approaches in evidence generation and synthesis, with an emphasis on building research capacity in low- and middle-income countries.

Geospatial impact evaluation

The growing accessibility of open-source geospatial data through platforms such as Google Earth Engine has allowed researchers to conduct measurement studies validating the use of geospatial data to measure several social science outcomes. Following these studies, the use of earth observation in impact evaluation has become increasingly common. Compared to more traditional data sources, geospatial data can be collected easily for a wide range of variables, at a lower cost, more frequently (e.g., bi-weekly or monthly), with less human error, and in regions where traditional survey methods may not be feasible (e.g., remote and conflict-affected regions). 3ie has experience conducting impact evaluations using geospatial data, has created guidance material on how to use these data for impact evaluation (e.g., an introductory tutorial on the use of geospatial data in Python and an inventory of remotely sensed proxy variables for common social science indicators), and has mapped their use in impact evaluation (e.g., the big data systematic map and the land use change and forestry EGM).

Geospatial impact evaluation of an agricultural intensification program in Niger

Climate change is exacerbating food insecurity around the world, particularly in drought-prone areas that are already highly vulnerable, such as the Sahel region of West Africa. To improve food security and resilience, the government of Niger, with funding from the West African Development Bank (BOAD), implemented a multi-faceted agricultural production intensification program (PIPA/SA). This study used geospatial data and a synthetic difference-in-differences design to measure the impact of this program on agricultural production and desertification. A preliminary descriptive analysis has been published, and initial results were presented in a session at the 2023 GeoField Convening in Rome. (This work is also part of our Climate Change portfolio.)

Remote sensing inventory

Remote sensing is the science of observing, measuring, and collecting data on Earth features remotely. It has emerged as a valuable tool for development researchers, owing to its diverse range of applications. Remotely sensed geospatial data can be incorporated into nearly any evaluation that has a spatial component – either as an outcome or as a covariate. 3ie is developing an inventory of remotely sensed proxy indicators, designed to help development practitioners understand differences between remote sensing proxy indicators, explore their application to social science sectors, and provide them with resources and tools to conduct analyses.

Machine learning and artificial intelligence

Machine learning for heterogeneity analysis

Traditional methods for heterogeneity analysis compare two groups based on pre-selected criteria (e.g., male vs. female). However, many intersecting aspects of a person’s identity, background, and environment (e.g., parents’ income, education, and attitudes) can influence how they interact with an intervention and what outcomes they experience. We often want to know: which of these characteristics, or which combinations of them, are associated with larger or smaller effects?

Causal forest analysis is a machine learning technique for estimating treatment effects when those effects may vary across subgroups or contexts. It uses an ensemble of decision trees to model and estimate heterogeneous treatment effects, making it a powerful tool for identifying and understanding how a treatment affects different individuals or groups within a population. Rather than the researcher deciding in advance which characteristics are likely to influence the treatment effect, causal forest methods let the data indicate which combinations of characteristics are associated with different treatment effects.
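To make the idea concrete, the sketch below grows a small ensemble of "causal trees" on simulated trial data using only the Python standard library. It is a deliberately simplified illustration and not the analysis 3ie ran: the data, the covariates, the function names, and the settings (MIN_ARM, tree depth, number of trees) are all assumptions made for this example. It also omits features of full causal forest implementations, such as honest splitting and confidence intervals, so a dedicated library would be the better choice for real work.

```python
import random
from statistics import mean

MIN_ARM = 20  # illustrative: minimum treated and control units required in a leaf


def effect(rows):
    """Treated-minus-control mean outcome, or None if either arm is too small."""
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    if len(treated) < MIN_ARM or len(control) < MIN_ARM:
        return None
    return mean(treated) - mean(control)


def grow(rows, depth, rng):
    """Recursively split where the two children's estimated effects differ most."""
    best = None
    if depth > 0:
        for j in range(len(rows[0][0])):
            values = sorted({x[j] for x, _, _ in rows})
            # Try a random subset of observed values as candidate thresholds.
            for t in rng.sample(values, min(10, len(values))):
                left = [r for r in rows if r[0][j] <= t]
                right = [r for r in rows if r[0][j] > t]
                tau_l, tau_r = effect(left), effect(right)
                if tau_l is None or tau_r is None:
                    continue
                score = len(left) * len(right) * (tau_l - tau_r) ** 2
                if best is None or score > best[0]:
                    best = (score, j, t, left, right)
    if best is None:
        return effect(rows)  # leaf: the effect estimated within this subgroup
    _, j, t, left, right = best
    return (j, t, grow(left, depth - 1, rng), grow(right, depth - 1, rng))


def predict_tree(node, x):
    """Follow the splits down to a leaf and return its effect estimate."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(rows, n_trees=20, depth=2, seed=0):
    """Grow each tree on a bootstrap resample of the data."""
    rng = random.Random(seed)
    return [grow([rng.choice(rows) for _ in rows], depth, rng)
            for _ in range(n_trees)]


def predict(forest, x):
    """Average the per-tree effect estimates for covariate profile x."""
    return mean(predict_tree(tree, x) for tree in forest)


def simulate(n=800, seed=1):
    """Simulated randomized trial: the true effect is 2 when x[0] == 1, else 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.randint(0, 1), rng.random())  # hypothetical covariates
        w = rng.randint(0, 1)                  # randomized treatment assignment
        tau = 2.0 if x[0] == 1 else 0.0
        rows.append((x, w, x[1] + tau * w + rng.gauss(0, 1)))
    return rows


forest = fit_forest(simulate())
high = predict(forest, (1, 0.5))  # profile where the simulated effect is large
low = predict(forest, (0, 0.5))   # profile where the simulated effect is zero
print(f"estimated effect when x[0]=1: {high:.2f}; when x[0]=0: {low:.2f}")
```

In this simulation the researcher never tells the model that the first covariate matters. The trees find it because splitting on it produces the largest difference in estimated effects between subgroups.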

At 3ie, we have been applying this method to a previously completed randomized controlled trial of a school-based gender attitude change program in Haryana to explore whether the program's impact differed across multiple aspects of participants’ background characteristics.

Read more about this application here.

Large language models

Large language models (LLMs), such as ChatGPT and Google Bard, can offer substantial benefits for the international development community. LLMs help users interact with huge amounts of information, extract the relevant parts of that information, and generate new text when required. The capability to leverage these models for impact evaluation, evidence synthesis, and internal workflows is becoming indispensable for every organization.

At 3ie, we are keeping abreast of the rapidly evolving field of LLMs and how they can be applied to our work and the work of our evidence partners. To that end, we are testing several different use cases, weighing the pros and cons of open-source LLMs, and distilling the most effective applications of this technology. We are also committed to addressing the privacy and ethical considerations of adopting these models.

Big data systematic map

Gaps exist in access to reliable data for monitoring and evaluating progress on development outcomes and targets such as the Sustainable Development Goals (SDGs), and in credible evidence for deciding on future resource allocation to achieve those targets. Data gaps are particularly significant for the populations and countries where the need for evidence-informed policy decisions is perhaps the greatest.

The big data systematic map, funded by the Centre of Excellence for Development Impact and Learning (CEDIL), aims to address this gap in information. In this map, we visualize the use of big data to evaluate development outcomes across the world, with a special focus on challenging contexts. It identifies and appraises rigorous impact evaluations, systematic reviews, and other studies that have innovatively used big data to measure development outcomes.

View big data systematic map

To access the submaps, use the links below:

CEDIL Working paper | Using big data for evaluating development outcomes: a systematic map

CEDIL Working paper brief | Using big data for impact evaluations

CEDIL Blog | Big Data in the time of a pandemic


  • 3ie’s presentation of its geospatial tutorial at Geo4Dev
  • GeoField Convening: Leveraging Earth Observation for Impact Evaluations of Climate-Sensitive Agriculture, FAO headquarters, Rome, September 12-14, 2023 | 3ie presented the Niger geospatial impact evaluation (Watch recording)
  • 3ie presented at gLOCAL 2023 on the Geospatial Impact Evaluation of Agricultural Intensification Program in Niger | Watch recording

Reach out to us at fkastel[at]3ieimpact[dot]org or dlakhote[at]3ieimpact[dot]org for more information or if you are interested in working with us.