You’ve probably heard or read about generative artificial intelligence (Gen AI) and its revolutionary potential. Having productive conversations with a chatbot, something that was painful (at best) until a few years ago, has now become easier because of Gen AI models that can generate coherent, meaningful output in a matter of seconds. Using applications based on such models (e.g., ChatGPT, Gemini, or Copilot), we can summarize information from several academic papers at once, learn a new coding language in a matter of days, brainstorm ideas for a blog, or even develop conference plans and itineraries. These are just a few of the current capabilities, not accounting for continual advancements. The latest iteration of Gen AI chatbots (e.g., GPT-4o) has made conversations feel even more effortless, with impressive capabilities related to not only generating text but also integrating text, sound, and images/video into a conversation (in multiple languages).
Despite its promise, we have noticed there are still quite a few misconceptions about what this technology can and cannot do. The Data Innovations Group at 3ie, dedicated to assessing and guiding the application of AI and big data in our impact evaluation and synthesis work, has gathered insights from organization-wide trials and from Gen AI experts on how to use this technology effectively. In this blog, we aim to address a few misconceptions about Gen AI by sharing what we know so far about its capabilities and considerations for implementation, particularly in relation to development research and evaluation work.
Misconception 1: AI, machine learning, generative AI, and LLMs all do the same thing
While we often hear these terms used interchangeably (and the concepts are often not clearly differentiated), the latest iteration of Gen AI technology is different from classification-based machine learning models (also known as predictive AI): it focuses on generating new content rather than on exact prediction or classification. In other words, Gen AI models are built to construct a probabilistic response to nearly anything, from “is there intelligent life elsewhere in the universe?” to “teach me how to ride a bike”.
Since Generative AI is an umbrella term, it is useful to distinguish models based on their modality, e.g., a text generator vs. an image generator. For this purpose, we use the term Large Language Models (LLMs) to refer to Gen AI models that generate text, which are, as of now, the most widely used. With the advent of “multi-modal” Gen AI models, the landscape of this terminology will continue to change quickly.
Misconception 2: LLMs are not useful to me because they make up information and can’t give me facts
A better understanding of the capabilities and intended usage of LLMs is essential to overcome this view. It is true that LLMs have no ability to verify facts, nor do they know what is true or false. They rely on a corpus of past data to probabilistically generate new text. By design, they take instructions from a user to guide their text generation, i.e., they do not look for “truth” but create coherent new sentences to assist a user in problem-solving and creative thinking. Recent developments in Gen AI-based applications do mitigate some of these limitations. For example, plug-ins can now run code, allowing users to verify generated code directly within the chat interface. They can also access the web to improve the accuracy and relevance of responses, or execute tasks on other platforms without leaving the chat.
It is human nature to apply a familiar lens to a novel technology, such as using LLMs like a truth-seeking search engine. Instead, we should try to imagine a new lens so that we can solve new problems (or revisit problems that we couldn’t solve before). For example, we are trialing the use of LLMs to help learn a new coding language, brainstorm new ideas, help with writer’s block, and so on. As we imagine new use cases at 3ie, we continue to test the assistance of LLMs in our existing work. As part of that, we use a “human-in-the-loop” system to ensure all AI-generated content is reviewed and refined by subject-matter experts for accuracy and cited appropriately.
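To make the human-in-the-loop idea concrete, here is a minimal sketch of what such a checkpoint can look like. It is illustrative only, not 3ie’s actual system: the function names are ours, and the LLM call is stubbed out so the example runs without any API access.

```python
# Minimal human-in-the-loop sketch (illustrative): an LLM drafts,
# a named human reviewer approves or rejects, and unapproved text
# never moves forward.

from dataclasses import dataclass

@dataclass
class ReviewedDraft:
    text: str
    reviewer: str
    approved: bool

def llm_draft(source_text: str) -> str:
    # Stand-in for a real LLM call; we just truncate the source here
    # so the sketch runs without API credentials.
    return source_text[:200] + "..."

def review(source_text: str, reviewer: str) -> ReviewedDraft:
    draft = llm_draft(source_text)
    print(f"--- Draft for {reviewer} to review ---\n{draft}\n")
    verdict = input("Approve for use? [y/N] ").strip().lower()
    return ReviewedDraft(draft, reviewer, approved=(verdict == "y"))
```

The essential design choice is that approval is an explicit, recorded step attributed to a person, rather than an implicit assumption that someone looked at the output.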
Misconception 3: Gen AI is not useful because it can’t produce high-quality outputs and automate workflows
Gen AI may not be the right tool to improve an entire workflow, but it can be extremely effective in enhancing specific components of that workflow. Carefully assessing the relevance of Gen AI tools for each component independently helps mitigate the risk of an all-or-nothing approach. Systematic trialing of prompting techniques and documentation of their effectiveness is key to integrating Gen AI into a workflow in the long run.
For example, when we trialed using LLMs to assist us in summarizing impact evaluations, they performed extremely well in producing narrative summaries, but only with high-quality inputs, carefully crafted prompts (i.e., good instructions), and an iterative, step-by-step process, much as one would brief a team of colleagues to get high-quality outputs.
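To illustrate what “carefully crafted prompts” and a step-by-step process can look like, here is a hypothetical prompt template. The wording, structure, and helper function are our own illustration, not the actual prompts from our trials.

```python
# Illustrative prompt template for summarizing an impact evaluation:
# explicit steps, an output constraint, and a guard against invention.

SUMMARY_PROMPT = """You are assisting an impact evaluation team.
Work step by step:
1. List the intervention, setting, study design, and sample size.
2. List each reported outcome with its direction and significance.
3. Only then write a 150-word narrative summary based on steps 1-2.
Do not add information that is not in the text below.

TEXT:
{paper_text}
"""

def build_prompt(paper_text: str) -> str:
    # Insert the evaluation text into the template before sending
    # it to whichever LLM client you use.
    return SUMMARY_PROMPT.format(paper_text=paper_text)
```

In practice, templates like this are refined iteratively: trial a version, document where the output falls short, and tighten the instructions accordingly.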
Misconception 4: I expect Gen AI to replicate what I or my colleagues can do
Here's the thing: AI can't completely replace human expertise and creativity, and that's not really its purpose. Instead, we can think of Gen AI as a tool that complements our skills. It can handle repetitive tasks, sort through massive data sets, or even spark new ideas, freeing our time to focus on the parts of our work that need the human brain. So, the measure of Gen AI’s utility should not be in whether it can replicate your work but whether and how it can improve the scope, scale, and/or efficiency of your work.
We discovered through our trials that it can even help us, somewhat counterintuitively, be more democratic and empathic. We used an LLM to assist in brainstorming an organization-wide policy, crafting prompts that instructed the LLM to take on multiple personas, approach the problem from multiple vantage points, and encourage debate among the personas: the type of debate that might not happen, or not to the same extent, if those various perspectives were not all “at the table” when developing the policy. While such AI personas can never stand in for actual staff members, they can help anticipate and address some key points of feedback, even in the first draft of the policy that is circulated for comment.
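As a sketch of this multi-persona technique, the snippet below builds one critique prompt per persona. The persona list and prompt wording are hypothetical, chosen for illustration rather than taken from the prompts we actually used.

```python
# Illustrative multi-persona prompting: each persona critiques the
# draft policy from its own vantage point, seeding a "debate" that
# surfaces objections before the draft circulates for comment.

PERSONAS = [
    "a field researcher worried about data-collection burden",
    "a communications officer focused on public transparency",
    "an IT lead responsible for data security",
]

def persona_prompts(draft_policy: str) -> list[str]:
    prompts = []
    for persona in PERSONAS:
        prompts.append(
            f"You are {persona}. Read the draft policy below, raise your "
            f"two strongest objections, and respond to the other personas' "
            f"likely concerns.\n\nDRAFT POLICY:\n{draft_policy}"
        )
    return prompts
```

Each prompt can then be sent to an LLM in turn, with the responses fed back so the personas react to one another’s objections.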
Misconception 5: I don’t know coding or software development; Gen AI is not for me
Being able to access AI through natural language rather than code is a novel opportunity for all of us to participate in testing, improving, and customizing this technology in ways that could solve some of our key problems in international development. Gen AI use cases can broadly be categorized into two types: enhancing existing workflows and establishing entirely new workflows that were previously infeasible. Imagining use cases in the second category is a challenge that staff members who are not Gen AI experts should embrace, since their practical experience and institutional knowledge can add tremendous value in this regard. It is therefore critical for a broad range of staff roles to be Gen AI-literate, including leadership and subject-matter experts.
Looking ahead, 3ie continues to trial applications of Gen AI (and AI more broadly), especially as the technology continually evolves and new curated applications are released. If your organization is interested in learning more about Gen AI and discussing use cases, please reach out to us at info@3ieimpact.org to discuss opportunities to collaborate.