Generative AI models make mistakes: they can confidently get facts wrong or combine bits of truth into nonsense. These so-called “hallucinations” occur even when models are trained on vast amounts of accurate information.
For Akari Asai, 30, this is a big problem, especially when the facts matter, such as in scientific research or software development. The solution, she says, is to stop focusing on making bigger and bigger models that simply spit out answers in response to prompts. “We need to do a transformative switch from just scaling monolithic language models to developing augmented language models,” she says, meaning models that can interact with other entities and analyze their own outputs and behavior.
Asai works on retrieval-augmented generation (RAG), a technique that has language models consult stored reference material, called a datastore, before generating a response. Checking a datastore can help a model detect when it’s about to generate a falsehood and correct its response using the retrieved information.
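The retrieve-then-generate loop at the heart of RAG can be sketched in a few lines. This is a toy illustration, not Asai’s system: the `retrieve` and `generate` functions here are hypothetical stand-ins (real pipelines use dense vector search over millions of documents and a neural language model), but the control flow matches the description above.

```python
def retrieve(query, datastore, k=2):
    """Rank datastore passages by naive word overlap with the query (toy scorer)."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(p.lower().split())), p) for p in datastore]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]

def generate(query, passages):
    """Stand-in for a language model that conditions its answer on retrieved text."""
    if not passages:
        return "I don't have enough information to answer."
    return f"Based on: {passages[0]}"

datastore = [
    "The Amazon river is about 6,400 km long.",
    "Python was created by Guido van Rossum.",
    "A datastore holds reference passages for retrieval.",
]
query = "How long is the Amazon river?"
answer = generate(query, retrieve(query, datastore))
print(answer)
```

The key design point is that generation is grounded in whatever `retrieve` returns, so a stronger retriever directly improves factual accuracy without retraining the model.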
Self-RAG, a framework that Asai and her co-authors introduced in 2023, takes this approach a step further by having the model evaluate different parts of the datastore in parallel and decide which are most relevant. Self-RAG doesn’t completely prevent hallucinations, but it aims to limit them while keeping the model from sounding as if it’s reading from an encyclopedia. In the team’s testing, Self-RAG trained on Meta’s Llama answered short-form questions 10 to 25 percent more accurately than Llama with plain RAG, depending on the question type, and the improvement over Llama without any RAG was even starker.
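The control flow Self-RAG adds, scoring candidate passages in parallel and falling back to the model’s own knowledge when none are relevant, can be mimicked in a short sketch. To be clear, the actual Self-RAG method has the model emit learned reflection tokens to critique its own retrieval and output; the `critic` function below is a hypothetical keyword-overlap stand-in used only to show the branching logic.

```python
from concurrent.futures import ThreadPoolExecutor

def critic(query, passage):
    """Stand-in relevance score: fraction of query words found in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q)

def self_rag_answer(query, passages, threshold=0.3):
    """Score passages in parallel, then answer from the best one or fall back."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda p: critic(query, p), passages))
    relevant = [(s, p) for s, p in zip(scores, passages) if s >= threshold]
    if not relevant:
        # No passage cleared the bar: answer from the model's own parameters.
        return "Answering from parametric memory (no relevant passage)."
    best = max(relevant)[1]
    return f"Grounded answer using: {best}"
```

The fallback branch matters: unlike plain RAG, a Self-RAG-style system can decline to use retrieved text when its own critique says the passages don’t help, which is part of how it avoids both hallucination and the stilted, encyclopedia-quoting tone.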
Asai, who just completed her PhD at the University of Washington and will begin a professorship at Carnegie Mellon University in 2026, is also building custom datastores, which could yield better fact-checking results than general databases like Wikipedia. So far she and her colleagues have built datastores for scientific literature, with 45 million papers, and coding, with 25 million documents. She wants to explore how this approach could work with sensitive biomedical data, too.