Sewon Min has long focused on enabling language models to find answers in the external world. In recent years, she has advanced retrieval-augmented and nonparametric language models on several fronts. At the engineering level, she co-authored work that scaled a retrieval datastore to a trillion tokens, showing that external knowledge can keep models up to date and supply long-tail facts, and charting a path from "parametric memory" toward "pluggable data." At the system-interface level, her REPLUG framework lets black-box LMs draw on retrieved evidence efficiently, improving the verifiability of generated text. To address data and legal risks, she co-developed the nonparametric SILO framework, which isolates high-risk data in a replaceable datastore rather than baking it into model parameters.
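To make the REPLUG idea above concrete, the following is a minimal sketch, not the paper's actual implementation: retrieve a few documents, prepend each one to the prompt, query the black-box LM once per document, and combine the resulting next-token distributions weighted by retrieval similarity. The callables `retrieve` and `lm_next_token_logprobs` are illustrative stand-ins, not real APIs.

```python
import numpy as np

def replug_style_ensemble(lm_next_token_logprobs, retrieve, prompt, k=4):
    """Hypothetical sketch of REPLUG-style inference with a black-box LM.

    `retrieve(prompt, k)` is assumed to return (document, similarity) pairs;
    `lm_next_token_logprobs(text)` is assumed to return a vocabulary-sized
    array of next-token log-probabilities. Both are placeholders.
    """
    docs_and_scores = retrieve(prompt, k)

    # Softmax over retrieval similarities gives each document a weight.
    sims = np.array([score for _, score in docs_and_scores])
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()

    # Run the black-box LM once per retrieved document, prepending the
    # document to the prompt, then average the output distributions.
    ensembled = None
    for (doc, _), w in zip(docs_and_scores, weights):
        probs = np.exp(lm_next_token_logprobs(doc + "\n\n" + prompt))
        ensembled = w * probs if ensembled is None else ensembled + w * probs

    return np.log(ensembled)  # combined next-token log-probabilities
```

Because the LM is only queried through its input and output distributions, no access to its weights is required, which is what makes the approach usable with closed models.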
To quantify the "reliability" of generated text, Min co-developed FActScore, a fine-grained metric of factual precision that has been widely adopted to evaluate and improve the factuality of long-form generation. At the learning-paradigm level, her work on In-Context Pretraining and BTR improves the efficiency of cross-document modeling and retrieval-augmented inference. She has also contributed to scalable language-modeling research such as Infini-gram, strengthening the foundational infrastructure for nonparametric and retrieval-based models.
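The core of a FActScore-style metric can be sketched briefly: break a generation into atomic facts and report the fraction that a knowledge source supports. The two helper callables below are placeholders, assumed rather than taken from the actual pipeline, which uses an LM to split facts and a retrieval-backed verifier to check them against a source such as Wikipedia.

```python
from typing import Callable, Iterable

def factscore_style_precision(
    generation: str,
    split_into_atomic_facts: Callable[[str], Iterable[str]],
    is_supported: Callable[[str], bool],
) -> float:
    """Hypothetical sketch of a FActScore-style factual-precision metric.

    `split_into_atomic_facts` and `is_supported` stand in for the fact
    decomposition and knowledge-source verification steps, respectively.
    """
    facts = list(split_into_atomic_facts(generation))
    if not facts:
        return 0.0

    supported = sum(1 for fact in facts if is_supported(fact))
    # Factual precision: the fraction of atomic facts that are supported.
    return supported / len(facts)
```

Scoring at the level of atomic facts is what makes the metric fine-grained: a long biography with one unsupported claim is penalized proportionally rather than judged wholesale.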
Sewon Min's goal is to deepen her research on nonparametric models and to push language models toward greater openness, controllability, and trustworthiness, building next-generation AI systems that combine strong performance with flexible design and sound legal compliance.