Artificial intelligence & robotics

Yi Wu

Training large language models for complex strategic reasoning and human-computer collaboration, and commercializing the results.

Year Honored
2025

Organization
Tsinghua University

Region
Asia Pacific

Yi Wu's research focuses on using reinforcement learning (RL) to build general artificial intelligence agents that can collaborate with humans. His work spans large-scale systems, cutting-edge algorithms, and industrial applications, and his algorithmic contributions to multi-agent learning are particularly prominent. The MADDPG algorithm, which he co-developed, and the MAPPO algorithm, whose development he led, have become foundational frameworks for training complex collaborative and adversarial strategies and are widely adopted in both academia and industry.

He has developed a series of large-scale open-source systems to address the efficiency bottlenecks of RL training in the era of large models. In 2024, he led his team in open-sourcing the SRL system, which supports computing at the scale of 10,000 cores, and in launching the ReaL system for large language models. Through an efficient distributed training architecture, ReaL improves the training efficiency of reinforcement learning from human feedback, providing a key tool for large model alignment.

He applies these efficient RL systems to train AI agents with complex policy capabilities. In 2024, he supervised the development of a language agent that achieved human-level performance in the social deduction game "Werewolf." This agent combines the language capabilities of large models with RL-based decision optimization to execute advanced deception and cooperation strategies. He has also advanced research in zero-shot human-AI collaboration, enabling AI to effectively cooperate with humans by modeling their preferences.

In 2023, Yi Wu's team founded OpenPsi Inc. to commercialize RL technology, developing a smart spreadsheet copilot that edits sheet content according to users' natural-language instructions. In 2025, the team released AReaL, an open-source reinforcement learning framework for LLMs and agents. AReaL has been widely recognized in industry and was adopted by Ant Group as the training engine for its Ring reasoning models. Now, as principal investigator of the AReaL project, he continues to drive the development and deployment of general-purpose collaborative AI agents, turning cutting-edge research into practical products that serve millions of users.