Photo of Fei Xia

Artificial intelligence & robotics

Fei Xia

Integrating multiple foundation models with robot perception and actions, paving a new path for robotic intelligence.

Year Honored
2025

Organization
Google DeepMind

Region
Asia Pacific

Fei Xia's research lies at the intersection of computer vision, robotics, and machine learning, addressing the core challenge of enabling general-purpose robots to perform long-horizon tasks in complex, unstructured environments. He combines the powerful reasoning of foundation models like Large Language Models (LLMs) and Vision-Language Models (VLMs) with the physical perception and execution capabilities of robots, pioneering a new technological path for robotic intelligence.

To address language models' lack of grounding in the physical world, Fei Xia and his team co-led the development of the SayCan ("Do As I Can, Not As I Say") system. SayCan grounds a language model's planning in the robot's actual physical abilities ("affordances"), enabling it to decompose abstract human commands into a sequence of achievable actions. This framework counters the tendency of LLMs to propose plausible-sounding but physically infeasible plans, giving robots commonsense reasoning that is anchored in the real world.
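The core idea behind SayCan-style grounding can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in: the skill names and the numeric scores are made up for illustration, where a real system would use log-likelihoods from an LLM and a learned affordance (value) function.

```python
# Candidate low-level skills the robot can execute.
skills = ["pick up the sponge", "go to the table", "pick up the apple"]

# Hypothetical LLM scores: how useful each skill looks for the instruction
# "bring me the apple" (a real system would use LLM log-likelihoods).
llm_score = {
    "pick up the sponge": 0.1,
    "go to the table": 0.3,
    "pick up the apple": 0.9,
}

# Hypothetical affordance scores: how likely the robot can execute each
# skill right now (a real system would use a learned value function).
affordance = {
    "pick up the sponge": 0.8,
    "go to the table": 0.9,
    "pick up the apple": 0.2,  # the apple is currently out of reach
}

# SayCan-style selection: pick the skill that is both useful AND feasible
# by combining the two scores.
best = max(skills, key=lambda s: llm_score[s] * affordance[s])
print(best)  # "go to the table": moving closer beats an infeasible grasp
```

The key design choice is the product of the two scores: a skill the LLM rates highly is rejected if the robot cannot currently perform it, which is exactly how infeasible plans get filtered out.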

Building on this foundation, Fei Xia and his team continued to iterate between 2023 and 2025, launching a series of even more powerful foundational robotics models. The RT-2 model, which he co-developed, is a vision-language-action (VLA) model that leverages web-scale knowledge to generalize to tasks it was not explicitly trained on. As a core member of the PaLM-E project, he helped build an embodied multimodal language model capable of processing combined visual and language inputs. Most recently, in March 2025 his team launched Gemini Robotics, which integrates the Gemini model and is seen as a pivotal step toward general-purpose robots.

Fei Xia's work is paving the way for robots to move beyond the laboratory and into unstructured environments like homes and offices, a shift that could profoundly change how humans interact with machines.