Artificial intelligence & robotics

Zhihong Shao

He led the DeepSeekMath project as the first author.

Year Honored
2024

Organization
DeepSeek

Region
China

Hails From
China
Zhihong Shao's research focuses on systematically enhancing the reasoning capabilities of large language models. His work aims to build systems that can continuously self-improve and utilize various skills (such as tool use and reasoning) to accomplish increasingly complex tasks. Two of his most representative works are ToRA and DeepSeekMath.

ToRA demonstrated the power of integrating external tool feedback into the reasoning process. The project released ToRA-34B, a tool-augmented large language model that interleaves Python execution with chain-of-thought reasoning. It became the first open-source model to exceed 50% accuracy on the competition-level MATH benchmark, highlighting the potential of tool use for enhancing problem-solving capabilities.
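The tool-integrated loop described above can be sketched in miniature: the model's chain of thought emits a fenced Python snippet, an interpreter runs it, and the execution output is appended so the next reasoning step can condition on real feedback. This is an illustrative sketch, not ToRA's actual implementation; the function names and prompt conventions here are assumptions.

```python
import contextlib
import io
import re


def run_python(code: str) -> str:
    """Execute a model-emitted code snippet and capture its stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # sketch only: a real system would sandbox this
    return buf.getvalue().strip()


def tool_integrated_step(model_output: str) -> str:
    """If a reasoning step contains a ```python block, run it and
    append the interpreter's output for the next generation step."""
    match = re.search(r"```python\n(.*?)```", model_output, re.DOTALL)
    if match is None:
        return model_output  # pure natural-language reasoning step
    result = run_python(match.group(1))
    return model_output + f"\n```output\n{result}\n```"


# Example: the model proposes code for a subcomputation; we attach its output.
step = "Compute the sum of squares:\n```python\nprint(sum(i * i for i in range(1, 11)))\n```"
augmented = tool_integrated_step(step)  # ends with the executed result, 385
```

In a full system this alternates with further model generations until a final answer is produced; the key design choice ToRA highlighted is that execution feedback enters the context mid-reasoning, not only after the fact.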

To fundamentally improve LLM reasoning, Zhihong co-led the DeepSeekMath project as its first author. This work introduced an iterative process for identifying and scaling high-quality mathematical pre-training data, significantly enhancing the base model's capabilities. The project also pioneered the reinforcement learning algorithm GRPO (Group Relative Policy Optimization), demonstrating for the first time its effectiveness in substantially improving model reasoning abilities. The released DeepSeekMath model has been widely adopted in mathematical reasoning research and powered the top-performing solutions in the inaugural AI Mathematical Olympiad (AIMO) competition.
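GRPO's central simplification can be sketched briefly: instead of training a separate value network as PPO does, it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation to obtain advantages. The reward values and group size below are illustrative, not from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage estimation: normalize each sampled
    completion's reward by its group's mean and standard deviation,
    replacing PPO's learned value-function baseline."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]


# A group of 4 sampled answers scored with a binary correctness reward:
# correct answers get positive advantage, incorrect ones negative.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])  # [1.0, -1.0, -1.0, 1.0]
```

These advantages then weight a PPO-style clipped policy-gradient objective (with a KL penalty toward a reference model); dropping the value network is what makes the approach cheap enough to scale.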

The data curation pipeline developed for DeepSeekMath has since been broadly adopted for pre-training and alignment with high-quality, educational web data. Building on the same reinforcement learning framework, Zhihong also played a key role in the subsequent R1 project, which leveraged greater RL compute to train a powerful reasoning model capable of reflection, backtracking, and verification.