Simon Shaolei Du works on the theoretical foundations of modern artificial intelligence, with a systematic research program spanning the trainability and generalization of deep learning and the sample complexity of reinforcement learning and representation learning. His research provides theoretical explanations for two of deep learning's core mysteries: he gave the first rigorous proof that stochastic gradient descent (SGD) finds global minima when training deep neural networks, and he established a foundational connection between deep learning and kernel methods that helps explain why large, overparameterized models generalize so well. His recent work further examines the subtle effects of overparameterization on optimization, offering new insights for model design.
To address reinforcement learning's high data requirements, in 2024 he and his collaborators resolved a fundamental open problem on sample complexity that had stood for nearly three decades. They proved that their new algorithm achieves optimal data efficiency, with sample complexity that can be independent of the planning horizon, opening new paths for tackling complex real-world problems. For foundation models, he provided the first theoretical results showing that a good representation combined with sufficient data diversity is both necessary and sufficient for pre-training to be effective. Building on these results, over the last two years his team has developed active learning and efficient reinforcement fine-tuning frameworks that substantially improve data and label efficiency for large models.
His goal is to continue building a solid theoretical foundation for artificial intelligence, to improve the efficiency and reliability of machine learning systems, and to promote the application of cutting-edge AI technologies in key domains.