
Artificial intelligence & robotics

Xiangyu Zhang

Released China’s first multimodal large language model (MLLM) with 100 billion parameters.

Year Honored
2024

Organization
StepFun

Region
China

Hails From
China

Xiangyu Zhang researches the design, training, and optimization of general-purpose neural networks, with the goal of making models steadily more practical and more intelligent.

He proposed RepVGG, which uses reparameterization to train a more complex architecture for high accuracy and then convert it into a simpler, VGG-style structure for inference, where it runs easily on hardware. Building on the same approach and on an in-depth analysis of how existing Vision Transformers (ViTs) work, Xiangyu then developed RepLKNet, an architecture built on very large convolutional kernels. RepLKNet surpasses mainstream ViTs in performance while keeping a simple structure that is easy to deploy.
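To make the reparameterization idea concrete, here is a minimal sketch in plain Python. It assumes a training-time block with three parallel branches (a 3x3 convolution, a 1x1 convolution, and an identity shortcut) and folds them into a single 3x3 kernel for inference. The function names `conv2d_same` and `merge_branches` are hypothetical, and the sketch leaves out batch normalization, which the published RepVGG folds into each branch before merging, so it illustrates the principle rather than the authors' implementation.

```python
import random


def conv2d_same(x, w):
    """Naive stride-1, zero-padded ('same') convolution.

    x: nested lists shaped [C_in][H][W]
    w: nested lists shaped [C_out][C_in][k][k], with odd k
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    p = k // 2
    out = [[[0.0] * wd for _ in range(h)] for _ in range(len(w))]
    for o in range(len(w)):
        for c in range(c_in):
            for i in range(k):
                for j in range(k):
                    for y in range(h):
                        yy = y + i - p
                        if not 0 <= yy < h:
                            continue  # this tap falls in the zero padding
                        for col in range(wd):
                            xx = col + j - p
                            if 0 <= xx < wd:
                                out[o][y][col] += w[o][c][i][j] * x[c][yy][xx]
    return out


def merge_branches(w3, w1):
    """Fold a 3x3 branch, a 1x1 branch, and an identity shortcut into one 3x3 kernel."""
    n = len(w3)
    assert len(w3[0]) == n, "identity shortcut needs equal input/output channels"
    merged = [[[row[:] for row in w3[o][c]] for c in range(n)] for o in range(n)]
    for o in range(n):
        for c in range(n):
            # A 1x1 kernel is a 3x3 kernel that is zero everywhere but the centre.
            merged[o][c][1][1] += w1[o][c][0][0]
            # The identity is a 1 at the centre of each channel's own kernel.
            if o == c:
                merged[o][c][1][1] += 1.0
    return merged


random.seed(0)
C, H, W = 2, 5, 5
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w3 = [[[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
       for _ in range(C)] for _ in range(C)]
w1 = [[[[random.uniform(-1, 1)]] for _ in range(C)] for _ in range(C)]

# Training-time form: three parallel branches summed together.
a, b = conv2d_same(x, w3), conv2d_same(x, w1)
train_out = [[[a[o][y][i] + b[o][y][i] + x[o][y][i] for i in range(W)]
              for y in range(H)] for o in range(C)]

# Inference-time form: one plain 3x3 convolution with the merged kernel.
infer_out = conv2d_same(x, merge_branches(w3, w1))

max_diff = max(abs(train_out[o][y][i] - infer_out[o][y][i])
               for o in range(C) for y in range(H) for i in range(W))
print("largest difference between the two forms:", max_diff)
```

Because convolution is linear, the two forms agree up to floating-point rounding, which is why the simpler single-branch network can stand in for the multi-branch one at deployment time.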

He is the chief scientist of StepFun, a large-model company. Unlike many such companies, which start with large language models, StepFun starts from mixed image-and-text data and directly trains natively multimodal large models that integrate images and text. He proposed DreamLLM, a multimodal large model framework and one of the earliest architectures to combine image-text generation and understanding.

Based on this framework, StepFun released Step-1V, China’s first multimodal large model with 100 billion parameters, at the end of 2023, almost simultaneously with Gemini 1.0, Google's first comparable model. Its multimodal understanding was significantly stronger than that of the mainstream architectures of the time, which handled vision and language separately. Over the following year, the company released the trillion-parameter MoE base model Step-2, the video generation model Step-Video, the image-text-speech trimodal understanding model Step-1o, and the reasoning model Step R-mini, among others.