Artificial intelligence & robotics

Ranjay Krishna

Pioneering the concept of "visual thinking" for models to enhance their three-dimensional spatial perception capabilities.

Year Honored
2025

Organization
University of Washington

Region
Asia Pacific

Ranjay Krishna's research aims to endow machines with the ability to understand visual concepts beyond mere pixel recognition. He integrates theories from cognitive science, linguistics, and social psychology to design a research path that spans structured representations, compositional reasoning, spatial thinking, and social learning.

Examining the weaknesses of large models in compositional reasoning, he showed that scaling them up is not, by itself, an effective solution. Inspired by the intergenerational transmission of human culture, he and his team proposed an "iterated learning" training mechanism. Through periodic resetting and retraining, this process encourages visual representations to evolve toward more compositional structures. This method, combined with the high-quality PixMo dataset, produced the Molmo series of models, which upon their 2024 release outperformed contemporary models such as GPT-4o on several benchmarks.

To address the spatial reasoning deficiencies of multimodal models, Krishna pioneered the "visual sketchpad" concept. This method lets a model "think visually," much as a human does, by drawing auxiliary lines and marking boxes, which helps it decompose complex spatial or mathematical problems and significantly improves its reasoning accuracy. He subsequently extended this idea into a mechanism for generating latent visual tokens within the model, enhancing its 3D spatial perception.

His research has also expanded into robotics: the recently released MolmoAct model reasons in space before taking actions. He has also designed social agents that interact with and actively learn from users, making him one of the early explorers of reinforcement learning from human feedback. His work provides new approaches for building general artificial intelligence systems that are better aligned with human cognitive priors and behaviors.