Getting computers to see—to actually see—has been an ambition of countless computer scientists for decades. Few have come closer than Andrej Karpathy, whose approach to deep neural networks allows machines to make sense of what is happening in images.
As a graduate student at Stanford, Karpathy extended techniques for building what are known as convolutional neural networks (CNNs)—systems that broadly mimic the neuron structure in the visual cortex. (In 2015 he also designed and was the primary instructor for the first deep-learning class at Stanford.)
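The core operation in a CNN is convolution: sliding a small learned filter across an image and recording how strongly each patch matches, loosely analogous to a visual-cortex neuron responding to a pattern in its receptive field. As a minimal sketch (toy filter values chosen for illustration, not anything from Karpathy's actual models):

```python
def convolve2d(image, kernel):
    """Slide a small kernel over a 2D image (valid padding),
    producing a feature map of patch-by-patch dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A toy vertical-edge filter applied to a 4x4 image whose right
# half is bright; large responses appear where the edge sits.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, edge_kernel)
```

In a real CNN the filter values are learned from data rather than hand-picked, and many such filters are stacked in layers so that early layers detect edges and later layers detect whole objects.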
By combining CNNs with other deep-learning approaches, he created a system that was not just better at recognizing individual items in images (say, a dog or a person), but capable of seeing an entire scene full of objects—multiple dogs and people interacting with each other—and effectively building a story of what was happening in it and what might happen next.
In 2017, Karpathy joined Tesla, where he oversees neural networks for the cars’ Autopilot feature. That includes collision detection, self-driving capabilities, and summoning (having a car drive autonomously from where it is parked).
Using Karpathy’s advances, Tesla is taking a different path from most other automakers. Typically, self-driving vehicles scan their surroundings with expensive laser range finders, build a virtual map, and then use AI to make decisions about what to do. Tesla’s approach uses traditional cameras. Not only can Karpathy’s method let the car spot objects in the road as a human driver would, but it can take in the entire scene (cars, people, intersections, stop signs, and more) and—if it works as intended—instantly infer what’s taking place. Doing so requires nearly 50 neural networks constantly processing incoming data as the fleet’s more than a million cars look and learn.