A few years ago, Bo Li and her colleagues placed small black-and-white stickers on a stop sign in a graffiti-like pattern that looked random to human eyes and did not obscure the sign’s clear lettering. Yet the arrangement was deliberately designed so that if an autonomous vehicle approached, the neural networks powering its vision system would misread the stop sign as one posting a speed limit of 45 mph.
Such “adversarial attacks,” manipulations of input data that look innocuous to a person but fool neural networks, had been tried before, but earlier examples had been mostly digital. For instance, a few pixels might be altered in an image, a change invisible to the naked eye. Li was one of the first to show that such attacks were possible in the physical world. Physical attacks can be harder for an AI to detect because the methods developed to spot manipulated digital images don’t work on physical objects.
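To make the digital case concrete, here is a minimal sketch of how a tiny, uniform nudge to every pixel can flip a classifier's decision. It is not Li's method. It assumes a hand-made linear "classifier" with hypothetical weights, a hypothetical bias, and a 64-pixel "image", and it uses the well-known fast-gradient-sign idea of stepping each pixel slightly against the model's gradient. Real vision systems are deep networks, so treat this only as an illustration of the principle.

```python
# Toy illustration only: a hand-made linear "classifier", not a real vision model.
# All names and numbers below are hypothetical, chosen for the example.

def score(weights, pixels, bias):
    """Linear score; a positive value means the toy model says 'stop sign'."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def perturb(weights, pixels, epsilon):
    """Shift every pixel by at most epsilon in the direction that lowers the score.

    For a linear model, the gradient of the score with respect to each pixel
    is simply that pixel's weight, so stepping against sign(weight) is the
    fast-gradient-sign idea. Pixels are clipped to the valid range [0, 1].
    """
    return [min(1.0, max(0.0, p - epsilon * sign(w)))
            for w, p in zip(weights, pixels)]

weights = [0.5, -0.25, 0.75, -0.5] * 16  # 64 "pixels"
pixels = [0.5] * 64                      # a flat gray image
bias = -3.5

original = score(weights, pixels, bias)
adversarial_pixels = perturb(weights, pixels, epsilon=0.02)
adversarial = score(weights, adversarial_pixels, bias)

print("original score:", original)        # positive: classified as 'stop sign'
print("adversarial score:", adversarial)  # negative: classification flipped
```

No pixel moves by more than 0.02 on a 0-to-1 scale, yet the many small shifts add up across the image and push the score over the decision boundary.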
Li also devised subtle changes in the features of physical objects, like shape and texture, that again are imperceptible to humans but can make the objects invisible to image recognition algorithms. Her goal is to use this knowledge about potential attacks to make AI more robust. She pits AI systems against each other, using one neural network to identify and exploit vulnerabilities in another. This process can expose flaws in the training or structure of the target network. Li then develops strategies to patch these flaws and defend against future attacks.
Adversarial attacks can fool other types of neural networks too, not just image recognition algorithms. Imperceptible tweaks to audio can make a voice assistant misinterpret what it hears, for example. Some of Li’s techniques are already being used in commercial applications. IBM uses them to protect its Watson AI, and Amazon to protect Alexa. And a handful of autonomous-vehicle companies apply them to improve the robustness of their machine-learning models.