Emma Beede has an unorthodox claim to technological fame: a study she ran showed that one of her employer’s new technologies needed critical improvements before it could be deployed in the real world.
Beede’s study tested a deep-learning algorithm created by Google Health to screen eye images for diabetic retinopathy, a condition in which high blood sugar damages the retina and impairs the eye’s ability to sense light. Beede found that the algorithm, which had performed with over 90% accuracy in the lab, struggled in real-world tests across 11 clinics in Thailand. The algorithm had been trained on high-quality eye scans, so when image quality in the clinics suffered because of factors like poor lighting, the system rejected the scans as unusable. More than 20% of retinal scans were rejected, leaving frustrated patients and their health-care providers looking for more conventional alternatives.
Beede sees such unsatisfying results as a critical reminder that AI-powered tools must be put through rigorous, meticulous testing with the people who will actually use them before being deployed. “Humans in the real world are complicated, and we should account for that,” she says. “We need to be doing our due diligence to study those downstream effects so that we can mitigate any risk for harm.”