
Artificial intelligence & robotics

Yuma Koizumi

What is the source, duration, location, and nature of sounds that we are hearing? A researcher aims to create the ultimate sound recognition AI.



We are surrounded by sounds in daily life, and we draw a great deal of information from them. When we walk along a road, for instance, the sound of people and cars approaching from behind alerts us to their presence. The sound of waves tells us the sea is nearby. A skilled factory worker can tell from abnormal noises that a piece of equipment is malfunctioning. Artificial intelligence (AI), however, still cannot interpret sounds in this way.

Yuma Koizumi is researching sound environment recognition, a technology that would allow machines to recognize all kinds of sounds. What is the source, duration, location, and nature of the sounds we hear? Koizumi aims to build the ultimate sound recognition AI, one that can identify all of these characteristics.

He has already achieved tangible results. For one, he has created a technology that suppresses noise in noisy environments so that people can hear more easily. Existing methods face a trade-off between noise reduction and sound quality: the more noise they remove, the more they distort the very sounds we want to hear. To overcome this, Koizumi applied reinforcement learning, a machine learning approach in which a system learns through trial and error guided by a reward signal, and succeeded in reducing noise without compromising sound quality.

Another result is an abnormal sound detection technology that picks up sounds suggesting a piece of industrial equipment may be malfunctioning. Such technology is essential for automating manufacturing, but accurate detection has been difficult because very few samples of abnormal sounds are available for training. Koizumi devised a new learning algorithm that minimizes false alarms, achieving high detection accuracy even with few sound samples. He has also helped stimulate research in this field by building a public dataset of abnormal sounds, developing performance evaluation metrics, and organizing international competitions.
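The article does not describe Koizumi's algorithm itself. As a generic illustration of the underlying problem (detecting abnormal sounds when almost no abnormal examples exist), here is a minimal sketch, assuming each sound clip has already been reduced to a numeric feature vector. It models only normal sounds, scores new clips by how far they deviate, and sets the alarm threshold from held-out normal clips so that the false alarm rate on those clips stays at or below a chosen target. The function names and the simple per-dimension z-score model are hypothetical stand-ins, not Koizumi's method.

```python
import math
import random
import statistics


def fit_normal_model(normal_features):
    """Estimate a per-dimension mean and standard deviation from normal clips only."""
    dims = list(zip(*normal_features))
    means = [statistics.fmean(d) for d in dims]
    # Guard against a zero spread so the score below never divides by zero.
    stds = [statistics.pstdev(d) or 1e-8 for d in dims]
    return means, stds


def anomaly_score(x, model):
    """Sum of squared z-scores: the larger it is, the less x resembles normal data."""
    means, stds = model
    return sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, means, stds))


def pick_threshold(normal_scores, max_false_alarm_rate):
    """Choose a threshold so that at most max_false_alarm_rate of the given
    normal scores are strictly above it (i.e. would trigger a false alarm)."""
    ranked = sorted(normal_scores)
    n = len(ranked)
    allowed = min(math.floor(n * max_false_alarm_rate), n - 1)
    return ranked[n - 1 - allowed]


# Usage with synthetic two-dimensional "features" (hypothetical data).
random.seed(0)
normal = [[random.gauss(0, 1), random.gauss(5, 2)] for _ in range(200)]
train, calib = normal[:100], normal[100:]

model = fit_normal_model(train)
threshold = pick_threshold([anomaly_score(x, model) for x in calib], 0.05)


def is_anomalous(x):
    """Raise an alarm when a clip's score exceeds the calibrated threshold."""
    return anomaly_score(x, model) > threshold


print(is_anomalous([8.0, 20.0]))  # a clip far from anything seen in normal operation
```

Because the threshold is calibrated only on normal clips, the false alarm rate can be controlled without any abnormal samples; how well real faults are caught then depends on how good the feature representation and normal model are.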

Over the past decade, AI has made its most visible progress in its "eyes," epitomized by advances in image recognition. Developing the "ears" of AI, however, will open new possibilities in areas such as robotics, automated driving, and remote communication. Expectations are high that Koizumi's research will make it possible to "visualize" sound beyond the limits of human hearing.