Artificial intelligence & robotics

Soujanya Poria

Advancing multimodal AI and natural language processing, with a focus on AI-driven sentiment analysis.

Year Honored

Singapore University of Technology and Design

Asia Pacific

Hails From
Asia Pacific

Multimodal AI is a new AI paradigm that combines various data types (images, text, speech, numerical data) with multiple intelligent processing algorithms to achieve higher performance. Humans possess a great deal of commonsense knowledge about the world, and human language is inherently multimodal: it includes speech, gestures, facial expressions, head nods, and more. Ideally, machines should understand all of these modalities. Understanding human language also depends largely on a machine’s ability to interpret emotions. Emotional sensitivity can keep a machine from giving desultory answers, making conversations more natural and engaging.

Soujanya Poria’s research has focused primarily on multimodal AI. Soujanya has developed several deep learning methods for multimodal emotion and sentiment analysis that fuse audio, visual, and textual cues. For example, he has published MELD, a widely used Multimodal Multi-Party Dataset for Emotion Recognition in Conversations. He has proposed innovative deep learning architectures, based on recurrent neural networks and graph convolutional networks, that model speaker states for emotion detection in conversations. Apart from multimodal conversational AI, Soujanya has done key research on commonsense AI. For instance, he has developed techniques that infuse commonsense knowledge into deep learning networks to improve their performance on diverse downstream tasks such as domain adaptation and conversation understanding. He later expanded his research into generative AI, introducing techniques that generate empathetic conversations by comprehending the affective states of the participants. He has also worked on multimodal generative AI, including Tango, a model that generates audio from textual instructions.

In summary, Soujanya has made significant contributions to multimodal conversational AI and commonsense reasoning. His ongoing research is of high quality and could significantly improve the ability of computers to interpret complex data and automatically provide users with valuable information.