Speech Emotion Recognition in Conversations Using Graph Convolutional Networks

Date

2024-03-16

Authors

Chandola, Deeksha

Abstract

Speech emotion recognition (SER) is the task of automatically recognizing emotions expressed in spoken language. Most current approaches analyze isolated speech segments to identify a speaker's emotional state. In text-based emotion recognition, however, models increasingly incorporate conversational context, a shift towards emotion recognition in conversation (ERC), and the availability of multimodal datasets allows ERC to be extended to non-text modalities as well. Building on these advances, this thesis proposes SERC-GCN, a method for speech emotion recognition in conversation (SERC) that predicts a speaker's emotional state by incorporating conversational context, specifically speaker interactions and temporal dependencies between utterances. SERC-GCN is a two-stage method. In the first stage, emotional features are extracted from utterance-level speech signals by a graph-based neural network: each speech utterance is transformed into a cyclic graph, which is processed by a two-layer GCN followed by a pooling layer to produce utterance-specific emotional features. In the second stage, these features form conversation graphs on which a graph convolutional network is trained to perform SERC. We empirically evaluate SERC-GCN on two benchmark datasets, IEMOCAP and MELD, and show that it outperforms existing baseline approaches on both.
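
The abstract describes the first stage only at a high level. The sketch below illustrates one plausible realization of that stage, a cyclic graph built over frame-level acoustic features and a two-layer GCN with mean pooling; it is not the thesis code, and the framework (PyTorch Geometric), feature dimension, layer sizes, and graph construction are all assumptions.

    # Minimal sketch (assumptions, not the authors' implementation): one utterance's
    # frame-level features become nodes of a cyclic graph, and a two-layer GCN with
    # mean pooling produces a single utterance-level emotion feature vector.
    import torch
    from torch_geometric.data import Data
    from torch_geometric.nn import GCNConv, global_mean_pool


    def cyclic_utterance_graph(frame_features: torch.Tensor) -> Data:
        """Connect consecutive frames and close the cycle (last frame -> first)."""
        num_frames = frame_features.size(0)
        src = torch.arange(num_frames)
        dst = torch.roll(src, shifts=-1)                   # i -> i+1, last -> 0
        edge_index = torch.stack([torch.cat([src, dst]),
                                  torch.cat([dst, src])])  # undirected cycle
        return Data(x=frame_features, edge_index=edge_index)


    class UtteranceGCN(torch.nn.Module):
        """Two-layer GCN + mean pooling: one emotion-feature vector per utterance."""

        def __init__(self, in_dim: int = 40, hidden_dim: int = 64, out_dim: int = 128):
            super().__init__()
            self.conv1 = GCNConv(in_dim, hidden_dim)
            self.conv2 = GCNConv(hidden_dim, out_dim)

        def forward(self, data: Data) -> torch.Tensor:
            h = torch.relu(self.conv1(data.x, data.edge_index))
            h = torch.relu(self.conv2(h, data.edge_index))
            batch = torch.zeros(h.size(0), dtype=torch.long)  # single-graph batch
            return global_mean_pool(h, batch)                  # shape: (1, out_dim)


    # Example: an utterance of 100 frames with 40-dimensional acoustic features.
    utterance = cyclic_utterance_graph(torch.randn(100, 40))
    embedding = UtteranceGCN()(utterance)

In the second stage, such utterance embeddings would serve as node features of a conversation graph on which another GCN is trained; the exact edge definitions for speaker interactions and temporal dependencies are specified in the thesis itself.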

Keywords

Computer science, Computer engineering, Psychology
