DSpace Repository

Automatic Speech Recognition Using Deep Neural Networks: New Possibilities

Title: Automatic Speech Recognition Using Deep Neural Networks: New Possibilities
Author: Abdel-Hamid, Ossama Abdel-Hamid Mohamed
Abstract: Automatic speech recognition (ASR) systems that use deep neural networks (DNNs) for acoustic modeling have recently attracted considerable research interest, owing to results that have significantly raised state-of-the-art ASR performance. This dissertation proposes several new methods that further improve state-of-the-art ASR performance by exploiting the power of DNNs.

The first method exploits domain knowledge in designing a special neural network (NN) structure, the convolutional neural network (CNN). This dissertation proposes applying convolution and pooling operations along the frequency axis to handle the frequency variations that commonly arise from speaker and pronunciation differences in speech signals. Moreover, a new CNN structure, limited weight sharing, is proposed to better suit the special spectral characteristics of speech signals. Our experimental results show that using a CNN yields a 6-9% relative reduction in error rate.
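
The idea of convolution and pooling along frequency, with a separate filter per frequency region, can be sketched as follows. This is a minimal illustrative toy, not the dissertation's actual network: the filter values, band partitioning, and function names are all assumptions.

```python
# Hedged sketch of frequency-domain convolution with limited weight sharing.
# All filter values, band sizes, and names are illustrative assumptions.

def conv1d(band, kernel):
    """Valid 1-D convolution of a frequency band with a filter kernel."""
    k = len(kernel)
    return [sum(band[i + j] * kernel[j] for j in range(k))
            for i in range(len(band) - k + 1)]

def max_pool(feats, size):
    """Max-pooling along frequency to absorb small spectral shifts."""
    return [max(feats[i:i + size]) for i in range(0, len(feats), size)]

def limited_weight_sharing(spectrum, region_kernels, pool_size=2):
    """Apply a *different* kernel to each frequency region (limited sharing),
    unlike full weight sharing, where one kernel scans the whole spectrum."""
    n_regions = len(region_kernels)
    width = len(spectrum) // n_regions
    out = []
    for r, kernel in enumerate(region_kernels):
        band = spectrum[r * width:(r + 1) * width]
        out.extend(max_pool(conv1d(band, kernel), pool_size))
    return out

# Toy 8-bin log-spectrum split into two regions, each with its own 2-tap filter.
spectrum = [0.1, 0.4, 0.3, 0.2, 0.9, 0.7, 0.5, 0.6]
features = limited_weight_sharing(spectrum, region_kernels=[[1, -1], [0.5, 0.5]])
```

Because low and high frequency bands in speech carry very different spectral patterns, giving each region its own filters (rather than one filter scanned over the whole spectrum) is the motivation for limited weight sharing.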

The second proposed method handles speaker variation more explicitly through a new speaker-code-based adaptation. It adapts the acoustic model to a new speaker by learning a suitable speaker representation from a small amount of adaptation data for each target speaker. Unlike other commonly used adaptation methods for neural networks, it requires no modification of the model parameters; this greatly reduces the number of parameters to estimate during adaptation and thus enables rapid speaker adaptation.
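
The key property, that only a small per-speaker vector is estimated while the network itself stays frozen, can be sketched with a toy linear "network". The dimensions, squared-error loss, and learning rate are illustrative assumptions, not the dissertation's setup.

```python
# Hedged sketch of speaker-code adaptation: network weights stay frozen and
# only a small speaker code is learned from a few adaptation examples.
# The linear model and loss are illustrative stand-ins for the actual DNN.

def predict(weights, feats, code):
    """Frozen linear 'network': score = w . [feats ; speaker_code]."""
    x = feats + code
    return sum(w * xi for w, xi in zip(weights, x))

def adapt_speaker_code(weights, data, code_dim, lr=0.1, steps=200):
    """Estimate only the speaker code (code_dim parameters) by gradient
    descent on adaptation pairs (feats, target); weights are untouched."""
    code = [0.0] * code_dim
    code_weights = weights[-code_dim:]  # weights feeding the code inputs
    for _ in range(steps):
        for feats, target in data:
            err = predict(weights, feats, code) - target
            for d in range(code_dim):
                code[d] -= lr * err * code_weights[d]
    return code

# Frozen weights for 2 acoustic features plus a 1-dim speaker-code slot.
weights = [1.0, -0.5, 1.0]
# Tiny adaptation set: this speaker's targets are shifted up by ~0.7.
data = [([1.0, 0.0], 1.7), ([0.0, 1.0], 0.2)]
code = adapt_speaker_code(weights, data, code_dim=1)
```

Since only `code_dim` values are estimated instead of millions of network weights, a handful of adaptation utterances suffices, which is what makes the adaptation rapid.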

The third proposed method handles the temporal structure within speech segments using a deep segmental neural network (DSNN). The DSNN removes the need for an HMM by directly modeling the posterior probability of the label sequence. Moreover, a segment-aware NN structure is proposed that models the dependency among speech frames within each segment and outperforms conventional frame-based DNNs. Experimental results show that the proposed DSNN significantly improves recognition performance compared with conventional frame-based models.
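
The contrast with frame-based models, scoring whole variable-length segments and searching over segmentations, can be sketched with a small dynamic program. The mean-matching segment scorer and label prototypes below are toy assumptions standing in for the actual segment-level NN.

```python
# Hedged sketch of segment-level scoring in the spirit of a DSNN: each
# candidate multi-frame segment is scored as a whole, and dynamic programming
# finds the best segmentation and label sequence. The scorer is a toy
# mean-based function, an illustrative stand-in for the segment NN.

def segment_score(frames, label, start, end):
    """Score frames[start:end] as one segment with the given label.
    Toy scorer: how well the segment mean matches the label's prototype."""
    prototypes = {"a": 0.0, "b": 1.0}          # assumed label prototypes
    mean = sum(frames[start:end]) / (end - start)
    return -abs(mean - prototypes[label])      # higher = better match

def best_segmentation(frames, labels, max_len=3):
    """DP over segment boundaries: best[t] = best score of frames[:t]."""
    n = len(frames)
    best = [0.0] + [float("-inf")] * n
    back = [None] * (n + 1)
    for t in range(1, n + 1):
        for length in range(1, min(max_len, t) + 1):
            for lab in labels:
                s = best[t - length] + segment_score(frames, lab, t - length, t)
                if s > best[t]:
                    best[t], back[t] = s, (t - length, lab)
    # Recover the segment/label sequence by walking the backpointers.
    segs, t = [], n
    while t > 0:
        prev, lab = back[t]
        segs.append((prev, t, lab))
        t = prev
    return best[n], list(reversed(segs))

# Six frames: three near 0.0 ("a"-like) followed by three near 1.0 ("b"-like).
frames = [0.1, 0.0, 0.1, 0.9, 1.0, 0.8]
score, segments = best_segmentation(frames, labels=["a", "b"])
```

Because each score is computed over a whole segment, the model can capture dependencies among the frames inside it, which a frame-by-frame classifier cannot.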
Subject: Computer science
Keywords: Automatic speech recognition
Neural networks
Speaker adaptation
Convolutional neural networks
Hidden Markov models
Segmental speech recognition
Speaker code
Speaker representation
Type: Electronic Thesis or Dissertation
Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
URI: http://hdl.handle.net/10315/29980
Supervisor: Jiang, Hui
Degree: PhD - Doctor of Philosophy
Program: Computer Science
Exam date: 2014-11-07
Publish on: 2015-08-28
