dc.contributor.advisor: Jiang, Hui
dc.creator: Abdel-Hamid, Ossama Abdel-Hamid Mohamed
dc.description.abstract: Automatic speech recognition (ASR) systems that use deep neural networks (DNNs) for acoustic modeling have recently attracted considerable research interest, owing to results that have significantly raised the state-of-the-art performance of ASR systems. This dissertation proposes several new methods that improve state-of-the-art ASR performance by exploiting the power of DNNs. The first method exploits domain knowledge in designing a special neural network (NN) structure called a convolutional neural network (CNN). This dissertation proposes to use the CNN in a way that applies convolution and pooling operations along the frequency axis to handle the frequency variations that commonly arise from speaker and pronunciation differences in speech signals. Moreover, a new CNN structure called limited weight sharing is proposed to better suit the special spectral characteristics of speech signals. Our experimental results show that the use of a CNN leads to a 6-9% relative reduction in error rate. The second proposed method deals with speaker variations more explicitly through a new speaker-code-based adaptation. This method adapts the speech acoustic model to a new speaker by learning a suitable speaker representation from a small amount of adaptation data from each target speaker. It alleviates the need to modify any model parameters, as is done in other commonly used adaptation methods for neural networks. This greatly reduces the number of parameters to estimate during adaptation and hence allows rapid speaker adaptation. The third proposed method aims to handle the temporal structure within speech segments by using a deep segmental neural network (DSNN). The DSNN model alleviates the need to use an HMM, as it directly models the posterior probability of the label sequence. Moreover, a segment-aware NN structure is proposed that can model the dependency among speech frames within each segment and performs better than conventional frame-based DNNs. Experimental results show that the proposed DSNN can significantly improve recognition performance compared with conventional frame-based models.
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Computer science
dc.title: Automatic Speech Recognition Using Deep Neural Networks: New Possibilities
dc.type: Electronic Thesis or Dissertation (en_US); Science - Doctor of Philosophy
dc.subject.keywords: Automatic speech recognition
dc.subject.keywords: Neural networks
dc.subject.keywords: Speaker adaptation
dc.subject.keywords: Convolutional neural networks
dc.subject.keywords: Hidden Markov models
dc.subject.keywords: Segmental speech recognition
dc.subject.keywords: Speaker code
dc.subject.keywords: Speaker representation

All items in the YorkSpace institutional repository are protected by copyright, with all rights reserved except where explicitly noted.