The Use of Recurrent Neural Networks in Continuous Speech Recognition

Publisher: Cambridge University Engineering Department, 1995. 46 pp.
Most, if not all, automatic speech recognition systems explicitly or implicitly compute a score (equivalently, a distance, probability, etc.) indicating how well a novel utterance matches a model of the hypothesised utterance. A fundamental problem in speech recognition is how this score may be computed, given that speech is a non-stationary stochastic process. In the interest of reducing computational complexity, the standard approach used in the most prevalent systems (e.g., dynamic time warping (DTW) and hidden Markov models (HMMs)) factors the hypothesis score into a local acoustic score and a local transition score. In the HMM framework, the observation term models the local (in time) acoustic signal as a stationary process, while the transition probabilities account for the time-varying nature of speech.
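For reference, the factorisation described above can be written out for an HMM M with hidden state sequence q_1, ..., q_T and acoustic observations o_1, ..., o_T (the notation below is the standard textbook convention, not taken verbatim from the report):

\[
P(o_1,\dots,o_T,\, q_1,\dots,q_T \mid M)
  = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t),
\]

where the transition probabilities a_{ij} supply the local transition score and the observation probabilities b_j(o_t) supply the local acoustic score. The assumption that b_{q_t} depends only on the current frame o_t is the local stationarity assumption that the hybrid approach described next relaxes.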
This chapter presents an extension to the standard HMM framework which addresses the issue of the observation probability computation. Specifically, an artificial recurrent neural network (RNN) is used to compute the observation probabilities within the HMM framework. This provides two enhancements over standard HMMs: (1) the observation model is no longer local, and (2) the RNN architecture provides a nonparametric model of the acoustic signal. The result is a speech recognition system able to model long-term acoustic context without strong assumptions on the distribution of the observations. One such system has been successfully applied to a 20,000-word, speaker-independent, continuous speech recognition task and is described in this chapter.
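A minimal sketch of the posterior-to-scaled-likelihood conversion that such hybrid systems apply before decoding (the "Posterior Probabilities to Scaled Likelihoods" step in the contents below); the function and array names are illustrative assumptions, not code from the report:

```python
import numpy as np

def posteriors_to_scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Turn per-frame phone posteriors P(phone | acoustics), as produced by the
    RNN, into scaled likelihoods P(acoustics | phone) / P(acoustics) by dividing
    out the phone priors (Bayes' rule with the data term dropped), then move to
    the log domain for use as HMM observation scores during Viterbi decoding.

    posteriors: array of shape (T, K), one row of K phone posteriors per frame.
    priors:     array of shape (K,), phone priors estimated from the training data.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    priors = np.asarray(priors, dtype=float)
    scaled = posteriors / np.maximum(priors, floor)
    return np.log(np.maximum(scaled, floor))
```

Dividing by the priors is what allows the discriminatively trained network outputs to stand in for conventional observation likelihoods inside the generative HMM decoder.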
Contents:
Introduction
The Hybrid RNN/HMM Approach
The HMM Framework
Context Modelling
Recurrent Networks for Phone Probability Estimation
System Description
The Acoustic Vector Level
The Phone Probability Level
Posterior Probabilities to Scaled Likelihoods
Decoding Scaled Likelihoods
System Training
Training the RNN
RNN Objective Function
Gradient Computation
Weight Update
Special Features
Connectionist Model Combination
Duration Modelling
Efficient Models
Decoding
Search Algorithm
Pruning
Summary of Variations
Training Criterion
Distribution Assumptions
Practical Issues
A Large Vocabulary System

Author(s): Robinson T., Hochberg M., Renals S.

Language: English
Tags: Computer science and computing; Artificial intelligence; Pattern recognition