Spoken Language Processing


Speech processing addresses various scientific and technological areas. It includes speech analysis and variable-rate coding, in order to store or transmit speech. It also covers speech synthesis, especially from text, and speech recognition, including speaker and language identification, as well as spoken language understanding. This book covers the following topics: how to realize speech production and perception systems, and how to synthesize and understand speech using state-of-the-art methods in signal processing, pattern recognition, stochastic modelling, computational linguistics and human factor studies.
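As an illustration of the speech-analysis methods the book covers (the autocorrelation method of linear prediction, treated in sections 1.2.1–1.2.2), here is a minimal sketch of LPC estimation via the Levinson-Durbin recursion. The function name and the test signal are hypothetical, chosen for this example; they are not taken from the book.

```python
import numpy as np

def lpc_autocorrelation(signal, order):
    """Estimate linear-prediction coefficients with the autocorrelation
    method, solved by the Levinson-Durbin recursion (hypothetical helper).

    Returns the polynomial a = [1, a1, ..., ap] minimizing the power of
    the prediction error e[n] = x[n] + a1*x[n-1] + ... + ap*x[n-p],
    together with the final error power."""
    n = len(signal)
    # Autocorrelation of the frame for lags 0..order
    r = np.array([np.dot(signal[:n - k], signal[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient at this recursion step
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Order-update of the predictor polynomial
        new_a = a.copy()
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Sanity check on a synthetic 2nd-order autoregressive signal:
# x[t] = 0.9 x[t-1] - 0.5 x[t-2] + e[t], so a should approach [1, -0.9, 0.5]
rng = np.random.default_rng(0)
x = np.zeros(5000)
e = rng.standard_normal(5000)
for t in range(2, 5000):
    x[t] = 0.9 * x[t - 1] - 0.5 * x[t - 2] + e[t]
a, err = lpc_autocorrelation(x, 2)
```

Because the autocorrelation method works on a windowed frame, the recovered coefficients are only approximate; with a long stationary frame like the 5,000-sample signal above they land close to the true AR parameters.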

Author(s): Joseph Mariani
Edition: 1
Publisher: Wiley-ISTE
Year: 2009

Language: English
Pages: 505
Tags: Computer science and computing; Media data processing; Audio processing; Speech processing;

Spoken Language Processing......Page 5
Table of Contents......Page 7
Preface......Page 15
1.1.1. Source-filter model......Page 19
1.1.2. Speech sounds......Page 20
1.1.3. Sources......Page 24
1.1.4. Vocal tract......Page 30
1.2.1. Source-filter model and linear prediction......Page 36
1.2.2. Autocorrelation method: algorithm......Page 39
1.2.3. Lattice filter......Page 46
1.2.4. Models of the excitation......Page 49
1.3.1. Spectrogram......Page 53
1.3.2. Interpretation in terms of filter bank......Page 54
1.3.3. Block-wise interpretation......Page 55
1.3.4. Modification and reconstruction......Page 56
1.4.1. Bilinear time-frequency representations......Page 57
1.4.2. Wavelets......Page 59
1.4.3. Cepstrum......Page 61
1.4.4. Sinusoidal and harmonic representations......Page 64
1.5. Conclusion......Page 67
1.6. References......Page 68
2.1. Introduction......Page 73
2.1.1. Main characteristics of a speech coder......Page 75
2.1.2. Key components of a speech coder......Page 77
2.2. Telephone-bandwidth speech coders......Page 81
2.2.1. From predictive coding to CELP......Page 83
2.2.2. Improved CELP coders......Page 87
2.2.3. Other coders for telephone speech......Page 95
2.3. Wideband speech coding......Page 97
2.3.1. Transform coding......Page 99
2.3.2. Predictive transform coding......Page 103
2.4.1. A transmission channel for audiovisual speech......Page 104
2.4.2. Joint coding of audio and video parameters......Page 106
2.5. References......Page 111
3.1. Introduction......Page 117
3.2. Key goal: speaking for communicating......Page 118
3.2.1. What acoustic content?......Page 119
3.2.2. What melody?......Page 120
3.2.3. Beyond the strict minimum......Page 121
3.3. Synoptic presentation of the elementary modules in speech synthesis systems......Page 122
3.3.2. Acoustic processing......Page 123
3.3.3. Training models automatically......Page 124
3.4.1. Text pre-processing......Page 125
3.4.2. Grapheme-to-phoneme conversion......Page 126
3.4.3. Syntactic-prosodic analysis......Page 128
3.4.4. Prosodic analysis......Page 130
3.5.1. Rule-based synthesis......Page 132
3.5.2. Unit-based concatenative synthesis......Page 133
3.6. Speech signal modeling......Page 135
3.6.1. The source-filter assumption......Page 136
3.6.3. Formant-based modeling......Page 137
3.6.5. Harmonic plus noise model......Page 138
3.7. Control of prosodic parameters: the PSOLA technique......Page 140
3.7.1. Methodology background......Page 142
3.7.2. The ancestors of the method......Page 143
3.7.3. Descendants of the method......Page 146
3.8. Towards variable-size acoustic units......Page 149
3.8.1. Constitution of the acoustic database......Page 152
3.8.2. Selection of sequences of units......Page 156
3.9. Applications and standardization......Page 160
3.10.1. Introduction......Page 162
3.10.2. Global evaluation......Page 164
3.10.3. Analytical evaluation......Page 169
3.10.4. Summary for speech synthesis evaluation......Page 171
3.12. References......Page 172
4.1. Introduction......Page 187
4.2.3. Human-machine interfaces......Page 188
4.3. Speech as a bimodal process......Page 189
4.3.1. The intelligibility of visible speech......Page 190
4.3.2. Visemes for facial animation......Page 192
4.3.3. Synchronization issues......Page 193
4.3.4. Source consistency......Page 194
4.3.5. Key constraints for the synthesis of visual speech......Page 195
4.4.2. Generating expressions......Page 196
4.5.1. Analysis of the image of a face......Page 198
4.5.4. From the text to the phonetic string......Page 199
4.7. References......Page 200
5.1. Introduction......Page 207
5.2.2. Features for simultaneous fusion......Page 209
5.2.3. Features for sequential fusion......Page 210
5.3.1. Design of a representation......Page 211
5.4. Critique of the CASA approach......Page 218
5.4.1. Limitations of ASA......Page 219
5.4.2. The conceptual limits of “separable representation”......Page 220
5.5.1. Missing feature theory......Page 221
5.5.2. The cancellation principle......Page 222
5.5.4. Auditory scene synthesis: transparency measure......Page 223
5.6. References......Page 224
6.1. Problem definition and approaches to the solution......Page 231
6.2.1. Definition......Page 234
6.2.2. Observation probability and model parameters......Page 235
6.2.3. HMM as probabilistic automata......Page 236
6.2.4. Forward and backward coefficients......Page 237
6.3. Observation probabilities......Page 240
6.4. Composition of speech unit models......Page 241
6.5. The Viterbi algorithm......Page 244
6.6. Language models......Page 246
6.6.1. Perplexity as an evaluation measure for language models......Page 248
6.6.2. Probability estimation in the language model......Page 250
6.6.3. Maximum likelihood estimation......Page 252
6.6.4. Bayesian estimation......Page 253
6.7. Conclusion......Page 254
6.8. References......Page 255
7.1. Introduction......Page 257
7.2. Linguistic model......Page 259
7.3. Lexical representation......Page 262
7.4.1. Feature extraction......Page 265
7.4.2. Acoustic-phonetic models......Page 267
7.4.3. Adaptation techniques......Page 271
7.5. Decoder......Page 274
7.6.1. Efficiency: speed and memory......Page 275
7.6.2. Portability: languages and applications......Page 277
7.6.3. Confidence measures......Page 278
7.7. Systems......Page 279
7.7.1. Text dictation......Page 280
7.7.2. Audio document indexing......Page 281
7.7.3. Dialog systems......Page 283
7.8. Perspectives......Page 286
7.9. References......Page 288
8.1. Introduction......Page 297
8.2. Language characteristics......Page 299
8.3. Language identification by humans......Page 304
8.4. Language identification by machines......Page 305
8.4.2. Performance measures......Page 306
8.4.3. Evaluation......Page 307
8.5. LId resources......Page 308
8.6. LId formulation......Page 313
8.7. LId modeling......Page 316
8.7.1. Acoustic front-end......Page 317
8.7.2. Acoustic language-specific modeling......Page 318
8.7.3. Parallel phone recognition......Page 320
8.7.4. Phonotactic modeling......Page 322
8.8. Discussion......Page 327
8.9. References......Page 329
9.1.1. Voice variability and characterization......Page 339
9.1.2. Speaker recognition......Page 341
9.2.1. Speaker recognition tasks......Page 342
9.2.2. Operation......Page 343
9.2.3. Text-dependence......Page 344
9.2.4. Types of errors......Page 345
9.2.5. Influencing factors......Page 346
9.3.1. General structure of speaker recognition systems......Page 347
9.3.2. Acoustic analysis......Page 348
9.3.3. Probabilistic modeling......Page 350
9.3.4. Identification and verification scores......Page 353
9.3.5. Score compensation and decision......Page 355
9.3.6. From theory to practice......Page 360
9.4.1. Error rate......Page 361
9.4.2. DET curve and EER......Page 362
9.4.4. Distribution of errors......Page 364
9.4.5. Orders of magnitude......Page 365
9.5.1. Physical access control......Page 366
9.5.2. Securing remote transactions......Page 367
9.5.4. Education and entertainment......Page 368
9.5.5. Forensic applications......Page 369
9.6. Conclusions......Page 370
9.7. Further reading......Page 371
10.1. Introduction......Page 373
10.2.1. Spectral subtraction......Page 375
10.2.2. Adaptive noise cancellation......Page 376
10.2.4. Channel equalization......Page 377
10.3. Robust parameters and distance measures......Page 378
10.3.1. Spectral representations......Page 379
10.3.2. Auditory models......Page 382
10.3.3. Distance measure......Page 383
10.4.1. Model composition......Page 384
10.4.2. Statistical adaptation......Page 385
10.5. Compensation of the Lombard effect......Page 386
10.7. Conclusion......Page 387
10.8. References......Page 388
11.1. Introduction......Page 395
11.2.1. Seeing without hearing......Page 397
11.2.2. Seeing for hearing better in noise......Page 398
11.2.3. Seeing for better hearing… even in the absence of noise......Page 400
11.2.4. Bimodal integration imposes itself to perception......Page 401
11.2.5. Lip reading as taking part to the ontogenesis of speech......Page 403
11.2.6. …and to its phylogenesis?......Page 404
11.3. Architectures for audio-visual fusion in speech perception......Page 406
11.3.1. Three paths for sensory interactions in cognitive psychology......Page 407
11.3.2. Three paths for sensor fusion in information processing......Page 408
11.3.3. The four basic architectures for audiovisual fusion......Page 409
11.3.4. Three questions for a taxonomy......Page 410
11.3.5. Control of the fusion process......Page 412
11.4. Audio-visual speech recognition systems......Page 414
11.4.1. Architectural alternatives......Page 415
11.4.2. Taking into account contextual information......Page 419
11.4.3. Pre-processing......Page 421
11.5. Conclusions......Page 423
11.6. References......Page 424
12.1. Introduction......Page 435
12.2. Context......Page 436
12.2.1. The development of micro-electronics......Page 437
12.2.2. The expansion of information and communication technologies and increasing interconnection of computer systems......Page 438
12.2.3. The coordination of research efforts and the improvement of automatic speech processing systems......Page 439
12.3.1. Advantages of speech as a communication mode......Page 442
12.3.2. Limitations of speech as a communication mode......Page 443
12.3.3. Multidimensional analysis of commercial speech recognition products......Page 445
12.4. Application domains with voice-only interaction......Page 448
12.4.1. Inspection, control and data acquisition......Page 449
12.4.3. Office automation: dictation and speech-to-text systems......Page 450
12.4.4. Training......Page 453
12.4.5. Automatic translation......Page 456
12.5. Application domains with multimodal interaction......Page 457
12.5.1. Interactive terminals......Page 458
12.5.2. Computer-aided graphic design......Page 459
12.5.3. On-board applications......Page 460
12.5.4. Human-human communication facilitation......Page 462
12.6. Conclusions......Page 464
12.7. References......Page 465
13.1. Introduction......Page 473
13.3. Speech coding in the telecommunication sector......Page 474
13.4.1. Advantages and limitations of voice command......Page 475
13.4.2. Major trends......Page 477
13.4.4. Call center automation (operator assistance)......Page 478
13.4.5. Personal voice phonebook......Page 480
13.4.7. Other services based on voice command......Page 481
13.6. Text-to-speech synthesis in telecommunication systems......Page 482
13.7. Conclusions......Page 483
13.8. References......Page 484
List of Authors......Page 485
Index......Page 489