Deep Learning Based Speech Quality Prediction

This book shows how recent deep learning methods can be applied to speech quality prediction. The author provides an in-depth analysis of the suitability of different deep learning architectures for this task. The author then shows how the resulting model outperforms traditional speech quality models and provides additional information about the cause of a quality impairment by predicting the speech quality dimensions of noisiness, coloration, discontinuity, and loudness.

Author(s): Gabriel Mittag
Series: T-Labs Series in Telecommunication Services
Publisher: Springer
Year: 2022

Language: English
Pages: 170
City: Cham

Preface
Acknowledgments
Contents
Acronyms
1 Introduction
1.1 Motivation
1.2 Thesis Objectives and Research Questions
1.3 Outline
2 Quality Assessment of Transmitted Speech
2.1 Speech Communication Networks
2.2 Speech Quality and Speech Quality Dimensions
2.3 Subjective Assessment
2.4 Subjective Assessment via Crowdsourcing
2.5 Traditional Instrumental Methods
2.5.1 Parametric Models
2.5.2 Double-Ended Signal-Based Models
2.5.3 Single-Ended Signal-Based Models
2.6 Machine Learning Based Instrumental Methods
2.6.1 Non-Deep Learning Machine Learning Approaches
2.6.2 Deep Learning Architectures
2.6.3 Deep Learning Based Speech Quality Models
2.7 Summary
3 Neural Network Architectures for Speech Quality Prediction
3.1 Dataset
3.1.1 Source Files
3.1.2 Simulated Distortions
3.1.3 Live Distortions
3.1.4 Listening Experiment
3.2 Overview of Neural Network Model
3.3 Mel-Spec Segmentation
3.4 Framewise Model
3.4.1 CNN
3.4.2 Feedforward Network
3.5 Time-Dependency Modelling
3.5.1 LSTM
3.5.2 Transformer/Self-Attention
3.6 Time Pooling
3.6.1 Average-/Max-Pooling
3.6.2 Last-Step-Pooling
3.6.3 Attention-Pooling
3.7 Experiments and Results
3.7.1 Training and Evaluation Metric
3.7.2 Framewise Model
3.7.3 Time-Dependency Model
3.7.4 Pooling Model
3.8 Summary
4 Double-Ended Speech Quality Prediction Using Siamese Networks
4.1 Introduction
4.2 Method
4.2.1 Siamese Neural Network
4.2.2 Reference Alignment
4.2.3 Feature Fusion
4.3 Results
4.3.1 LSTM vs Self-Attention
4.3.2 Alignment
4.3.3 Feature Fusion
4.3.4 Double-Ended vs Single-Ended
4.4 Summary
5 Prediction of Speech Quality Dimensions with Multi-Task Learning
5.1 Introduction
5.2 Multi-Task Models
5.2.1 Fully Connected (MTL-FC)
5.2.2 Fully Connected + Pooling (MTL-POOL)
5.2.3 Fully Connected + Pooling + Time-Dependency (MTL-TD)
5.2.4 Fully Connected + Pooling + Time-Dependency + CNN (MTL-CNN)
5.3 Results
5.3.1 Per-Task Evaluation
5.3.2 All-Tasks Evaluation
5.3.3 Comparing Dimension
5.3.4 Degradation Decomposition
5.4 Summary
6 Bias-Aware Loss for Training from Multiple Datasets
6.1 Method
6.1.1 Learning with Bias-Aware Loss
6.1.2 Anchoring Predictions
6.2 Experiments and Results
6.2.1 Synthetic Data
6.2.2 Minimum Accuracy r_th
6.2.3 Training Examples with and Without Anchoring
6.2.4 Configuration Comparisons
6.2.5 Speech Quality Dataset
6.3 Summary
7 NISQA: A Single-Ended Speech Quality Model
7.1 Datasets
7.1.1 POLQA Pool
7.1.2 ITU-T P Suppl. 23
7.1.3 Other Datasets
7.1.4 Live-Talking Test Set
7.2 Model and Training
7.2.1 Model
7.2.2 Bias-Aware Loss
7.2.3 Handling Missing Dimension Ratings
7.2.4 Training
7.3 Results
7.3.1 Evaluation Metrics
7.3.2 Validation Set Results: Overall Quality
7.3.3 Validation Set Results: Quality Dimensions
7.3.4 Test Set Results
7.3.5 Impairment Level vs Quality Prediction
7.4 Summary
8 Conclusions
A Dataset Condition Tables
B Train and Validation Dataset Dimension Histograms
References
Index