Advances in Speech and Music Technology: Computational Aspects and Applications

This book presents advances in speech and music processing within the domain of audio signal processing. It opens with introductory chapters on the fundamentals of speech and music, then turns to computational aspects, including music information retrieval and spoken language processing. The authors examine the intersection of computer science, musicology, and speech analysis, and show how the multifaceted nature of speech and music information processing calls for dedicated algorithms, sophisticated signal processing systems, and machine learning techniques that extract useful information more effectively. They also argue that a deep understanding of speech and music in terms of perception, emotion, mood, gesture, and cognition is essential for successful applications. In addition, the book addresses the vast amount of audio data generated worldwide, which demands efficient processing for maintenance, retrieval, indexing, and querying, and explains why machine learning and artificial intelligence are well suited to these computational tasks. Overall, the book combines technological knowledge with a comprehensive treatment of essential topics in speech and music processing.

Author(s): Anupam Biswas, Emile Wennekes, Alicja Wieczorkowska, Rabul Hussain Laskar
Series: Signals and Communication Technology
Publisher: Springer
Year: 2023

Language: English
Pages: 445
City: Cham

Preface
Contents
Part I State-of-the-Art
A Comprehensive Review on Speaker Recognition
1 Introduction
2 Basic Overview of a Speaker Recognition System
3 Review on Feature Extraction
3.1 MFCC's Extraction Method
4 Speaker Modeling
4.1 Gaussian Mixture Model-Universal Background Model
4.2 Supervector
4.3 i-vector
4.4 Trends in Speaker Recognition
5 Deep Learning Methods for Speaker Recognition
5.1 Deep Learning for Classification
6 Performance Measure
7 Conclusion
References
Music Composition with Deep Learning: A Review
1 Introduction
1.1 From Algorithmic Composition to Deep Learning
1.2 Neural Network Architectures for Music Composition with Deep Learning
1.2.1 Variational Autoencoders (VAEs)
1.2.2 Generative Adversarial Networks (GANs)
1.2.3 Transformers
1.3 Challenges in Music Composition with Deep Learning
1.4 Chapter Structure
2 The Music Composition Process
3 Melody Generation
3.1 Deep Learning Models for Melody Generation: From Motifs to Melodic Phrases
3.2 Structure Awareness
3.3 Harmony and Melody Conditioning
3.4 Genre Transformation with Style Transfer
4 Instrumentation and Orchestration
4.1 From Polyphonic to Multi-Instrument Music Generation
4.2 Multi-Instrument Generation from Scratch
5 Evaluation and Metrics
5.1 Objective Evaluation
5.2 Subjective Evaluation
6 Discussion
7 Conclusions and Future Work
References
Music Recommendation Systems: Overview and Challenges
1 Introduction
2 Overview of Recommendation Systems
2.1 Content-Based Approach
2.2 Collaborative Approach
2.3 Hybrid Approach
2.4 Context-Aware Approach
2.5 Business Aspects
2.6 Summary
3 User Orientation
3.1 User Profiling
3.2 Psychology and Cognitive Aspects
3.3 Summary
4 Current Challenges and Trends in Music Recommendation
4.1 Music Consumption
4.2 Popularity Bias
4.3 Trends in Music Recommendation
4.4 Summary
5 Conclusion and Future Directions
References
Music Recommender Systems: A Review Centered on Biases
1 Introduction
2 Methodology
3 Music Recommender Systems
3.1 Theoretical Background for MRS
3.2 Related Work on Recommendation Strategies
4 Biases in Music Recommender Systems
4.1 Theoretical Background for Biases
4.2 Related Work on Biases
5 Bias Analysis in Datasets
6 Guidelines for Handling Biases in MRS
7 Conclusions and Future Work
References
Computational Approaches for Indian Classical Music: A Comprehensive Review
1 Introduction
1.1 Indian Classical Music
1.2 Forms of ICM
2 Literature Survey
2.1 Tonic Identification
2.2 Melodic Pattern Processing
2.3 Raag Recognition
3 Datasets
3.1 CompMusic Research Corpora
4 Evaluation Metrics
4.1 Objective Evaluation
4.2 Subjective Evaluation
5 Open Challenges
6 Conclusion
References
Part II Machine Learning
A Study on Effectiveness of Deep Neural Networks for Speech Signal Enhancement in Comparison with Wiener Filtering Technique
1 Introduction
2 Background
2.1 Wiener Filtering Technique
2.1.1 Algorithm
2.2 Deep Neural Networks
2.2.1 Algorithm
2.2.2 Fully Connected Network
2.2.3 Convolutional Neural Network (CNN)
3 Results
3.1 Wiener Filtering Technique
3.2 Fully Connected Network
3.2.1 Training Stage
3.2.2 Testing Stage
3.3 Convolutional Neural Network
3.3.1 Training Stage
3.3.2 Testing Stage
4 Conclusion
References
Video Soundtrack Evaluation with Machine Learning: Data Availability, Feature Extraction, and Classification
1 Introduction
2 Data Collection Pipeline
2.1 Collection, Cleanups, and Storage
2.2 Finding Music Within Videos
3 Data Representation and Feature Extraction
3.1 Audio Representations and Feature Extraction
3.2 Symbolic Representations of Music and Feature Extraction
3.3 Video Representation and Feature Extraction
4 Building a Classifier
4.1 Exploring the Data-Set
5 Extensions and Future Work
6 Conclusion
References
Deep Learning Approach to Joint Identification of Instrument Pitch and Raga for Indian Classical Music
1 Introduction
2 Related Work
3 Preprocessing
3.1 Feature Extraction
3.2 Top Feature Identification
3.3 Source Separation
4 Methodology
4.1 Convolutional Neural Network (CNN)
4.1.1 Model Details
4.1.2 Customization of Models
4.2 Recurrent Neural Networks (RNN)
4.2.1 Model Details
4.2.2 Customization of Models
4.3 Extreme Gradient Boosting (XGBoost)
4.4 Combined Model
4.4.1 Model Details
5 Results
5.1 Instrument Identification
5.2 Pitch Detection
5.3 Raga Detection
6 Conclusion
References
Comparison of Convolutional Neural Networks and K-Nearest Neighbors for Music Instrument Recognition
1 Introduction
1.1 Motivation
1.2 Objective
1.3 Organization
2 Literature Review
2.1 Performance Issues
2.2 Problem Statement
3 Proposed Methodology
3.1 Proposed Block Diagram
3.2 CNN-Based Approach
3.3 KNN-Based Approach
4 Experimental Setup and Analysis
4.1 Dataset and Annotations
4.2 Model Training and Testing
5 Results and Discussion
5.1 Performance Evaluation of CNN Model
5.2 Performance Evaluation of KNN Model
6 Conclusion
References
Emotion Recognition in Music Using Deep Neural Networks
1 Introduction
1.1 Related Works
1.1.1 Emotion Recognition, Using the 360-set
1.1.2 Emotion Recognition
1.1.3 Transfer Learning
1.1.4 Data Augmentation (via GANs)
1.2 Chapter Contribution
2 Feature Extraction
2.1 Handcrafted Features
2.2 Mel Spectrograms
3 Music Emotion Recognition Using Deep Learning
3.1 CNN Architectures Used for Music Emotion Recognition
3.2 Transfer Learning
3.2.1 Transfer Learning Scenarios
3.2.2 Choosing a Scenario
3.3 Data Augmentation Using GANs
4 Experiments
4.1 Dataset Origin
4.1.1 Big-Set
4.1.2 360-Set
4.2 Dataset Pre-processing
4.2.1 Splitting the Datasets into Training, Validation, and Test
4.3 Hyperparameter Selection and Settings
4.4 Experiment Summary and Aggregated Results
5 Conclusion
References
Part III Perception, Health and Emotion
Music to Ears in Hearing Impaired: Signal Processing Advancements in Hearing Amplification Devices
1 Introduction
2 Music Perception in Hearing Impaired
3 Music Perception with Hearing Aids
3.1 Music Perception Difficulties in Hearing Aid Users
3.2 Signal Processing Approaches for Music Perception in Hearing Aid
3.2.1 Front-End Processing to Increase Input Dynamic Range
3.2.2 Digital Signal Processing to Improve Power Consumption
3.2.3 Receiver Characteristics to Reduce Distortion
3.3 Parameters of Digital Signal Processing in Hearing Aids for Music Perception
3.3.1 Bandwidth
3.3.2 Compression
3.3.3 Circuit Delays
3.3.4 Frequency Lowering
3.3.5 Feedback Canceller
3.3.6 Directional Microphone
3.3.7 Noise Reduction Algorithms
3.3.8 Environmental Classifier
4 Music Perception by Individuals with Cochlear Implants
5 Conclusions
References
Music Therapy: A Best Way to Solve Anxiety and Depression in Diabetes Mellitus Patients
1 Introduction
2 Relationship Between Depression and Anxiety
3 Existing Work
4 Music Therapy in Diabetic Patients
4.1 Beck Anxiety Method
4.2 Beck Depression Inventory (BDI)
5 Results and Discussion
6 Conclusion
References
Music and Stress During COVID-19 Lockdown: Influence of Locus of Control and Coping Styles on Musical Preferences
1 Introduction
1.1 Coping Styles
1.2 Locus of Control
1.3 Music and Lockdown
2 Method
2.1 Participants
2.2 Measures Used
2.3 Procedure
3 Data Analysis
4 Results
4.1 Descriptive Statistics
4.2 Relationships Among Locus of Control, Coping Styles, Stress, Happiness, and Music
4.3 Preference of Music
4.4 Factor Structure of Music Preferences
4.5 Music Preferred by Coping Styles and Locus of Control
5 Discussion
5.1 Stress, Coping Styles, and Locus of Control
5.2 Music Helped Cope with Stress
5.3 Music Preferences
5.4 Music Preferred by Locus of Control and Coping Styles
6 Limitations
7 Conclusion
References
Biophysics of Brain Plasticity and Its Correlation to Music Learning
1 Introduction
2 Brain Plasticity and Learning
2.1 Synaptic Plasticity
2.2 Intrinsic Plasticity
3 Biophysical Basis of Plasticity
3.1 Synaptic Plasticity
3.2 Mechanism of Intrinsic Plasticity and Its Importance in Learning
4 Intrinsic Plasticity and Music Learning
5 Conclusion
References
Analyzing Emotional Speech and Text: A Special Focus on Bengali Language
1 Introduction
2 Literature Survey
3 Dataset
3.1 Data Recording
3.2 Data Analysis
4 TTS Architecture
4.1 System Framework
4.2 Basic Layout Features
4.3 Highway Network
4.4 Attention
4.5 Text2Mel
4.6 SSRN
4.7 Griffin-Lim Algorithm
5 Language Independence and English-Bengali TTS Model
5.1 Role of Language in Speech
5.2 Bengali TTS Model
5.2.1 Preprocessing
5.2.2 Postprocessing
5.2.3 Experiments and Results
5.3 English-Bengali TTS Model
5.3.1 Language Independence in TTS
5.3.2 Problems in Multilingual Learning and Ambiguity in Pronunciation
5.3.3 Model Architecture
5.3.4 Experiments and Results
6 Emotion Incorporation in TTS
6.1 Role of Emotions in Speech
6.2 Types of Emotions
6.3 Experiments and Results
7 Conclusion and Future Work
References
Part IV Case Studies
Duplicate Detection for Digital Audio Archive Management: Two Case Studies
1 Introduction
1.1 Duplicate Detection
2 Case 1: Duplicates in the VRT Shellac Disc Music Archive
2.1 The VRT Shellac Disc Archive
2.2 Determine Unique Material in an Archive
2.3 Detecting Duplicates
2.4 Some Observations
3 Case 2: Meta-Data Reuse for the IPEM Electronic Music Archive
3.1 The IPEM Music Archive
3.2 Merging Two Digitized Archives
3.3 Meta-Data and Segmentation Reuse
3.4 Key Observations: Meta-Data Reuse
4 Duplicate Detection Deep Dive
4.1 Panako: An Acoustic Fingerprinting System
4.2 Panako Evaluation
5 Conclusion
References
How a Song's Section Order Affects Both 'Refrein' Perception and the Song's Perceived Meaning
1 Introduction
1.1 Previous Experiments
1.2 Current Experiment
2 Method
2.1 Participants
2.2 Stimuli
2.3 Questionnaire
2.3.1 Analyses
3 Results
3.1 'Refrein' Perception
3.2 Interpretation
3.3 Appreciation
4 Discussion
4.1 'Refrein' Perception
4.2 Interpretation
4.3 Valuation
4.4 Covariates
4.5 Limitations
5 Conclusions
References
Musical Influence on Visual Aesthetics: An Exploration on Intermediality from Psychological, Semiotic, and Fractal Approach
1 Introduction
1.1 Musical and Visual Attributes
1.2 Intermediality Between Music and Visual Arts
1.3 Background of the Present Study
1.4 Roles of Features, Audience Response, and Fractal Analysis in Exploring Intermediality
2 Experimental Details
3 Methodology
3.1 2D-Detrended Fluctuation Analysis
4 Results and Discussions
4.1 Results of Feature Analysis
4.2 Results of Audience Response Analysis for Exploring the Music-Painting Intermediality
4.3 Results of Fractal Analysis
4.4 Statistical Analysis
5 Conclusion
References
Influence of Musical Acoustics on Graphic Design: An Exploration with Indian Classical Music Album Cover Design
1 Introduction
1.1 Synesthesia: Music and Visual Arts
1.2 The Present Study
2 Objective of the Study
3 Experiment Details
3.1 Participants
3.2 Stimuli Used in the Study
3.3 Materials for the Study
3.4 Experimental Process
3.5 Analytical Strategy
4 Methodologies
4.1 Analysis of Musical Acoustics
4.2 DFA Method for Understanding Musical Acoustics and Album Cover Designs
4.3 2D DFA Method Details
5 Results and Discussion
5.1 Semiotic Analysis
5.2 Fractal Analysis of Music and Comparison with Semiotic
6 Conclusions
References
A Fractal Approach to Characterize Emotions in Audio and Visual Domain: A Study on Cross-Modal Interaction
1 Introduction
2 Experimental Details
2.1 Choice of Three Pairs of Audio and Visual Stimuli
3 Methodology
3.1 2D Detrended Fluctuation Analysis
3.2 2D Detrended Cross-Correlation Analysis (DCCA)
4 Results and Discussion
5 Conclusions
References
Inharmonic Frequency Analysis of Tabla Strokes in North Indian Classical Music
1 Introduction
2 Literature Review
2.1 Right Drum Strokes
2.1.1 Stroke Na
2.1.2 Stroke Ta
2.1.3 Stroke Te
2.1.4 Stroke Tun
2.2 Left Drum Strokes
2.2.1 Stroke Ga
2.2.2 Stroke Ka
2.3 Both the Drum Strokes
2.3.1 Stroke Dha
2.3.2 Stroke Dhin
2.3.3 Stroke Tin
3 Tabla Strokes and Mode Progression
4 Experimental Setup
5 Results and Discussions
5.1 Right Drum Strokes
5.1.1 Stroke Na
5.1.2 Stroke Ta
5.1.3 Stroke Te
5.1.4 Stroke Tun
5.2 Left Drum Strokes
5.2.1 Stroke Ga
5.2.2 Stroke Ka
5.3 Both Drum Strokes
5.3.1 Stroke Dha
5.3.2 Stroke Dhin
5.3.3 Stroke Tin
6 Conclusion
References
Index