An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications

Enables readers to understand the algorithmic analysis of musical audio signals with AI-driven approaches

An Introduction to Audio Content Analysis is a comprehensive guide to audio content analysis that explains how signal processing and machine learning approaches can be used to extract musical content from audio. It gives readers the algorithmic understanding needed to teach a computer to interpret music signals and thus allows for the design of tools for interacting with music. The work ties together topics from audio signal processing and machine learning, showing how to use audio content analysis to pick up musical characteristics automatically. It presents a multitude of audio content analysis tasks related to the extraction of tonal, temporal, timbral, and intensity-related characteristics of the music signal. Each task is introduced from both a musical and a technical perspective, detailing the algorithmic approach and providing practical guidance on implementation details and evaluation.

To aid reader comprehension, each task description begins with a short introduction to the most important musical and perceptual characteristics of the covered topic, continues with a detailed algorithmic model and its evaluation, and concludes with questions and exercises. For the interested reader, updated supplemental materials are provided via an accompanying website.

Written by a well-known expert in the music industry, An Introduction to Audio Content Analysis covers topics including:

  • Digital audio signals and their representation, common time-frequency transforms, audio features
  • Pitch and fundamental frequency detection, key detection, and chord recognition
  • Representation of dynamics in music and intensity-related features
  • Onset and tempo detection, beat histograms, detection of structure in music, and sequence alignment
  • Audio fingerprinting, musical genre, mood, and instrument classification

An invaluable guide for newcomers to audio signal processing and industry experts alike, An Introduction to Audio Content Analysis covers a wide range of introductory topics pertaining to music information retrieval and machine listening, allowing students and researchers to quickly gain core holistic knowledge in audio analysis and dig deeper into specific aspects of the field with the help of a large number of references.

Author(s): Alexander Lerch
Edition: 2
Publisher: Wiley-IEEE Press
Year: 2022

Language: English
Commentary: Publisher PDF
Pages: 464
City: Piscataway, NJ
Tags: Audio Content Analysis; Musical Audio Signals; Algorithmic Analysis; AI-driven Algorithms; Machine Learning; Tonal; Temporal; Timbral; Digital Audio Signals; Pitch Detection; Frequency Detection; Key; Chord; Beat Histograms; Tempo Detection; Audio Fingerprinting; Musical Genre; Mood; Instrument Classification

Cover
Title Page
Copyright
Contents
Author Biography
Preface
Acronyms
List of Symbols
Source Code Repositories
Chapter 1 Introduction
1.1 A Short History of Audio Content Analysis
1.2 Applications and Use Cases
1.2.1 Music Browsing and Music Discovery
1.2.2 Music Consumption
1.2.3 Music Production
1.2.4 Music Education
1.2.5 Generative Music
References
Part I Fundamentals of Audio Content Analysis
Chapter 2 Analysis of Audio Signals
2.1 Audio Content
2.2 Audio Content Analysis Process
2.3 Exercises
2.3.1 Questions
References
Chapter 3 Input Representation
3.1 Audio Signals
3.1.1 Periodic Signals
3.1.2 Random Signals
3.1.3 Statistical Signal Description
3.1.3.1 Arithmetic Mean
3.1.3.2 Geometric Mean
3.1.3.3 Harmonic Mean
3.1.3.4 Variance and Standard Deviation
3.1.3.5 Quantiles and Quantile Ranges
3.1.4 Digital Audio Signals
3.2 Audio Preprocessing
3.2.1 Down‐Mixing
3.2.2 DC Removal
3.2.3 Normalization
3.2.4 Sample Rate Conversion
3.2.5 Block‐Based Processing
3.2.6 Other Preprocessing Options
3.3 Time‐Frequency Representations
3.3.1 Fourier Transform
3.3.2 Constant Q Transform
3.3.3 Log‐Mel Spectrogram
3.3.4 Filterbanks
3.4 Other Input Representations
3.5 Instantaneous Features
3.5.1 Spectral Centroid
3.5.2 Spectral Spread
3.5.3 Spectral Skewness and Spectral Kurtosis
3.5.4 Spectral Rolloff
3.5.5 Spectral Decrease
3.5.6 Spectral Slope
3.5.7 Mel Frequency Cepstral Coefficients
3.5.8 Spectral Flux
3.5.9 Spectral Crest Factor
3.5.10 Spectral Flatness
3.5.11 Tonal Power Ratio
3.5.12 Maximum of Autocorrelation Function
3.5.13 Zero Crossing Rate
3.6 Learned Features
3.7 Feature Post‐Processing
3.7.1 Derived Features
3.7.2 Feature Aggregation
3.7.3 Normalization and Mapping
3.7.4 Feature Dimensionality Reduction
3.7.4.1 Feature Subset Selection
3.7.4.2 Feature Space Transformation
3.8 Exercises
3.8.1 Questions
3.8.2 Assignments
References
Chapter 4 Inference
4.1 Classification
4.2 Regression
4.3 Clustering
4.4 Distance and Similarity
4.5 Underfitting and Overfitting
4.6 Exercises
4.6.1 Questions
4.6.2 Assignments
References
Chapter 5 Data
5.1 Data Split
5.1.1 N‐Fold Cross Validation
5.2 Training Data Augmentation
5.3 Utilization of Data From Related Tasks
5.4 Reducing Accuracy Requirements for Data Annotation
5.5 Semi‐, Self‐, and Unsupervised Learning
5.6 Exercises
5.6.1 Questions
5.6.2 Assignments
References
Chapter 6 Evaluation
6.1 Metrics
6.1.1 Classification
6.1.2 Regression
6.1.3 Clustering
6.2 Exercises
6.2.1 Questions
References
Part II Music Transcription
Chapter 7 Tonal Analysis
7.1 Human Perception of Pitch
7.1.1 Pitch Scales
7.1.2 Chroma Perception
7.2 Representation of Pitch in Music
7.2.1 Pitch Classes and Names
7.2.2 Intervals
7.2.3 The Frequency of Musical Pitch
7.2.3.1 Temperament
7.2.3.2 Intonation
7.3 Fundamental Frequency Detection
7.3.1 Detection Accuracy
7.3.1.1 Time Domain
7.3.1.2 Frequency Domain
7.3.1.3 Potential Solutions
7.3.2 Preprocessing
7.3.3 Monophonic Input Signals
7.3.3.1 Zero Crossing Rate
7.3.3.2 Autocorrelation Function
7.3.3.3 Average Magnitude Difference Function
7.3.3.4 Harmonic Product Spectrum and Harmonic Sum Spectrum
7.3.3.5 Autocorrelation Function of the Magnitude Spectrum
7.3.3.6 Cepstral Pitch Detection
7.3.3.7 Maximum Likelihood and Template Matching
7.3.3.8 Auditory‐Motivated Pitch Tracking
7.3.4 Polyphonic Input Signals
7.3.4.1 Iterative Subtraction
7.3.4.2 Nonnegative Matrix Factorization
7.3.4.3 Other Approaches
7.3.5 Evaluation
7.3.5.1 Metrics
7.3.5.2 Datasets
7.3.5.3 Results
7.4 Tuning Frequency Estimation
7.4.1 Approaches to Tuning Frequency Estimation
7.4.2 Evaluation
7.5 Key Detection
7.5.1 Pitch Chroma
7.5.1.1 Pitch Chroma Properties
7.5.1.2 Features Derived from the Pitch Chroma
7.5.2 Approaches to Key Detection
7.5.2.1 Key Profiles
7.5.2.2 Similarity Measure between Template and Extracted Vector
7.5.3 Evaluation
7.5.3.1 Metrics
7.5.3.2 Datasets
7.5.3.3 Results
7.6 Chord Recognition
7.6.1 Approaches to Chord Recognition
7.6.2 Viterbi Algorithm
7.6.3 Evaluation
7.6.3.1 Metrics
7.6.3.2 Datasets
7.6.3.3 Results
7.7 Exercises
7.7.1 Questions
7.7.2 Assignments
References
Chapter 8 Intensity
8.1 Human Perception of Intensity and Loudness
8.2 Representation of Dynamics in Music
8.3 Features
8.3.1 Root Mean Square
8.3.2 Weighted Root Mean Square
8.3.3 Peak Envelope
8.3.4 Psycho‐Acoustic Loudness Features
8.4 Exercises
8.4.1 Questions
8.4.2 Assignments
References
Chapter 9 Temporal Analysis
9.1 Human Perception of Temporal Events
9.1.1 Onsets
9.1.2 Tempo and Meter
9.1.3 Rhythm
9.1.4 Timing
9.2 Representation of Temporal Events in Music
9.2.1 Tempo and Time Signature
9.2.2 Note Value
9.3 Onset Detection
9.3.1 Novelty Function
9.3.2 Peak Picking
9.3.3 Evaluation
9.3.3.1 Metrics
9.3.3.2 Datasets
9.3.3.3 Results
9.4 Beat Histogram
9.4.1 Beat Histogram Features
9.5 Detection of Tempo and Beat Phase
9.5.1 Evaluation
9.5.1.1 Metrics
9.5.1.2 Datasets
9.5.1.3 Results
9.6 Detection of Meter and Downbeat
9.7 Structure Detection
9.7.1 Self‐Similarity Matrix
9.7.2 Approaches to Structure Detection
9.7.2.1 Novelty Analysis
9.7.2.2 Homogeneity Analysis
9.7.2.3 Repetition Analysis
9.7.3 Evaluation
9.7.3.1 Metrics
9.7.3.2 Datasets
9.7.3.3 Results
9.8 Automatic Drum Transcription
9.8.1 Transcription of Drum Onsets
9.8.2 Evaluation
9.9 Exercises
9.9.1 Questions
9.9.2 Assignments
References
Chapter 10 Alignment
10.1 Dynamic Time Warping
10.1.1 Example
10.1.2 Common Variants
10.1.3 Optimizations
10.2 Audio‐to‐Audio Alignment
10.3 Audio‐to‐Score Alignment
10.3.1 Real‐Time Systems
10.3.2 Non‐Real‐Time Systems
10.4 Evaluation
10.4.1 Metrics
10.4.2 Data
10.5 Exercises
10.5.1 Questions
10.5.2 Assignments
References
Part III Music Identification, Classification, and Assessment
Chapter 11 Audio Fingerprinting
11.1 Fingerprint Extraction
11.2 Fingerprint Matching
11.3 Fingerprinting System: Example
11.4 Evaluation
References
Chapter 12 Music Similarity Detection and Music Genre Classification
12.1 Music Similarity Detection
12.1.1 Approaches to Music Similarity Computation
12.1.2 Evaluation
12.2 Musical Genre Classification
12.2.1 Approaches to Musical Genre Classification
12.2.2 Genre Classification: Example
12.2.3 Evaluation
12.2.3.1 Metrics
12.2.3.2 Data
12.2.3.3 Results
12.2.4 Exercises
12.2.5 Questions
12.2.6 Assignments
References
Chapter 13 Mood Recognition
13.1 Approaches to Mood Recognition
13.2 Evaluation
References
Chapter 14 Musical Instrument Recognition
14.1 Evaluation
References
Chapter 15 Music Performance Assessment
15.1 Music Performance
15.2 Music Performance Analysis
15.3 Approaches to Music Performance Assessment
References
Part IV Appendices
Appendix A Fundamentals
A.1 Sampling and Quantization
A.1.1 Sampling
A.1.2 Quantization
A.2 Convolution
A.2.1 Identity
A.2.2 Commutativity
A.2.3 Associativity
A.2.4 Distributivity
A.2.5 Circularity
A.2.6 Simple Filter Examples
A.2.6.1 Moving Average Filter
A.2.6.2 Single‐Pole Low‐Pass Filter
A.2.7 Zero‐Phase Filtering with IIRs
A.3 Correlation Function
A.3.1 Normalization
A.3.2 Autocorrelation Function
A.3.3 Applications
A.3.4 Calculation in the Frequency Domain
A.3.4.1 Frequency Domain Compression
References
Appendix B Fourier Transform
B.1 Properties of the Fourier Transformation
B.1.1 Inverse Fourier Transform
B.1.2 Superposition
B.1.3 Convolution and Multiplication
B.1.4 Parseval's Theorem
B.1.5 Time and Frequency Shift
B.1.6 Symmetry
B.1.7 Time and Frequency Scaling
B.1.8 Derivatives
B.2 Spectrum of Example Time Domain Signals
B.2.1 Delta Function
B.2.2 Constant
B.2.3 Cosine
B.2.4 Rectangular Window
B.2.5 Delta Pulse
B.3 Transformation of Sampled Time Signals
B.4 Short Time Fourier Transform of Continuous Signals
B.4.1 Window Functions
B.4.1.1 Rectangular Window
B.4.1.2 Bartlett Window
B.4.1.3 Generalized Superposed Cosines
B.4.1.4 Generalized Power of Cosine
B.5 Discrete Fourier Transform
B.5.1 Window Functions
B.5.1.1 Discrete Window Properties
B.5.2 Fast Fourier Transform
B.6 Frequency Reassignment: Instantaneous Frequency
References
Appendix C Principal Component Analysis
C.1 Computation of the Transformation Matrix
C.2 Interpretation of the Transformation Matrix
Appendix D Linear Regression
Appendix E Software for Audio Analysis
E.1 Frameworks and Libraries
E.1.1 librosa
E.1.2 Essentia
E.1.3 openSMILE
E.1.4 Marsyas
E.1.5 jMIR
E.1.6 MIRtoolbox
E.1.7 Yaafe
E.1.8 madmom
E.1.9 Software for Education
E.1.10 Other Software
E.2 Data Annotation and Visualization
References
Appendix F Datasets
References
Index
EULA