The Perceptual Structure of Sound

This book presents a comprehensive review of how acoustic waves are processed by the auditory system into structured sounds such as musical melodies, speech utterances, or environmental sounds. After an introduction, it gives an overview of how the ears distribute acoustic information over a large array of frequency channels, which carry the auditory information the central nervous system uses to generate a mental image of what is happening around the listener. This process, called auditory scene analysis, consists of two stages. In the first stage, auditory-unit formation, auditory units such as musical tones and speech syllables are formed. Each auditory unit is perceived at a well-defined moment in time, the beat location of that auditory unit. The auditory attributes of these units, such as their timbre, pitch, loudness, and perceived location, also emerge from this process. Each of these attributes is discussed in its own chapter.
In the second stage of auditory scene analysis, auditory-stream formation, successive auditory units are integrated into auditory streams, i.e., temporally structured sequences of auditory units that are perceived as emanating from one and the same sound source. Examples of such auditory streams are musical melodies and the utterances of one speaker. The temporal structure of an auditory stream, its rhythm, is determined by the beat locations of its auditory units. The book also discusses the role that the auditory attributes of consecutive auditory units play in this integration. The melodies of musical streams and the intonation contours of spoken utterances emerge from this process. In music, the beats of parallel streams generally fit into a metric pattern, and, depending on harmony, simultaneous tones can be perceived as consonant or dissonant.
Finally, the book contains many sound examples, together with the MATLAB scripts used to generate them.

Author(s): Dik J. Hermes
Series: Current Research in Systematic Musicology, 11
Publisher: Springer
Year: 2023

Language: English
Pages: 839
City: Cham

Preface
Speech Perception or Music Perception
Prior Knowledge
Sound Demonstrations and Matlab Scripts
References
Contents
1 Introduction
1.1 Pure Tones
1.1.1 Amplitude
1.1.2 Frequency
1.1.3 Phase
1.2 Complex Tones
1.3 Speech Sounds
1.4 Musical Scales and Musical Intervals
1.5 The Sum of Two Sinusoids
1.5.1 Two Tones: The Range of Perception of a Steady Tone
1.5.2 Two Tones: The Range of Roughness Perception
1.5.3 Two Tones: The Range of Rhythm Perception
1.5.4 Two Tones: The Range of Hearing Slow Modulations
1.5.5 In Summary
1.6 Amplitude Modulation
1.6.1 Amplitude Modulation: The Range of Perception of a Steady Tone
1.6.2 Amplitude Modulation: The Range of Roughness Perception
1.6.3 Amplitude Modulation: The Range of Rhythm Perception
1.6.4 Amplitude Modulation: The Range of Hearing Slow Modulations
1.6.5 In Summary
1.7 Frequency Modulation
1.7.1 Frequency Modulation: The Range of Perception of a Steady Tone
1.7.2 Frequency Modulation: The Range of Roughness Perception
1.7.3 Frequency Modulation: The Range of Rhythm Perception
1.7.4 Frequency Modulation: The Range of Hearing Slow Modulations
1.7.5 In Summary
1.8 Additive Synthesis
1.8.1 The Saw Tooth and the Square Wave
1.8.2 Pulse Trains
1.8.3 Phase of Sums of Equal-Amplitude Sinusoids
1.9 Noise
1.9.1 White Noise
1.9.2 Pink Noise
1.9.3 Brown Noise
1.10 Subtractive Synthesis
1.11 Envelopes
References
2 The Ear
2.1 Overview
2.2 Two Ears
2.3 The Outer Ear
2.4 The Middle Ear
2.5 The Inner Ear
2.5.1 The Basilar Membrane
2.5.2 Distortion Products
2.5.3 The Organ of Corti
2.6 The Auditory Nerve
2.6.1 Refractoriness and Saturation
2.6.2 Spontaneous Activity
2.6.3 Dynamic Range
2.6.4 Band-Pass Filtering
2.6.5 Half-Wave Rectification
2.7 Summary Schema of Peripheral Auditory Processing
2.8 The Central Auditory Nervous System
References
3 The Tonotopic Array
3.1 Masking Patterns and Psychophysical Tuning Curves
3.2 Critical Bandwidth
3.3 Auditory-Filter Bandwidth and the Roex Filter
3.4 The Gammatone Filter
3.5 The Compressive Gammachirp
3.6 Summary Remarks
3.7 Auditory Frequency Scales
3.7.1 The Mel Scale
3.7.2 The Bark Scale
3.7.3 The ERBN-Number Scale or Cam Scale
3.8 The Excitation Pattern
3.8.1 Transfer Through Outer and Middle Ear
3.8.2 Introduction of Internal Noise
3.8.3 Calculation of the Excitation Pattern
3.8.4 From Excitation on a Linear Hertz Scale to Excitation on a Cam Scale
3.8.5 From Excitation to Specific Loudness
3.8.6 Calculation of Loudness
3.9 Temporal Structure
3.9.1 The Autocorrelation Model
3.9.2 dBA-Filtering
3.9.3 Band-Pass Filtering
3.9.4 Neural Transduction
3.9.5 Generation of Action Potentials
3.9.6 Detection of Periodicities: Autocovariance Functions
3.9.7 Peak Detection
3.10 Summary
References
4 Auditory-Unit Formation
4.1 Auditory Scene Analysis
4.2 Auditory Units
4.3 Auditory Streams
4.3.1 Some Examples
4.4 The Perceived Duration of an Auditory Unit
4.5 Perceptual Attributes
4.6 Auditory Localization and Spatial Hearing
4.7 Two Illustrations
4.8 Organizing Principles
4.8.1 Common Fate
4.8.2 Spectral Regularity
4.8.3 Exclusive Allocation
4.9 Consequences of Auditory-Unit Formation
4.9.1 Loss of Identity of Constituent Components
4.9.2 The Emergence of Perceptual Attributes
4.10 Performance of the Auditory-Unit-Formation System
4.11 Concluding Remark
References
5 Beat Detection
5.1 Measuring the Beat Location of an Auditory Unit
5.1.1 Tapping Along
5.1.2 Synchronizing with a Series of Clicks
5.1.3 Absolute Rhythm Adjustment Methods
5.1.4 Relative Rhythm Adjustment Methods
5.1.5 The Phase-Correction Method
5.1.6 Limitations
5.2 Beats in Music
5.3 Beats in Speech
5.3.1 The Structure of Spoken Syllables
5.3.2 Location of the Syllable Beat
5.3.3 The Concepts of P-centre or Syllable Beat
5.4 The Role of Onsets
5.4.1 Neurophysiology
5.5 The Role of F0 Modulation
5.6 Strength and Clarity of Beats
5.7 Interaction with Vision
5.8 Models of Beat Detection
5.9 Fluctuation Strength
5.10 Ambiguity of Beats
References
6 Timbre Perception
6.1 Definition
6.2 Roughness
6.3 Breathiness
6.4 Brightness or Sharpness
6.5 Dimensional Analysis
6.5.1 Timbre Space of Vowel Sounds
6.5.2 Timbre Space of Musical Sounds
6.6 The Role of Onsets and Transients
6.7 Composite Timbre Attributes
6.7.1 Sensory Pleasantness and Annoyance
6.7.2 Voice Quality
6.7.3 Perceived Effort
6.8 Context Effects
6.9 Environmental Sounds
6.10 Concluding Remarks
References
7 Loudness Perception
7.1 Sound Pressure Level (SPL) and Sound Intensity Level (SIL)
7.1.1 Measurement of Sound Pressure Level
7.1.2 "Loudness" Normalization
7.2 The dB Scale and Stevens' Power Law
7.3 Stevens' Law of a Pure Tone and a Noise Burst
7.4 Loudness of Pure Tones
7.4.1 Equal-Loudness Contours
7.5 Loudness of Steady Complex Sounds
7.5.1 Limitations of the Loudness Model
7.6 Partial Loudness of Complex Sounds
7.6.1 Some Examples of Partial-Loudness Estimation
7.6.2 Limitations of the Partial-Loudness Model
7.7 Loudness of Time-Varying Sounds
7.7.1 Loudness of Very Short Sounds
7.7.2 Loudness of Longer Time-Varying Sounds
7.8 Concluding Remarks
References
8 Pitch Perception
8.1 Definitions of Pitch
8.2 Pitch Height and Pitch Chroma
8.3 The Range of Pitch Perception
8.4 The Pitch of Some Synthesized Sounds
8.4.1 The Pitch of Pure Tones
8.4.2 The Duration of a Sound and its Pitch
8.4.3 Periodic Sounds and their Pitch
8.4.4 Virtual Pitch
8.4.5 Analytic Versus Synthetic Listening
8.4.6 Some Conclusions
8.5 Pitch of Complex Sounds
8.6 Shepard and Risset Tones
8.7 The Autocorrelation Model
8.8 The Missing Fundamental
8.8.1 Three Adjacent Resolved Harmonics
8.8.2 Three Adjacent Unresolved Harmonics
8.8.3 Three Adjacent Unresolved Harmonics of High Rank
8.8.4 Three Adjacent Shifted Unresolved Harmonics of High Rank
8.8.5 Seven Adjacent Unresolved Harmonics of High Rank
8.8.6 Seven Adjacent Unresolved Harmonics in the Absence of Phase Lock
8.8.7 Conclusions
8.9 Pitch of Non-periodic Sounds
8.9.1 Repetition Noise
8.9.2 Pulse Pairs
8.10 Pitch of Time-Varying Sounds
8.11 Pitch Estimation of Speech Sounds
8.12 Estimation of Multiple Pitches
8.13 Central Processing of Pitch
8.14 Pitch Salience or Pitch Strength
8.15 Pitch Ambiguity
8.16 Independence of Timbre and Pitch
8.17 Pitch Constancy
8.18 Concluding Remarks
References
9 Perceived Location
9.1 Information Used in Auditory Localization
9.1.1 Interaural Time Differences
9.1.2 Interaural Level Differences
9.1.3 Filtering by the Outer Ears
9.1.4 Reverberation
9.2 The Generation of Virtual Sound Sources
9.3 More Information Used in Sound Localization
9.3.1 Movements of the Listener
9.3.2 Rotations of the Sound Source Around the Listener
9.3.3 Movements of the Sound Source Towards and from the Listener
9.3.4 Doppler Effect
9.3.5 Ratio of Low-Frequency and High-Frequency Energy in the Sound Signal
9.3.6 Information About the Room
9.3.7 Information About the Location of Possible Sound Sources
9.3.8 Visual Information
9.3.9 Background Noise
9.3.10 Atmospheric Conditions: Temperature, Humidity, and the Wind
9.3.11 Familiarity with the Sound Source
9.4 Multiple Sources of Information
9.5 Externalization or Internalization
9.6 Measuring Human-Sound-Localization Accuracy
9.7 Auditory Distance Perception
9.7.1 Accuracy of Distance Perception
9.7.2 Direct-to-Reverberant Ratio
9.7.3 Dynamic Information
9.7.4 Perceived Distance, Loudness, and Perceived Effort
9.7.5 Distance Perception in Peripersonal Space
9.8 Auditory Perception of Direction
9.8.1 Accuracy of Azimuth Perception
9.8.2 Accuracy of Elevation Perception
9.8.3 Computational Model
9.9 Auditory Perception of Motion
9.9.1 Accuracy of Rotational-Motion Perception
9.9.2 Accuracy of Radial-Motion Perception
9.9.3 Perception of Looming Sounds
9.9.4 Auditory Motion Detectors?
9.10 Walking Around
9.10.1 Illusory Motion
9.11 Integrating Multiple Sources of Information
9.11.1 Cooperation and Competition?
9.11.2 Exclusive Allocation
9.11.3 Plasticity and Calibration
References
10 Auditory-Stream Formation
10.1 Some Examples
10.1.1 The Trill Threshold
10.1.2 Sequential Integration and Segregation
10.2 Measures of Integration or Segregation
10.3 The Perceived Number of Streams in an Auditory Scene
10.4 The Perceived Number of Units in an Auditory Stream
10.5 The Emergence of Rhythm
10.6 Frequency Scale
10.7 Instability
10.7.1 Build-Up and Resets
10.7.2 Bistability and Multistability
10.7.3 Modelling Multistability
10.7.4 Neurophysiology of Multistability
10.8 Factors Playing a Role in Sequential Integration
10.8.1 Tempo of the Tonal Sequence
10.8.2 Separation in Pitch
10.8.3 Differences in Timbre
10.8.4 Differences in Perceived Location
10.8.5 Familiarity
10.8.6 Attention
10.8.7 Syntax and Semantics
10.8.8 Visual Information
10.8.9 Loudness
10.8.10 Concluding Remarks
10.9 Organizing Principles
10.9.1 Proximity
10.9.2 Similarity
10.9.3 Completion
10.9.4 Connectedness
10.9.5 Good Continuation
10.9.6 Temporal Regularity
10.9.7 Exclusive Allocation
10.9.8 Figure-Ground Organization
10.9.9 Other Elements of Gestalt Psychology
10.10 Establishing Temporal Coherence
10.11 The Continuity Illusion
10.11.1 Four Rules Governing the Continuity Illusion
10.11.2 Information Trading
10.11.3 Onsets and Offsets
10.11.4 Restoration in Speech and in Music
10.11.5 Other Sound Demonstrations on the Continuity Illusion
10.12 Consequences of Auditory-Stream Formation
10.12.1 Camouflage
10.12.2 Order and Temporal Relations
10.12.3 Rhythm
10.12.4 Pitch Contours
10.12.5 Consonance and Dissonance
10.13 Primitive and Schema-Based ASA
10.13.1 The Nature of Schemas
10.13.2 Information Trading
10.14 Human Sound Localization and Auditory Scene Analysis
10.15 Computational Auditory Scene Analysis
10.15.1 Temporal Coherence
10.15.2 Predictive Coding
10.15.3 Neural Networks
References
11 Interpretative Summary
11.1 Introduction
11.2 The Ear
11.3 The Auditory Filter and the Tonotopic Array
11.4 Auditory-Unit Formation
11.5 Beat Detection
11.6 Timbre Perception
11.7 Loudness Perception
11.8 Pitch Perception
11.9 Perceived Location
11.10 Auditory-Stream Formation
11.10.1 Instability
11.10.2 Neurophysiology of Instability
11.10.3 Factors Playing a Role in Sequential Integration
11.10.4 Organizing Principles
11.10.5 Establishing Temporal Coherence
11.10.6 The Continuity Illusion
11.10.7 Consequences of Auditory-Stream Formation
11.10.8 Primitive and Schema-Based ASA
11.10.9 Human Sound Localization and Auditory-Stream Formation
11.10.10 Computational Auditory Scene Analysis
Reference
Index