Proceedings of the IEEE, 2000. — 63 p.
During the last decade, CD-quality digital audio has essentially replaced analog audio. Emerging digital audio applications for network, wireless, and multimedia computing systems face a series of constraints such as reduced channel bandwidth, limited storage capacity, and low cost. These new applications have created a demand for high-quality digital audio delivery at low bit rates. In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities.
This paper is organized as follows. First, psychoacoustic principles are described, with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Next, filter bank design issues and algorithms are addressed, with particular emphasis on the modified discrete cosine transform, a perfect reconstruction cosine-modulated filter bank that has become centrally important in perceptual audio coding. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model. These discussions concentrate on architectures and applications of those techniques that use psychoacoustic models to efficiently exploit the masking characteristics of the human receiver. Several algorithms that have become international and/or commercial standards receive in-depth treatment, including the ISO/IEC MPEG family (-1, -2, -4), the Lucent Technologies PAC/EPAC/MPAC, the Dolby AC-2/AC-3, and the Sony ATRAC/SDDS algorithms. Then, we describe subjective evaluation methodologies in some detail, including the ITU-R BS.1116 recommendation on subjective measurements of small impairments. This paper concludes with a discussion of future research directions.
Introduction
Generic Perceptual Audio Coding Architecture
Paper Organization
Psychoacoustic Principles
Absolute Threshold of Hearing
Critical Bands
Simultaneous Masking, Masking Asymmetry, and the Spread of Masking
Nonsimultaneous Masking
Perceptual Entropy
Example Codec Perceptual Model: ISO 11172-3 (MPEG-1) Psychoacoustic Model
Time-Frequency Analysis: Filter Banks and Transforms
Filter Banks for Audio Coding: Design Considerations
Cosine Modulated Pseudo-QMF M-Band Banks
Cosine Modulated PR M-Band Banks and the MDCT
Pre-Echo Distortion
Pre-Echo Control Strategies
Transform Coders
Optimum Coding in the Frequency Domain (OCF-1, OCF-2, OCF-3)
Perceptual Transform Coder (PXFM)
Brandenburg–Johnston Hybrid Coder
CNET Coder
ASPEC
DPAC
DFT Noise Substitution
DCT with Vector Quantization
MDCT with Vector Quantization
Subband Coders
MASCAM
MUSICAM
Wavelet Decompositions
Adapted Wavelet Packet Decompositions
Hybrid Harmonic/Wavelet Decompositions
Signal-Adaptive, Nonuniform Filter Bank (NUFB) Decompositions
IIR Filter Banks
Sinusoidal Coders
Analysis/Synthesis Audio Codec
Harmonic and Individual Lines Plus Noise Coder
FM Synthesis
Hybrid Sinusoidal Coders
Linear-Prediction-Based Coders
Multipulse Excitation
Discrete Wavelet Excitation Coding
Sinusoidal Excitation Coding
Frequency Warped LP
Audio Coding Standards
ISO/IEC 11172-3 (MPEG-1) and ISO/IEC IS13818-3 (MPEG-2 BC)
ISO/IEC IS13818-7 (MPEG-2 NBC/AAC)
ISO/IEC 14496-3 (MPEG-4)
Precision Adaptive Subband Coding
Adaptive Transform Acoustic Coding
Sony Dynamic Digital Sound (SDDS)
Lucent Technologies Perceptual Audio Coder (PAC), Enhanced PAC (EPAC), and Multichannel PAC (MPAC)
DOLBY AC-2, AC-2A
Quality Measures for Perceptual Audio Coding
Subjective Quality Measures
Confounding Factors in Subjective Evaluations
Subjective Evaluations of Two-Channel Standardized Codecs
Subjective Evaluations of 5.1-Channel Standardized Codecs
Conclusion
Summary of Applications for Commercial and International Standards
Summary of Recent Research and Future Research Directions