Reviews the new High Efficiency Video Coding (HEVC) standard and advancements in adaptive streaming technologies for use in broadband networks and the Internet.

This book describes next-generation video coding and streaming technologies with a comparative assessment of their strengths and weaknesses. Specific emphasis is placed on the H.265/HEVC video coding standard and adaptive bit rate video streaming. In addition to evaluating the impact of different types of video content and powerful feature sets on HEVC coding efficiency, the text provides an in-depth study of the practical performance of popular adaptive streaming platforms and useful tips for streaming optimization. Readers will learn of new over-the-top (OTT) online TV advancements, the direction of the broadband telecommunications industry, and the latest developments that will help keep implementation costs down and maximize return on infrastructure investment.

* Reviews the emerging High Efficiency Video Coding (HEVC) standard and compares its coding performance with the MPEG-4 Advanced Video Coding (AVC) and MPEG-2 standards
* Provides invaluable insights into the intra and inter coding efficiencies of HEVC, such as the impact of hierarchical block partitioning and new prediction modes
* Evaluates the performance of the Apple and Microsoft adaptive streaming platforms and presents innovative techniques related to aggregate stream bandwidth prediction, duplicate chunk
* Includes end-of-chapter homework problems and access to instructor slides

Next-Generation Video Coding and Streaming is written for students, researchers, and industry professionals working in the field of video communications.

Benny Bing has worked in academia for over 20 years. He has published over 80 research papers and 12 books, and has 6 video patents licensed to industry. He has served as a technical editor for several IEEE journals and as an IEEE Communications Society Distinguished Lecturer. He also received the National Association of Broadcasters (NAB) Technology Innovation Award for demonstrations of advanced media technologies.
Author(s): Bing, Benny
Edition: 1
Publisher: John Wiley and Sons
Year: 2015
Language: English
Pages: 344
Tags: Computer Science and Engineering; Media Data Processing; Video Processing
Content:
Preface
1 Digital Video Delivery
  1.1 Broadband TV Landscape
    1.1.1 Internet TV Providers
    1.1.2 Netflix
    1.1.3 Hulu
    1.1.4 Amazon
    1.1.5 YouTube
    1.1.6 ESPN3
    1.1.7 HBO
    1.1.8 CBS
    1.1.9 Sony
    1.1.10 Retail Giants
  1.2 Internet TV Delivery Platforms
    1.2.1 Cloud TV
    1.2.2 Content Delivery Network
    1.2.3 Free CDN
    1.2.4 Video Transcoding
  1.3 Second Screen Device Adoption
    1.3.1 Mobile Video
    1.3.2 Mobile Versus Traditional TV
    1.3.3 Over-the-Air Digital TV
    1.3.4 Non-Real-Time TV Delivery
    1.3.5 NRT Use Cases
    1.3.6 Cable Wi-Fi Alliance
  1.4 Screen and Video Resolution
    1.4.1 Aspect Ratios
    1.4.2 Video Resolution
    1.4.3 Visual Quality
    1.4.4 Matching Video Content to Screen Size
  1.5 Stereoscopic 3D TV
    1.5.1 Autostereoscopic 3D
    1.5.2 Anaglyph 3D
  1.6 Video Coding Standards
    1.6.1 Exploiting Video Content Redundancies
    1.6.2 High-Quality Versus High-Resolution Videos
    1.6.3 Factors Affecting Coded Video Bit Rates
    1.6.4 Factors Affecting Coded Frame Sizes
  1.7 Video Streaming Protocols
    1.7.1 Video Streaming over HTTP
    1.7.2 Adaptive Bit Rate Streaming
    1.7.3 Benefits and Drawbacks of Adaptive Streaming
    1.7.4 HTTP Progressive Download
    1.7.5 HTML5
  1.8 TV Interfaces and Navigation
    1.8.1 Streaming Adapters
    1.8.2 Streaming Boxes
    1.8.3 Media-Activated TV Navigation
    1.8.4 Smartphone and Tablet TV Navigation
    1.8.5 Digital Living Network Alliance
    1.8.6 Discovery and Launch
    1.8.7 UltraViolet
  References
  Homework Problems
2 Video Coding Fundamentals
  2.1 Sampling Formats of Raw Videos
    2.1.1 Color Subsampling
    2.1.2 YUV Versus RGB Color Space
    2.1.3 Bit Rate and Storage Requirements
  2.2 Impact of Video Compression
    2.2.1 Rate-Distortion Optimization
    2.2.2 Partitions in a Video Frame
    2.2.3 Video Coding Standards
    2.2.4 Profiles and Levels
  2.3 General Video Codec Operations
    2.3.1 Transform Coding
    2.3.2 Quantization
    2.3.3 Deblocking Filter
  2.4 Transform Coding
    2.4.1 Orthonormal Transforms
    2.4.2 Discrete Cosine Transform
    2.4.3 Discrete Sine Transform
    2.4.4 Asymmetric DST
    2.4.5 Comparison of KLT, ADST, and DCT
    2.4.6 Hybrid Transforms
    2.4.7 Wavelet Transform
    2.4.8 Impact of Transform Size
    2.4.9 Impact of Parallel Coding
  2.5 Entropy Coding
    2.5.1 Variable Length Codes
    2.5.2 Golomb Codes
    2.5.3 Arithmetic Coding Overview
    2.5.4 Nonadaptive Arithmetic Coding
    2.5.5 Steps in Nonadaptive Arithmetic Coding
    2.5.6 Context-Based Adaptive Arithmetic Coding
    2.5.7 Code Synchronization
  2.6 MPEG (H.26x) Standards
    2.6.1 MPEG Frames
    2.6.2 I Frames
    2.6.3 P Frames
    2.6.4 B Frames
    2.6.5 Intracoded P and B Frames
  2.7 Group of Pictures
    2.7.1 GOP Length
    2.7.2 Closed GOP
    2.7.3 Error Resiliency in a Closed GOP
    2.7.4 Decoding Sequence
    2.7.5 Open GOP
    2.7.6 Variable GOP Length
    2.7.7 Random Access of MPEG Frames
  2.8 Motion Estimation and Compensation
    2.8.1 Motion Estimation
    2.8.2 Motion Search in P Frames
    2.8.3 Motion Search in B Frames
    2.8.4 Fractional (Subsample) Motion Search
    2.8.5 Motion Compensation
    2.8.6 Computational Complexity
    2.8.7 Motion Search Algorithms
    2.8.8 Accelerating Motion Search
    2.8.9 Impact of Video Resolution
  2.9 Non-MPEG Video Coding
    2.9.1 Motion JPEG
    2.9.2 Dirac
    2.9.3 WebM Project
  2.10 Constant and Variable Bit-Rate Videos
    2.10.1 CBR Encoding
    2.10.2 VBR Encoding
    2.10.3 Assessing Bit Rate Variability
    2.10.4 Scene Change Detection
    2.10.5 Adaptive Scene Change Detection
    2.10.6 I Frame Size Prediction
  2.11 Advanced Audio Coding
    2.11.1 Low and High Bit Rate AAC
    2.11.2 High-Efficiency and Low-Complexity AAC
    2.11.3 MPEG Surround
  2.12 Video Containers
    2.12.1 MPEG-4
    2.12.2 MP4 Access Units
    2.12.3 Binary Format for Scenes
    2.12.4 MP4 Overheads
    2.12.5 MPEG-2 TS
    2.12.6 MPEG-2 TS Structure
    2.12.7 MPEG-2 TS Audio and Video PESs
    2.12.8 MPEG-2 TS IP/Ethernet Encapsulation
  2.13 Closed Captions
  References
  Homework Problems
3 H.264/AVC Standard
  3.1 Overview of H.264
    3.1.1 Fundamental H.264 Benefits
    3.1.2 H.264 Applications
  3.2 H.264 Syntax and Semantics
    3.2.1 Profiles and Levels
    3.2.2 Baseline, Extended, Main Profiles
    3.2.3 High Profiles
  3.3 H.264 Encoder
    3.3.1 H.264 Slice Types
    3.3.2 H.264 Intraprediction
    3.3.3 Intraprediction for 4 x 4 Blocks
    3.3.4 Intraprediction for 16 x 16 Macroblocks
    3.3.5 Intra Pulse Code Modulation Mode
    3.3.6 H.264 Interprediction
  3.4 Rate Distortion Optimization
    3.4.1 RDO under VBR
    3.4.2 RDO under CBR
    3.4.3 In-Loop Deblocking Filter
  3.5 Video Coding and Network Abstraction Layers
    3.5.1 Video Coding Layer
    3.5.2 Network Abstraction Layer
    3.5.3 Hypothetical Reference Decoder
    3.5.4 Supplemental Enhancement Information
  3.6 Error Resilience
    3.6.1 Slice Coding
    3.6.2 Data Partitioning
    3.6.3 Slice Groups
    3.6.4 Redundant Slices
    3.6.5 Flexible Macroblock Ordering
    3.6.6 FMO Types
    3.6.7 FMO Overhead
    3.6.8 Arbitrary Slice Ordering
  3.7 Transform Coding
    3.7.1 Transform Types
    3.7.2 Hadamard Transforms
    3.7.3 Transform Implementation
  3.8 Entropy Coding
    3.8.1 Context-Adaptive Binary Arithmetic Coding
    3.8.2 CABAC Performance
    3.8.3 Context-Adaptive Variable-Length Coding
  3.9 Motion Vector Search
    3.9.1 Motion Search Options
  3.10 Multiple Reference Slices
    3.10.1 Motivations for Using More Reference Slices
    3.10.2 Switching Reference Slices
  3.11 Scalable Video Coding
    3.11.1 Temporal Scalability
    3.11.2 Spatial Scalability
    3.11.3 Video Quality Scalability
    3.11.4 Disadvantages of SVC
  References
  Homework Problems
4 H.265/HEVC Standard
  4.1 H.265 Overview
    4.1.1 Fundamental H.265 Benefits
    4.1.2 H.265 Applications
    4.1.3 Video Input
  4.2 H.265 Syntax and Semantics
    4.2.1 Parameter Set Structure
    4.2.2 NAL Unit Syntax Structure
    4.2.3 Reference Frame Sets and Lists
    4.2.4 H.265 GOP Structure
    4.2.5 Support for Open GOPs and Random Access
    4.2.6 Video Coding Layer
    4.2.7 Temporal Sublayers
    4.2.8 Error Resilience
    4.2.9 RTP Support
  4.3 Profiles, Levels, and Tiers
    4.3.1 Profiles
    4.3.2 Levels
    4.3.3 Range Extensions
  4.4 Quadtrees
    4.4.1 Variable Block Size Quadtree Partitioning
    4.4.2 Coding Tree Units
    4.4.3 Splitting of Coding Blocks
    4.4.4 Frame Boundary Matching
    4.4.5 Prediction Blocks and Units
    4.4.6 Transform Blocks and Units
    4.4.7 Determining the Quadtree Depth
    4.4.8 Coding Unit Identification
  4.5 Slices
    4.5.1 Tiles
    4.5.2 Dependent Slice Segments
    4.5.3 Wavefront Parallel Processing
    4.5.4 Practical Considerations for Parallel Processing
  4.6 Intraprediction
    4.6.1 Prediction Block Partitioning
    4.6.2 Intra-Angular Prediction
    4.6.3 Intra-DC and Intra-Planar Prediction
    4.6.4 Adaptive Smoothing of Reference Samples
    4.6.5 Filtering of Prediction Block Boundary Samples
    4.6.6 Reference Sample Substitution
    4.6.7 Mode Coding
  4.7 Interprediction
    4.7.1 Fractional Sample Interpolation
    4.7.2 Motion Vector Prediction
    4.7.3 Merge Mode
    4.7.4 Skip Mode
    4.7.5 Advanced MV Prediction
    4.7.6 Restrictions on Motion Data
    4.7.7 Practical Considerations
  4.8 Transform, Scaling, and Quantization
    4.8.1 Alternative 4 x 4 Transform
    4.8.2 Scaling
    4.8.3 Quantization
  4.9 Entropy Encoding
    4.9.1 H.265 Binarization Formats
    4.9.2 Context Modeling
    4.9.3 CABAC Throughput Issues
    4.9.4 CABAC Encoding
    4.9.5 CABAC Decoding
    4.9.6 Coefficient Scanning
    4.9.7 Coefficient Coding
  4.10 In-Loop Filters
    4.10.1 In-Loop Deblocking Filter
    4.10.2 Sample-Adaptive Offset Filter
  4.11 Special H.265 Coding Modes
  References
  Homework Problems
5 Assessing and Enhancing Video Quality
  5.1 Introduction
    5.1.1 Subjective Metrics
    5.1.2 Limitations of Subjective Metrics
    5.1.3 Objective Metrics
    5.1.4 Types of Objective Metrics
    5.1.5 References for Objective Metrics
    5.1.6 Network Impact
  5.2 Distortion Measure
    5.2.1 Sum of Absolute Differences
    5.2.2 Sum of Absolute Transformed Differences
  5.3 Peak Signal to Noise Ratio
    5.3.1 Combined PSNR
    5.3.2 Impact of Video Resolution and QP on PSNR
    5.3.3 Limitations of PSNR
  5.4 Structural Similarity Index
  5.5 Observable Versus Perceptual Visual Artifacts
    5.5.1 Limited Information Provided by PSNR
    5.5.2 Observable Artifacts and Link Quality
    5.5.3 Combined Spatial and Temporal Video Quality Assessment
  5.6 Error Concealment
    5.6.1 Error Resilience
    5.6.2 Impact on Visual Artifacts
    5.6.3 Types of Error Concealment
    5.6.4 Comparison of EC Methods
    5.6.5 Increasing Frame Rate Using EC
    5.6.6 Actions Performed After EC
  5.7 Color Science
    5.7.1 Color Reception
    5.7.2 Color Reproduction
  References
  Homework Problems
6 Coding Performance of H.262, H.264, and H.265
  6.1 Coding Parameters
    6.1.1 Coding Block Size
    6.1.2 Transform Block Size
    6.1.3 TMVP, SAO, AMP
  6.2 Comparison of H.265 and H.264
    6.2.1 Absolute Coding Efficiency
    6.2.2 Relative Coding Gain
    6.2.3 Videos with Different Levels of Motion
  6.3 Frame Coding Comparison
    6.3.1 I Frame Coding Efficiency, Quality, and Time
    6.3.2 P Frame Coding Efficiency, Quality, and Time
    6.3.3 B Frame Coding Efficiency, Quality, and Time
    6.3.4 Overall Frame Coding Efficiency, Quality, and Time
  6.4 Impact of Coding Block Size on Frame Coding Efficiency
    6.4.1 Impact of Transform Block Size on Frame Coding Efficiency
    6.4.2 Impact of Coding Block Size on Frame Encoding Time
    6.4.3 Impact of Transform Block Size on Frame Encoding Time
    6.4.4 Impact of CU Size on Encoding Time
    6.4.5 Decoding Time
  6.5 Summary of Coding Performance
  6.6 Error Resiliency Comparison of H.264 and H.265
    6.6.1 H.264 Error Resiliency
    6.6.2 H.265 Error Resiliency
  6.7 H.264/H.265 Versus H.262
    6.7.1 Performance Comparison
    6.7.2 H.262 Frame Coding Efficiency
    6.7.3 Impact of GOP Size
  References
  Homework Problems
7 3D Video Coding
  7.1 Introduction
    7.1.1 3D Video Transmission and Coding
    7.1.2 View Multiplexing
    7.1.3 View Expansion and Display
    7.1.4 View Packing Methods
  7.2 Multiview Coding
    7.2.1 MVC Bitstream
    7.2.2 2D to 3D Conversion
    7.2.3 H.264 Multiview Coding Extension
    7.2.4 MVC Inter-view Prediction
    7.2.5 MVC Inter-view Reordering
    7.2.6 MVC Profiles
    7.2.7 Comparing MVC with 2D H.264 Video Coding
  7.3 Correlation Between Left and Right Views in S3D Videos
  7.4 View Expansion Via Sample Interpolation
    7.4.1 Impact of Sample Interpolation
    7.4.2 Inter-view Versus Intraview Sample Interpolation
    7.4.3 Interframe Versus Intraview Sample Interpolation
    7.4.4 Impact of Quantization on Interpolated S3D Videos
  7.5 Anaglyph 3D Generation
    7.5.1 H.264 Coding Efficiency for Anaglyph Videos
    7.5.2 Delta Analysis
    7.5.3 Disparity Vector Generation
  References
  Homework Problems
8 Video Distribution and Streaming
  8.1 Adaptive Video Streaming
    8.1.1 Playlists and Bandwidth Estimation
    8.1.2 Quality (Bitstream) Switching
  8.2 Video Quality and Chunk Efficiency
    8.2.1 Video Quality for Different VBR Chunk Durations
    8.2.2 VBR Chunk Bit Rate Versus Chunk Duration
    8.2.3 VBR Chunk Efficiency Versus Chunk Duration
    8.2.4 Capped VBR Chunk Efficiency Versus Chunk Duration
    8.2.5 CBR Chunk Efficiency Versus Chunk Duration
    8.2.6 Instantaneous and Average Rates for Different Chunk Durations
  8.3 Apple HLS
    8.3.1 Overview of HLS Operation
    8.3.2 GOP Structure
    8.3.3 Super and Dynamic Playlists
    8.3.4 Media Control
  8.4 HLS Over 4G and 802.11
    8.4.1 Startup Delay
    8.4.2 Switching Quality Levels
    8.4.3 One-Level Versus Unfragmented HLS
    8.4.4 Multi-Level HLS
    8.4.5 Duplicate Video Chunks with Audio
    8.4.6 Duplicate Video Chunks
    8.4.7 Duplicate Audio Chunks
    8.4.8 Duplicate Chunk Suppression
    8.4.9 Server-Based Chunk Suppression
    8.4.10 Custom App Chunk Suppression
  8.5 Impact of Varying Chunk Duration
    8.5.1 Impact of Varying Quality Levels
    8.5.2 Summary of HLS Performance
  8.6 Microsoft Silverlight Smooth Streaming
    8.6.1 Overview of MSS Operation
    8.6.2 MSS Streaming over 802.11n and 802.16
    8.6.3 802.16 MSS Streaming
    8.6.4 802.11n MSS Streaming
    8.6.5 Comparison of HLS and MSS Streaming
  8.7 Traffic Rate Shaping
    8.7.1 Impact of Shaping and Scene Complexity on Quality Switching
    8.7.2 Impact of Shaping on Quality Switch Delay
    8.7.3 Impact of Shaping on Playback Duration
    8.7.4 Impact of Shaping on Start of Playback
    8.7.5 Impact of Shaping and Scene Complexity on Duplicate Chunks
    8.7.6 Impact of Unshaped Traffic on Quality Switching
  8.8 Adobe HTTP Dynamic Streaming
  8.9 MPEG-DASH (ISO/IEC 23009)
    8.9.1 DASH Process
    8.9.2 DASH Media Formats
    8.9.3 DASH for HTML5
    8.9.4 DASH Industry Forum
  8.10 Aggregate Adaptive Stream Bandwidth Prediction
    8.10.1 Permanence Time
    8.10.2 Prediction Model Implementation
  8.11 Limitations of Client-Based Adaptive Streaming
    8.11.1 Limitations of Fixed-Size Chunks
    8.11.2 Server-Based Adaptive Streaming
    8.11.3 Linear Broadcast Systems
    8.11.4 Adaptive Streaming and Scalable Video Coding
  8.12 Tips for Efficient Adaptive Streaming
    8.12.1 Quality Levels and Chunk Duration
    8.12.2 Encoder Efficiency
    8.12.3 Bit Rates of Quality Levels
    8.12.4 Server Bandwidth Shaping
    8.12.5 Server Bandwidth Estimation
    8.12.6 Analyzing Network Congestion
  References
  Homework Problems
Glossary
Index