Immersive Video Technologies

Get a broad overview, from a multimedia processing perspective, of the different modalities of immersive video technologies: omnidirectional video, light fields, and volumetric video.

From capture to representation, coding, and display, video technologies have evolved significantly and in many different directions over the last few decades, with the ultimate goal of providing a truly immersive experience to users. After establishing a common background for these technologies, based on the theoretical concept of the plenoptic function, Immersive Video Technologies offers a comprehensive overview of the leading technologies enabling visual immersion, including omnidirectional (360-degree) video, light fields, and volumetric video. Following the critical components of the typical content production and delivery pipeline, the book presents acquisition, representation, coding, rendering, and quality assessment approaches for each immersive video modality. The text also reviews current standardization efforts and explores new research directions.
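As background for this listing (and not the book's own notation), the plenoptic function mentioned above is conventionally written, following Adelson and Bergen, as a seven-dimensional function

    L(x, y, z, θ, φ, λ, t)

that is, the light measured at a viewing position (x, y, z), along a direction (θ, φ), at wavelength λ and time t. Omnidirectional video, light fields, and volumetric video can each be viewed as different ways of sampling this function.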

With this book, the reader will (a) gain a broad understanding of immersive video technologies across three different modalities: omnidirectional video, light fields, and volumetric video; (b) learn about the most recent scientific results in the field, including recent learning-based methodologies; and (c) understand the challenges and perspectives for immersive video technologies.

Author(s): Giuseppe Valenzise, Martin Alain, Emin Zerman, Cagri Ozcinar
Publisher: Academic Press
Year: 2022

Language: English
Pages: 684
City: London

Front Cover
Immersive Video Technologies
Copyright
Contents
List of contributors
Preface
Part 1 Foundations
1 Introduction to immersive video technologies
1.1 Introduction
1.2 What is immersion?
1.2.1 Definitions
1.2.2 Immersion
1.2.3 Extended reality
1.3 Immersive video
1.3.1 Foundations: the plenoptic function
1.3.2 Historical perspective and evolution
1.3.3 Imaging modalities
1.3.4 Omnidirectional imaging
1.3.5 Light field imaging
1.3.6 Volumetric 3D imaging
1.4 Challenges in the processing and content delivery pipeline
1.4.1 Acquisition and representations
1.4.1.1 Acquisition of 2D images and videos
1.4.1.2 Acquisition of immersive imaging data
1.4.1.3 From raw data to content creation
1.4.2 Compression and transmission
1.4.2.1 Compression of immersive videos
1.4.2.2 Streaming of immersive videos
1.4.2.3 Challenges for immersive video
1.4.3 Rendering and display
1.4.4 Quality assessment
1.4.4.1 Subjective quality assessment
1.4.4.2 Objective quality metrics
1.4.4.3 Challenges and limitations for immersive video technologies
1.5 Discussion and perspectives
References
Part 2 Omnidirectional video
2 Acquisition, representation, and rendering of omnidirectional videos
2.1 Introduction
2.2 Acquisition and projection model
2.2.1 Acquisition general principles
2.2.2 Catadioptric capture
2.2.2.1 Hyper-catadioptric
2.2.2.2 Para-catadioptric
2.2.3 Fish-eye capture
2.2.4 Unified spherical model
2.2.5 Stitching for sphere construction
2.3 Sphere representation
2.3.1 Equirectangular projection
2.3.2 CubeMap projection
2.3.3 Other mappings
2.3.4 On-the-sphere representation
2.4 Rendering
2.5 Towards multi-view acquisition
Acknowledgment
References
3 Streaming and user behavior in omnidirectional videos
3.1 Introduction
3.2 Streaming pipeline: evolution towards ODV
3.2.1 Adaptive ODV streaming pipeline
3.3 System-centric streaming
3.3.1 Viewport-independent streaming
3.3.2 Viewport-dependent streaming
Projection-based approach
Tile-based approach
3.4 The role of the user in ODV
3.4.1 ODV navigation datasets
3.4.2 Behavioral analysis within ODVs
Traditional data analysis
Trajectory-based data analysis
3.5 User-centric ODV streaming
3.5.1 Single-user designs
3.5.2 Cross-user designs
Content-agnostic designs
Content-aware designs
3.6 Conclusions and perspectives
References
4 Subjective and objective quality assessment for omnidirectional video
4.1 Introduction
4.2 Subjective quality assessment
4.2.1 Test environment
4.2.2 Descriptions of test methods
4.2.3 Collection of users' ratings
4.3 Short-term video assessment
4.3.1 Video-related factors
4.3.2 Test methods
4.3.3 Head-rotation data
4.3.4 Other factors
4.4 Long-term video assessment
4.4.1 Review papers
4.5 Tile-based streaming
4.6 Objective quality assessment
4.6.1 Introduction
4.6.2 Metrics overview
4.6.2.1 Traditional video metrics
4.6.2.2 Omnidirectional video metrics based on traditional techniques
4.6.2.3 Omnidirectional video metrics based on deep learning
4.7 VIVA-Q: omnidirectional video quality assessment based on planar Voronoi patches and visual attention
4.7.1 Planar Voronoi patches
4.7.2 Voronoi-based framework without visual attention
4.7.3 Voronoi-based framework with visual attention
4.7.4 ODV dataset
4.7.5 Comparison study
4.8 Conclusion
References
5 Omnidirectional video saliency
5.1 Introduction
5.2 Collecting user data
5.3 User behavior analysis
5.4 Modeling
5.4.1 Image-based saliency models
5.4.2 Video-based saliency models
5.4.3 Viewport predictions
5.5 Metrics
5.5.1 2D image/video saliency
5.5.1.1 Value-based metrics
5.5.1.2 Distribution-based metrics
5.5.1.3 Location-based metrics
5.5.2 2D image/video scanpath
5.5.3 Extensions for omnidirectional content
5.5.3.1 Equirectangular projection
5.5.3.2 Cubic projection
5.6 Conclusion and outlook
References
Part 3 Light fields
6 Acquisition of light field images & videos
6.1 Plenoptic cameras
6.1.1 Plenoptic 1.0
6.1.2 Plenoptic 2.0
6.1.3 Kaleidoscope
6.2 Light field gantries
6.3 Camera arrays
References
7 Light field representation
7.1 Space domain representation
7.1.1 Two-plane model
7.1.1.1 Ray sets
7.1.2 Spatio-angular representation
7.1.3 Epipolar plane images
7.1.4 Adding time
7.2 Frequency domain representation
7.2.1 4D Fourier transform
7.2.2 Fourier disparity layers
7.2.2.1 Fourier disparity layer rendering
7.2.2.2 Layer construction in the Fourier domain
7.2.2.3 Applications
7.2.2.4 Limitations
7.3 Depth-based representation
7.3.1 Multi-plane images
7.3.2 Multi-sphere images
7.3.3 Fristograms
7.3.3.1 Theory
7.3.3.2 Semantic analysis methodology
7.3.3.3 Applications
7.4 Alternative representations
7.4.1 Neural radiance fields
7.4.1.1 Positional encoding
References
8 Compression of light fields
8.1 Introduction
8.2 Transform-based methods
8.2.1 Approaches based on DCT
8.2.2 Approaches based on KLT
8.2.3 Approaches based on DWT
8.2.4 Summary
8.3 Prediction-based methods
8.3.1 Inter-view prediction
8.3.2 Non-local spatial prediction
8.3.3 View synthesis-based prediction
8.3.4 Summary
8.4 JPEG Pleno
8.4.1 Light field coding
4D transform mode
4D prediction mode
4DTM or 4DPM
8.5 Deep learning-based methods
Transformation-based
Non-local spatial prediction
View synthesis-based prediction
8.6 Performance analysis
8.7 Conclusions and perspectives
References
9 Light field processing for media applications
9.1 Light field processing chain overview
9.2 Image acquisition and geometric camera calibration
9.3 Depth reconstruction
9.4 Interactive light field processing
9.5 Light field rendering
9.5.1 Camera selection and blending
9.5.2 Observation space for planar arrays
9.6 Real-time rendering in game engines
9.6.1 Pixel-based depth image-based rendering
9.6.2 Implementation in Unreal Engine
9.6.3 Mesh-based rendering for view-dependent effects
9.7 Neural rendering and 3D reconstruction
9.7.1 Neural network-based rendering for light field angular super-resolution
9.7.2 Neural implicit representations
Practical challenge 1: input data
Practical challenge 2: training
Practical challenge 3: rendering
Practical challenge 4: mesh extraction
9.7.3 Neural textures
9.8 Conclusion and open challenges
Acknowledgments
References
10 Quality evaluation of light fields
10.1 Introduction
10.2 Characteristics of LF-related distortions
10.2.1 Acquisition-related distortions
10.2.1.1 Multiplexed acquisition
10.2.1.2 Time-sequential acquisition
10.2.1.3 Multi-sensor acquisition
10.2.2 Processing-related distortions
10.2.2.1 Spatial super resolution
10.2.2.2 Angular super resolution
10.2.2.3 Temporal super resolution
10.2.2.4 Depth estimation
10.2.3 Compression-related distortions
10.2.4 Use-case-specific influencing factors for LF distortions
10.3 Subjective quality assessment
10.3.1 Characterization of LF content for subjective assessment
10.3.2 Quality assessment on LF displays
10.3.2.1 IQA datasets with LF displays
10.3.3 Quality assessment on other displays
10.3.3.1 IQA datasets with other displays
10.3.3.2 Impact of visualization trajectory
10.4 Objective quality assessment
10.4.1 Visibility of LF-related distortions on EPI
10.4.1.1 Structural image quality metrics
10.4.1.2 Sub-aperture views vs EPI
10.4.2 LF image quality metrics
10.5 Conclusion
References
Part 4 Volumetric video
11 Volumetric video – acquisition, interaction, streaming and rendering
11.1 Creation of volumetric video
11.1.1 Volumetric video workflow
11.1.2 Calibration
11.1.3 Image pre-processing
11.1.4 Depth estimation
11.1.5 Depth map fusion
11.1.6 Mesh processing
11.1.7 Temporal mesh registration
11.1.8 Use case for volumetric video
11.2 Animating and interacting with volumetric video
11.2.1 Body animation
11.2.1.1 Body model fitting
11.2.1.2 Hybrid animation
11.2.1.3 Body pose interpolation
11.2.1.4 Pose-dependent neural synthesis
11.2.2 Face animation
11.2.3 Neural hybrid face model
11.2.4 Text-driven facial animation
11.2.5 Neural face refinement
11.3 Streaming volumetric video
11.3.1 Compression and formats
11.3.2 Scene description
11.3.3 Remote-rendering for volumetric video
11.3.3.1 Server architecture
11.3.3.2 Client architecture
11.3.3.3 Motion-to-photon latency measurement
11.4 Conclusions
Acknowledgments
References
12 MPEG immersive video
12.1 MIV description and profiles
12.1.1 MIV overview
12.1.2 MIV profiles
12.2 TMIV description
12.2.1 TMIV encoder
12.2.2 TMIV decoder and renderer
12.3 Common test conditions
12.4 Evaluation of MIV with different configurations and 2D codecs
12.4.1 Performance of MIV anchors
12.4.2 Codec agnosticism
12.4.3 Atlas coding using SCC tools in VVC
12.4.3.1 Texture atlas coding
12.4.3.2 Depth atlas coding
12.5 Conclusion
References
13 Point cloud compression
13.1 Introduction
13.1.1 Challenges in point cloud compression
13.2 Basic tools for point cloud compression
13.2.1 2D projections and surface approximations
13.2.2 Voxelization
13.2.3 Tree-based partitioning and levels of detail (LoD)
13.2.4 Graph representations
13.2.5 Leveraging the acquisition model
13.3 MPEG PCC standardization
13.3.1 Video-based PCC
13.3.2 Geometry-based PCC
13.3.2.1 Geometry coding
Octree coding
Predictive coding
Arithmetic coding
13.3.2.2 Attribute coding
Region-adaptive hierarchical transform (RAHT)
Level of detail (LOD) generation
13.4 Learning-based techniques
13.4.1 Taxonomy
13.4.2 Deep geometry compression
13.4.3 Generative schemes for lossy compression
13.4.4 Point-based methods
13.4.5 Attribute compression
13.5 Conclusions and perspectives
References
14 Coding of dynamic 3D meshes
14.1 Introduction
14.2 Mesh fundamentals
14.2.1 Manifold vs non-manifold meshes
14.2.2 Meshes with and without boundaries
14.2.3 Mesh genus
14.2.4 Types of connectivity in a mesh
14.2.5 Representing a mesh as a graph
14.2.6 Euler–Poincaré characteristic
14.2.7 Mesh data structures
14.3 Static meshes
14.3.1 Connectivity compression
14.3.2 Vertex (geometry) compression
14.3.3 Standards and software
14.4 Constant-connectivity mesh sequences
14.4.1 Tracking and re-meshing
14.4.2 Methods based on segmentation
14.4.3 Methods based on principal component analysis
14.4.4 Methods based on spatio-temporal prediction
14.4.5 Methods using wavelets
14.4.6 Methods based on surface unfolding
14.4.7 Methods based on spectral analysis
14.4.8 The MPEG framework
14.5 Variable-connectivity mesh sequences
14.5.1 Methods based on mesh surface unfolding
14.5.2 Methods based on subdivision of meshes into blocks
14.5.3 Methods inspired by MPEG V-PCC
14.5.4 The MPEG V-mesh call for proposals
14.6 Conclusion and future directions
References
15 Volumetric video streaming
15.1 Theoretical approaches to volumetric video streaming
15.1.1 Dynamic mesh streaming strategies
15.1.2 Dynamic point cloud streaming strategies
15.2 Volumetric video streaming systems
15.2.1 Mesh-based systems
15.2.2 Point cloud-based systems
15.3 Conclusion
References
16 Processing of volumetric video
16.1 Introduction
16.2 Restoration
16.2.1 Degradation model
16.2.2 Model-based methods
16.2.2.1 Non-local graph-based transform for depth image denoising
16.2.2.2 Graph Laplacian regularized point cloud denoising
16.2.3 Learning-based methods
16.2.3.1 Non-local spatial propagation network for depth completion
16.2.3.2 Gridding residual network for point cloud completion
16.3 Semantic segmentation
16.3.1 Point cloud representation
16.3.2 Projection-based methods
16.3.3 Voxel-based methods
16.3.4 Point-based methods
16.3.5 Multi-representation-based methods
16.4 Object detection
16.4.1 Single-stage detection
16.4.2 Two-stage detection
16.5 Chapter at a glance
References
17 Computational 3D displays
17.1 Introduction
17.1.1 Outline
17.2 Perceptual considerations
17.2.1 Spatial considerations
17.2.2 Spectral considerations
17.2.3 Temporal considerations
17.2.4 Intensity considerations
17.3 Computational 3D displays
17.3.1 Autostereoscopic displays
17.3.1.1 Holographic displays
17.3.1.2 Light field displays
17.3.1.3 Volumetric displays
17.3.2 Stereoscopic displays
17.3.2.1 Eye tracking
17.3.2.2 Varifocal displays
17.3.2.3 Multifocal displays
17.3.3 Optical see-through displays
17.3.3.1 OST-AR without background subtraction
17.3.3.2 OST-AR with background subtraction
17.3.3.3 High dynamic range OSTD
17.3.3.4 Calibration
17.3.4 Perception-driven rendering techniques
17.3.4.1 Foveated rendering
17.3.4.2 Synthetic defocus
17.4 Summary and discussion
18 Subjective and objective quality assessment for volumetric video
18.1 Subjective quality assessment
18.1.1 Non-interactive user studies
18.1.1.1 User studies for point clouds
18.1.1.2 User studies for meshes
18.1.2 Interactive user studies
18.1.2.1 User studies for point clouds
18.1.2.2 User studies for meshes
18.1.3 Publicly available datasets
18.1.4 Comparative studies
18.2 Objective quality assessment
18.2.1 Model-based quality metrics
18.2.1.1 For point clouds
18.2.1.2 For meshes
18.2.2 Image-based approaches
18.2.2.1 For point clouds
18.2.2.2 For meshes
18.2.3 Comparison between model-based and image-based approaches
18.2.4 Objective quality assessment in volumetric video
18.2.5 Publicly available software implementations
18.3 Conclusion
References
Part 5 Applications
19 MR in video-guided liver surgery
19.1 Introduction
19.1.1 Liver surgery
19.1.2 Overview of planning and navigation in liver resection procedures
19.2 Medical images pre-processing
19.2.1 Medical image enhancement
19.2.2 Segmentation
19.3 3D liver modeling
19.4 Model-organ registration
19.5 Mixed reality guided surgery
19.5.1 Surgery planning
19.5.2 Navigation
19.6 Medical validation
19.7 Conclusion
References
20 Immersive media productions involving light fields and virtual production LED walls
20.1 Light fields in media production
20.1.1 The creation of "Unfolding"
20.1.2 Post-production of Unfolding 1.0
20.1.3 Post-production of Unfolding 2.0
20.2 Immersive LED wall productions
20.2.1 LED cave production
20.2.2 Flat LED wall production
20.2.3 Summary
References
21 Volumetric video as a novel medium for creative storytelling
21.1 Volumetric video and creative practice
21.2 Case studies
21.2.1 MR play trilogy (2017)
21.2.2 Jonathan Swift at the Trinity Library Long Room
21.2.3 Bridging the Blue
21.2.4 Image Technology Echoes
21.2.5 Mixed reality Ulysses
21.2.6 XR music videos
21.3 Reflection on V-SENSE practices
21.4 Conclusion
References
22 Social virtual reality (VR) applications and user experiences
22.1 Introduction
22.2 The social VR clinic
22.2.1 User journey
22.2.2 Design and implementation
22.2.3 Real-world deployment
22.3 CakeVR and the birthday celebration
22.3.1 User journey
22.3.2 Design and implementation
22.3.3 Real-world deployment
22.4 MediaScape XR: the social VR museum experience
22.4.1 User journey
22.4.2 Design and implementation
22.4.3 Real-world deployment
22.5 The social VR movie
22.5.1 The virtual movie production
22.5.2 Real-world deployment
22.6 Measuring user experiences in social VR
22.6.1 Developing a social VR questionnaire
22.6.2 Validating the social VR questionnaire
22.7 Discussion
22.7.1 Lessons learned and opportunities of social VR
22.7.1.1 Controlled experiments versus real-world products
22.7.1.2 Virtual representation and privacy
22.7.1.3 Social VR as an extension of 2D video conferencing
22.7.1.4 Opportunities for controlled experiments
22.7.1.5 Production opportunities for immersive narrative experiences
22.7.2 Design recommendations for social VR
22.7.2.1 Conveying emotions in social VR
22.7.2.2 Creative virtual environment
22.7.2.3 Recreating the senses in social VR
22.7.2.4 Depth of interaction and fatigue
22.7.2.5 Beyond reality experiences in social VR
22.8 Conclusion
References
Index
Back Cover