A Selection of Image Understanding Techniques: From Fundamentals to Research Front

This book offers a comprehensive introduction to seven commonly used image understanding techniques in modern information technology. Readers at various levels can find suitable techniques for solving their practical problems and discover the latest developments in these domains.

The techniques covered include camera models and calibration, stereo vision, generalized matching, scene analysis and semantic interpretation, multi-sensor image information fusion, content-based visual information retrieval, and understanding spatial-temporal behavior. For each technique, the book moves from an overview of essential concepts and basic principles to a detailed introduction and explanation of current methods and their practical use. It also discusses research trends and recent results in conjunction with new developments in technical methods.

This is an excellent read for those who do not have a background in image technology but need to use these techniques to complete specific tasks. This essential information will also be useful for further study in the relevant fields.

Author(s): Yu-Jin Zhang
Publisher: CRC Press
Year: 2023

Language: English
Pages: 348
City: Boca Raton

Cover
Half Title
Title Page
Copyright Page
Table of Contents
Preface
Chapter 1: Introduction
1.1 Image Engineering and Its Development
1.1.1 Basic Concepts and Overall Framework
1.1.2 Review of the Development of Image Technology
1.1.2.1 A Closed Survey Series of Image Technology
1.1.2.2 Image Engineering Survey Series in Progress
1.2 Image Understanding and Related Disciplines
1.2.1 Image Understanding
1.2.2 Computer Vision
1.2.2.1 Research Methods
1.2.2.2 Realization of Engineering Methods
1.2.2.3 Research Objectives
1.2.2.4 The Relationship between Image Understanding and Computer Vision
1.2.3 Other Related Disciplines
1.2.3.1 Artificial Intelligence
1.2.3.2 Machine Learning and Deep Learning
1.2.3.3 Machine Vision/Robot Vision
1.2.3.4 Pattern Recognition
1.2.3.5 Computer Graphics
1.3 The Theoretical Framework of Image Understanding
1.3.1 Marr’s Theory of Visual Computation
1.3.1.1 Vision is a Complex Information Processing Process
1.3.1.2 Three Key Elements of Visual Information Processing
1.3.1.3 Three-Level Internal Representation of Visual Information
1.3.1.4 Visual Information Understanding is Organized in the Form of Functional Modules
1.3.1.5 The Formal Representation of Computational Theory Must Take Constraints into Account
1.3.2 Improvements to Marr’s Theoretical Framework
1.3.3 Discussion on Marr’s Reconstruction Theory
1.3.3.1 Problems Related to Reconstruction Theory
1.3.3.2 Representation Without Reconstruction
1.3.4 Research on the New Theoretical Framework
1.3.4.1 Knowledge-Based Theoretical Framework
1.3.4.2 Active Vision Theory Framework
1.4 Characteristics of This Book
1.4.1 Writing Motivation
1.4.2 Material Selection and Contents
1.4.3 Structure and Arrangement
References
Chapter 2: Camera Model and Calibration
2.1 Linear Camera Model
2.1.1 Imaging Transformation
2.1.1.1 Various Coordinate Systems
2.1.1.2 Imaging Model
2.1.1.3 Perspective Transformation
2.1.1.4 Telecentric Imaging and Supercentric Imaging
2.1.1.5 Homogeneous Coordinates
2.1.1.6 Inverse Perspective Transformation
2.1.2 Approximate Projection Modes
2.1.2.1 Orthogonal Projection
2.1.2.2 Weak Perspective Projection
2.1.2.3 Parallel Perspective Projection
2.1.2.4 Comparison of Various Approximate Modes and Perspective Projection
2.1.3 A General Camera Model
2.2 Nonlinear Camera Model
2.2.1 Type of Distortion
2.2.1.1 Radial Distortion
2.2.1.2 Tangential Distortion
2.2.1.3 Eccentric Distortion
2.2.1.4 Thin Prism Distortion
2.2.2 A Complete Imaging Model
2.3 Camera Calibration
2.3.1 Basic Calibration Procedure
2.3.2 Camera Internal and External Parameters
2.3.2.1 External Parameters
2.3.2.2 Internal Parameters
2.3.2.3 Another Description of Internal and External Parameters
2.3.3 Nonlinear Camera Calibration
2.3.4 Classification of Calibration Methods
2.4 Traditional Calibration Methods
2.4.1 Basic Steps and Parameters
2.4.2 Two-Stage Calibration Method
2.4.3 Precision Improvement
2.5 Self-Calibration Methods
2.5.1 Basic Idea
2.5.2 A Practical Method
2.6 Some Recent Developments and Further Research
2.6.1 Calibration of Structured Light Active Vision System
2.6.1.1 Projector Model and Calibration
2.6.1.2 Pattern Separation
2.6.1.3 Calculation of Homography Matrix
2.6.1.4 Calculation of Calibration Parameters
2.6.2 Online Camera External Parameter Calibration
2.6.2.1 Lane Line Detection and Data Screening
2.6.2.2 Optimizing Reprojection Error
References
Chapter 3: Stereo Vision
3.1 Depth Imaging and Depth Image
3.1.1 Depth Image and Grayscale Image
3.1.2 Intrinsic Image and Non-Intrinsic Image
3.1.3 Depth Imaging Modes
3.2 Binocular Imaging Modes
3.2.1 Binocular Horizontal Mode
3.2.1.1 Parallax and Depth
3.2.1.2 Angular Scanning Imaging
3.2.2 Binocular Convergence Horizontal Mode
3.2.2.1 Parallax and Depth
3.2.2.2 Image Rectification
3.2.3 Binocular Axial Mode
3.3 Binocular Stereo Matching Based on Region
3.3.1 Template Matching
3.3.1.1 Basic Method
3.3.1.2 Using Geometric Hashing
3.3.2 Stereo Matching
3.3.2.1 Epipolar Line Constraint
3.3.2.2 Essential Matrix and Fundamental Matrix
3.3.2.3 Calculation of Optical Properties
3.4 Binocular Stereo Matching Based on Features
3.4.1 Basic Steps
3.4.1.1 Matching Using Edge Points
3.4.1.2 Matching Using Zero-Crossing Points
3.4.1.3 Feature Point Depth
3.4.1.4 Sparse Matching Points
3.4.2 Scale-Invariant Feature Transform
3.4.3 Speeded-Up Robust Features
3.4.3.1 Determine the Point of Interest Based on the Hessian Matrix
3.4.3.2 Scale Space Representation
3.4.3.3 Description and Matching of Points of Interest
3.5 Some Recent Developments and Further Research
3.5.1 Biocular and Stereopsis
3.5.1.1 Biocular and Binocular
3.5.1.2 From Monocular to Binocular
3.5.2 Stereo Matching Methods Based on Deep Learning
3.5.2.1 Methods Using Image Pyramid Networks
3.5.2.2 Methods Using Siamese Networks
3.5.2.3 Methods Using Generative Adversarial Networks
3.5.3 Matching Based on Feature Cascade CNN
References
Chapter 4: Generalized Matching
4.1 Matching Overview
4.1.1 Matching Strategies and Categories
4.1.2 Matching and Registration
4.1.2.1 Registration Technology
4.1.2.2 Inertia-Equivalent Ellipse Matching
4.1.3 Matching Evaluation
4.2 Object Matching
4.2.1 Matching Metrics
4.2.1.1 Hausdorff Distance
4.2.1.2 Structural Matching Metrics
4.2.2 Corresponding Point Matching
4.2.3 String Matching
4.2.4 Shape Matrix Matching
4.3 Dynamic Pattern Matching
4.3.1 Matching Process
4.3.2 Absolute Pattern and Relative Pattern
4.4 Relationship Matching
4.4.1 Objects and Relational Representations
4.4.2 Connection Relationship Matching
4.4.3 Matching Process
4.5 Graph Isomorphism Matching
4.5.1 Introduction to Graph Theory
4.5.1.1 Basic Definitions
4.5.1.2 Geometric Representation of Graphs
4.5.1.3 Subgraphs
4.5.2 Graph Isomorphism and Matching
4.5.2.1 Identity and Isomorphism of Graphs
4.5.2.2 Judgment for Isomorphism
4.6 Some Recent Developments and Further Research
4.6.1 Image Registration and Matching
4.6.1.1 Heterogeneous Remote Sensing Image Registration Based on Feature Matching
4.6.1.2 Image Matching Based on Spatial Relation Reasoning
4.6.2 Multimodal Image Matching
4.6.2.1 Region-Based Techniques
4.6.2.2 Feature-Based Techniques
References
Chapter 5: Scene Analysis and Semantic Interpretation
5.1 Overview of Scene Understanding
5.1.1 Scene Analysis
5.1.2 Scene Awareness Hierarchy
5.1.3 Scene Semantic Interpretation
5.2 Fuzzy Inference
5.2.1 Fuzzy Sets and Fuzzy Operations
5.2.2 Fuzzy Inference Methods
5.2.2.1 Basic Model
5.2.2.2 Fuzzy Combination
5.2.2.3 Defuzzification
5.3 Predicate Logical System
5.3.1 Predicate Calculus Rules
5.3.2 Inference by Theorem Proving
5.4 Scene Object Labeling
5.4.1 Labeling Methods and Key Elements
5.4.2 Discrete Relaxation Labeling
5.4.3 Probabilistic Relaxation Labeling
5.5 Scene Classification
5.5.1 Bag of Words/Bag-of-Features Models
5.5.2 pLSA Model
5.5.2.1 Model Description
5.5.2.2 Model Calculation
5.5.2.3 Model Application Example
5.5.3 LDA Model
5.5.3.1 Basic LDA Model
5.5.3.2 SLDA Model
5.6 Some Recent Developments and Further Research
5.6.1 Interpretation of Remote Sensing Images
5.6.1.1 Classification of Remote Sensing Image Interpretation Methods
5.6.1.2 Knowledge Graph for Remote Sensing Image Interpretation
5.6.2 Hybrid Enhanced Visual Cognition
5.6.2.1 From Computer Vision Perception to Computer Vision Cognition
5.6.2.2 Hybrid Enhanced Visual Cognition Related Technologies
References
Chapter 6: Multi-Sensor Image Information Fusion
6.1 Overview of Information Fusion
6.1.1 Multi-Sensor Information Fusion
6.1.2 Information Fusion Level
6.1.3 Active Vision and Active Fusion
6.2 Image Fusion
6.2.1 Main Steps of Image Fusion
6.2.1.1 Image Preprocessing
6.2.1.2 Image Registration
6.2.1.3 Image Information Fusion
6.2.2 Three Levels of Image Fusion
6.2.3 Evaluation of Image Fusion Effect
6.2.3.1 Subjective Evaluation
6.2.3.2 Objective Evaluation Based on Statistical Characteristics
6.2.3.3 Objective Evaluation Based on the Amount of Information
6.2.3.4 Evaluation According to the Purpose of Fusion
6.3 Pixel-Level Fusion Methods
6.3.1 Basic Fusion Methods
6.3.1.1 Weighted Average Fusion Method
6.3.1.2 Pyramid Fusion Method
6.3.1.3 Wavelet Transform Fusion Method
6.3.1.4 HSI Transform Fusion Method
6.3.1.5 PCA Transform Fusion Method
6.3.2 Combining Various Fusion Methods
6.3.2.1 Problems with a Single-Type Fusion Method
6.3.2.2 Fusion by Combining HSI Transform and Wavelet Transform
6.3.2.3 Fusion by Combining PCA Transform and Wavelet Transform
6.3.2.4 Performance of Combined Fusions
6.3.3 The Optimal Number of Decomposition Layers for Wavelet Fusion
6.3.4 Image Fusion Based on Compressed Sensing
6.3.5 Examples of Pixel-Level Fusion
6.3.5.1 Fusion of Different Exposure Images
6.3.5.2 Fusion of Different Focus Images
6.3.5.3 Fusion of Remote Sensing Images
6.3.5.4 Fusion of Visible Light Image and Infrared Image
6.3.5.5 Fusion of Visible Light Image and Millimeter-Wave Radar Image
6.3.5.6 Fusion of CT Image and PET Image
6.3.5.7 Fusion of Dual-Energy Transmission Image and Compton Backscatter Image
6.4 Feature-Level and Decision-Level Fusion Methods
6.4.1 Bayesian Method
6.4.2 Evidential Reasoning Method
6.5 Rough Set Theory in Decision-Level Fusion
6.5.1 Rough Set Definition
6.5.2 Rough Set Description
6.5.3 Fusion Based on Rough Sets
6.6 Some Recent Developments and Further Research
6.6.1 Spatial-Spectral Feature Extraction of Hyperspectral Images
6.6.1.1 Traditional Spatial-Spectral Feature Extraction Methods
6.6.1.2 Deep Learning-Based Methods of Extracting Spatial-Spectral Features
6.6.2 Multi-Source Remote Sensing Image Fusion
6.6.2.1 Nine Multi-Source Remote Sensing Data Sources
6.6.2.2 Multi-Source Remote Sensing Image Fusion Literature
6.6.2.3 Spatial-Spectral Fusion of Remote Sensing Images
6.6.2.4 Fusion with Deep Recurrent Residual Networks
References
Chapter 7: Content-Based Visual Information Retrieval
7.1 Principles of Image and Video Retrieval
7.1.1 Content-Based Retrieval
7.1.2 Archiving and Retrieval Flowchart
7.1.3 Multi-Level Content Representation
7.2 Matching and Retrieval of Visual Features
7.2.1 Color Feature Matching
7.2.1.1 Histogram Intersection Method
7.2.1.2 Distance Method
7.2.1.3 Central Moment Method
7.2.1.4 Reference Color Table Method
7.2.2 Texture Feature Calculation
7.2.3 Multi-Scale Shape Features
7.2.4 Retrieval with Composite Features
7.2.4.1 Combination of Color and Texture Features
7.2.4.2 Combining Color, SIFT and CNN Features
7.3 Video Retrieval Based on Motion Features
7.3.1 Global Motion Features
7.3.2 Local Motion Features
7.4 Video Program Retrieval
7.4.1 News Video Structuring
7.4.1.1 Features of News Video
7.4.1.2 Main Speaker Close-up Shot Detection
7.4.1.3 Clustering of Main Speaker Close-up Shots
7.4.1.4 Announcer Shot Extraction
7.4.2 Video Ranking of Sports Games
7.4.2.1 Features of Sports Video
7.4.2.2 Structure of Table Tennis Competition Program
7.4.2.3 Object Detection and Tracking
7.4.2.4 Making the Brilliance Ranking
7.5 Semantic Classification Retrieval
7.5.1 Image Classification Based on Visual Keywords
7.5.1.1 Feature Selection
7.5.1.2 Image Classification
7.5.2 High-Level Semantics and Atmosphere
7.5.2.1 Five Atmospheric Semantics
7.5.2.2 Classification of Atmosphere
7.6 Some Recent Developments and Further Research
7.6.1 Deep Learning-Based Cross-Modal Retrieval
7.6.2 Hashing in Image Retrieval
7.6.2.1 Supervised Hashing
7.6.2.2 Asymmetric Supervised Deep Discrete Hashing
7.6.2.3 Hashing in Cross-Modal Image Retrieval
References
Chapter 8: Understanding Spatial-Temporal Behavior
8.1 Spatial-Temporal Technology
8.2 Spatial-Temporal Points of Interest
8.2.1 Detection of Spatial Points of Interest
8.2.2 Detection of Spatial-Temporal Points of Interest
8.3 Dynamic Trajectory Learning and Analysis
8.3.1 Overall Process
8.3.2 Automatic Scene Modeling
8.3.2.1 Object Tracking
8.3.2.2 Interest Point Detection
8.3.2.3 Activity Path Learning
8.3.3 Automated Activity Analysis
8.4 Action Classification and Recognition
8.4.1 Action Classification
8.4.1.1 Direct Classification
8.4.1.2 Time-State Model
8.4.1.3 Action Detection
8.4.2 Action Recognition
8.4.2.1 Holistic Recognition
8.4.2.2 Pose Modeling
8.4.2.3 Active Reconstruction
8.4.2.4 Interactive Activities
8.4.2.5 Group Activities
8.4.2.6 Scene Interpretation
8.5 Activity and Behavior Modeling
8.5.1 Action Modeling
8.5.1.1 Non-Parametric Modeling Methods
8.5.1.2 3-D Modeling Methods
8.5.1.3 Parametric Time-Series Modeling Methods
8.5.2 Activity Modeling and Recognition
8.5.2.1 Graph Model
8.5.2.2 Synthesis Methods
8.5.2.3 Knowledge- and Logic-Based Methods
8.6 Joint Modeling of Actor and Action
8.6.1 Single-Label Actor-Action Recognition
8.6.2 Multi-Label Actor-Action Recognition
8.6.3 Actor-Action Semantic Segmentation
8.7 Some Recent Developments and Further Research
8.7.1 Behavior Recognition Using Joints
8.7.1.1 Using CNN as Backbone
8.7.1.2 Using RNN as Backbone
8.7.1.3 Using GCN as Backbone
8.7.1.4 Using Hybrid Network as Backbone
8.7.2 Detection of Video Anomalous Events
8.7.2.1 Detection with Convolutional Auto-Encoder Block Learning
8.7.2.2 Detection Using One-Class Neural Network
References
Index