Discover the world of computer vision and deep learning for autonomous driving with this comprehensive guide. It explores cutting-edge topics, crafted to engage tertiary students and to spark the curiosity of researchers and professionals in the field. Spanning fundamental principles through practical applications, the book offers a gentle introduction, expert evaluations of state-of-the-art methods, and inspiring research directions. Its broad coverage also makes it an invaluable resource for university courses in computer vision and deep learning. Clear, simplified algorithm descriptions make complex concepts accessible to beginners, and carefully selected problems and examples reinforce the material.
Author(s): Rui Fan; Sicen Guo; Mohammud Junaid Bocus
Series: Advances in Computer Vision and Pattern Recognition
Edition: 1
Publisher: Springer Nature Singapore
Year: 2023
Language: English
Pages: x; 387
City: Singapore
Tags: Computer Science; Robotics; Machine Learning; Image Processing and Computer Vision
Preface
Contents
1 In-Sensor Visual Devices for Perception and Inference
1.1 Introduction
1.2 In-Sensor Computing Devices
1.2.1 Architecture
1.2.2 Focal-Plane Sensor Processor (FPSP)
1.3 SCAMP-5d Vision System and Pixel Processor Array
1.3.1 Introduction
1.3.2 Algorithms and Applications
1.4 Eye-RIS
1.4.1 Introduction
1.4.2 Applications
1.5 Kovilta's KOVA1
1.5.1 Introduction
1.5.2 Applications
1.6 AIStorm Mantis2
1.7 Other In-Sensor Computing Devices
1.8 Conclusion
References
2 Environmental Perception Using Fish-Eye Cameras for Autonomous Driving
2.1 Introduction
2.2 Fish-Eye Image Datasets
2.2.1 Real Fish-Eye Datasets
2.2.2 Simulator-Based Virtual Fish-Eye Datasets
2.2.3 Fish-Eye Projection Model-Based Methods
2.2.4 Comparison Between Real and Virtual Datasets
2.3 Fish-Eye Camera Projection Principle
2.3.1 Four Classic Image Representation Models
2.3.2 Other Wide-Angle Camera Projection Principles
2.4 Semantic Understanding
2.4.1 Semantic Segmentation in Fish-Eye Images
2.4.2 Semantic Segmentation in Omnidirectional or Panoramic Images
2.4.3 Instance and Panoptic Segmentation
2.4.4 Analysis of Semantic Understanding
2.5 Object Detection
2.5.1 Different Image Representation Models
2.5.2 Two-Step Methods
2.5.3 One-Step Methods
2.5.4 Analysis of Object Detection
2.6 Summary and Prospects
References
3 Stereo Matching: Fundamentals, State-of-the-Art, and Existing Challenges
3.1 Introduction
3.2 One Camera
3.2.1 Perspective Camera Model
3.2.2 Intrinsic Matrix
3.3 Two Cameras
3.3.1 Geometry of Multiple Images
3.3.2 Stereopsis
3.4 Stereo Matching
3.4.1 Explicit Programming-Based Stereo Matching Algorithms
3.4.2 Machine Learning-Based Stereo Matching Algorithms
3.5 Disparity Confidence Measures
3.5.1 Cost-Based Confidence Measures
3.5.2 Disparity-Based Confidence Measures
3.5.3 Consistency-Based Confidence Measures
3.5.4 Image-Based Confidence Measures
3.6 Evaluation Metrics
3.7 Public Datasets and Benchmarks
3.7.1 Middlebury Benchmark
3.7.2 KITTI Benchmark
3.7.3 ETH3D Benchmark
3.8 Existing Challenges
3.8.1 Unsupervised Training
3.8.2 Domain Adaptation
3.8.3 Trade-Off Between Speed and Accuracy
3.8.4 Intractable Areas
3.9 Summary
A Lie Group
B Skew-Symmetric Matrix
References
4 Semantic Segmentation for Autonomous Driving
4.1 Introduction
4.2 State of the Art
4.2.1 Single-Modal Networks
4.2.2 Data-Fusion Networks
4.3 Public Datasets and Benchmarks
4.3.1 Public Datasets
4.3.2 Online Benchmarks
4.4 Evaluation Metrics
4.5 Specific Autonomous Driving Tasks
4.5.1 Freespace Detection
4.5.2 Road Defect Detection
4.5.3 Road Anomaly Detection
4.6 Existing Challenges
4.7 Conclusion
References
5 3D Object Detection in Autonomous Driving
5.1 Introduction
5.2 Background Concepts
5.2.1 Problem Definition and Assumptions
5.2.2 Sensors
5.2.3 Public Datasets
5.2.4 Evaluation Metrics
5.3 Camera-Based Methods
5.3.1 Result-Lifting Methods
5.3.2 Feature-Lifting Methods
5.3.3 Summary
5.4 LiDAR-Based Methods
5.4.1 Quantization+CNN-Based Methods
5.4.2 Point-Based Methods
5.4.3 Point-Voxel-Based Methods
5.4.4 Summary
5.5 RADAR-Based Methods
5.6 Multi-Sensor Fusion Methods
5.6.1 Feature Fusion
5.6.2 Transformer Interaction Fusion
5.6.3 Cascade Pipeline
5.6.4 Section Summary
5.7 Promising Directions
5.7.1 Extreme Conditions
5.7.2 Uncertainty
5.7.3 Summary
5.8 Conclusion
References
6 Collaborative 3D Object Detection
6.1 Introduction
6.1.1 Significance
6.1.2 Relations to Related Topics
6.1.3 Category of Collaborative 3D Object Detection
6.2 Key Challenges
6.2.1 Communication Constraints
6.2.2 Pose Errors
6.3 Communication-Efficient Collaborative 3D Object Detection
6.3.1 Problem Formulation
6.3.2 Mathematical Intuition
6.3.3 System Design
6.3.4 Experimental Results
6.3.5 Ablation Studies
6.3.6 Further Thoughts on Communication Efficiency
6.4 Chapter at a Glance
References
7 Enabling Robust SLAM for Mobile Robots with Sensor Fusion
7.1 Introduction
7.1.1 Background
7.1.2 Summary of This Chapter
7.1.3 Organization
7.2 Sensors
7.2.1 Interoceptive Sensors
7.2.2 Exteroceptive Sensors
7.3 SLAM
7.3.1 Architecture of SLAM
7.3.2 Challenges of SLAM
7.3.3 Modern SLAM Systems
7.4 Application
7.4.1 Motivation
7.4.2 System Overview
7.4.3 Sensor Calibration
7.4.4 Dataset Description
7.4.5 Evaluation
7.5 Conclusion
References
8 Visual SLAM for Texture-Less Environment
8.1 Introduction
8.2 State-of-the-Art Visual Simultaneous Localization and Mapping Algorithms
8.2.1 Lines and Planes
8.2.2 Objects
8.3 The VSLAM Dataset
8.3.1 Real-World Recorded Datasets
8.3.2 Computer Graphics-Based Methods
8.4 Pipeline of the VSLAM Algorithms
8.4.1 Details of Datasets
8.4.2 Visual SLAM Based on Keyobjects
8.4.3 Keyobject Parameterization
8.5 Experimental Results
8.5.1 Evaluation of VSLAM Algorithms
8.5.2 Experiments on Synthetic and Real-World Datasets
8.6 Conclusion
8.7 Discussion
References
9 Multi-task Perception for Autonomous Driving
9.1 Introduction
9.1.1 2D Perception
9.1.2 3D Perception
9.2 Related Work
9.2.1 Visual Perception for Autonomous Driving
9.2.2 Multi-task Learning
9.2.3 Multimodal Learning
9.2.4 Pre-training Methods
9.2.5 Prompt-Based Learning
9.3 2D Perception
9.3.1 Empirical Study
9.3.2 Dataset
9.3.3 Pretrain-Finetune for Multi-task Learning
9.3.4 Effective Adaptation for Multi-task Learning
9.3.5 GT-Prompt
9.4 3D Perception
9.4.1 Dataset
9.4.2 Fuller
9.5 Experiments
9.5.1 2D Perception
9.5.2 3D Perception
9.6 Conclusion
9.6.1 2D Perception
9.6.2 3D Perception
References
10 Bird's Eye View Perception for Autonomous Driving
10.1 Introduction
10.2 BEV Fundamentals
10.2.1 Perspective Mapping
10.2.2 Inverse Perspective Mapping
10.3 LiDAR-Based BEV Perception
10.3.1 Pre-BEV Methods
10.3.2 Post-BEV Methods
10.4 Camera-Based BEV Perception
10.4.1 2D-3D Methods
10.4.2 3D-2D Methods
10.5 Fusion-Based BEV Perception
10.5.1 Multi-modal Fusion
10.5.2 Temporal Fusion
10.6 Datasets
10.7 Evaluation Metrics
10.8 Industrial Applications
10.8.1 Data Preprocessing
10.8.2 Feature Extraction
10.8.3 PV-BEV Transformation and Fusion
10.8.4 Perception Heads
10.9 Existing Challenges
10.10 Conclusions
References
11 Road Environment Perception for Safe and Comfortable Driving
11.1 Introduction
11.2 Sensing Technologies for Road Environment Perception
11.2.1 Vision Sensors
11.2.2 Vibration Sensors
11.3 Public Datasets
11.3.1 Road Imaging Datasets
11.3.2 Pothole Datasets
11.3.3 Crack Datasets
11.4 Road Defect Detection
11.4.1 Computer Vision-Based Road Defect Detection
11.4.2 Vibration-Based Road Defect Detection
11.5 Planning and Control
11.6 Existing Challenges and Future Insights
11.7 Conclusion
References