Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVII

The 30-volume set, comprising LNCS volumes 12346 through 12375, constitutes the refereed proceedings of the 16th European Conference on Computer Vision, ECCV 2020, which was planned to be held in Glasgow, UK, during August 23–28, 2020. The conference was held virtually due to the COVID-19 pandemic.

The 1360 revised papers presented in these proceedings were carefully reviewed and selected from a total of 5025 submissions. The papers deal with topics such as computer vision; machine learning; deep neural networks; reinforcement learning; object recognition; image classification; image processing; object detection; semantic segmentation; human pose estimation; 3D reconstruction; stereo vision; computational photography; neural networks; image coding; image reconstruction; motion estimation.

Author(s): Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Series: Lecture Notes in Computer Science, 12362
Publisher: Springer
Year: 2020

Language: English
Pages: 805
City: Cham

Foreword
Preface
Organization
Contents – Part XVII
Class-Wise Dynamic Graph Convolution for Semantic Segmentation
1 Introduction
2 Related Work
3 Approach
3.1 Preliminaries
3.2 Overall Framework
3.3 Class-Wise Dynamic Graph Convolution Module
3.4 Loss Function
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparisons with State-of-the-Arts
5 Conclusions
References
Character-Preserving Coherent Story Visualization
1 Introduction
2 Related Work
2.1 GAN-based Text-to-Image Synthesis
2.2 Evaluation Metrics of Image Generation
3 Character-Preserving Coherent Story Visualization
3.1 Overview
3.2 Story and Context Encoder
3.3 Figure-Ground Aware Generation
3.4 Loss Function
3.5 Fréchet Story Distance
4 Experimental Results
4.1 Implementation Details
4.2 Dataset
4.3 Baselines
4.4 Qualitative Comparison
4.5 Quantitative Comparison
4.6 Architecture Search
4.7 FSD Analysis
5 Conclusions
References
GINet: Graph Interaction Network for Scene Parsing
1 Introduction
2 Related Work
3 Approach
3.1 Framework of Graph Interaction Network (GINet)
3.2 Graph Interaction Unit
3.3 Semantic Context Loss
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Experiments on Pascal-Context
4.4 Experiments on COCO Stuff
4.5 Experiments on ADE20K
5 Conclusion
References
Tensor Low-Rank Reconstruction for Semantic Segmentation
1 Introduction
2 Related Work
3 Methodology
3.1 Overview
3.2 Tensor Generation Module
3.3 Tensor Reconstruction Module
3.4 Global Pooling Module
3.5 Network Details
3.6 Relation to Previous Approaches
4 Experiments
4.1 Implementation Details
4.2 Results on Different Datasets
4.3 Ablation Study
4.4 Further Discussion
5 Conclusion
References
Attentive Normalization
1 Introduction
2 Related Work
3 The Proposed Attentive Normalization
3.1 Background on Feature Normalization
3.2 Background on Feature Attention
3.3 Attentive Normalization
4 Experiments
4.1 Ablation Study
4.2 Image Classification in ImageNet-1000
4.3 Object Detection and Segmentation in COCO
5 Conclusion
References
Count- and Similarity-Aware R-CNN for Pedestrian Detection
1 Introduction
2 Related Work
3 Baseline Two-Stage Detection Framework
4 Our Approach
4.1 Detection Branch
4.2 Count-and-Similarity Branch
4.3 Inference
5 Experiments
5.1 Datasets and Evaluation Metrics
5.2 Implementation Details
5.3 CityPersons Dataset
5.4 CrowdHuman Dataset
5.5 Results on Person Instance Segmentation
6 Conclusion
References
TRADI: Tracking Deep Neural Network Weight Distributions
1 Introduction
2 TRAcking of the Weight DIstribution (TRADI)
2.1 Notations and Hypotheses
2.2 TRAcking of the DIstribution (TRADI) of Weights of a DNN
2.3 Training the DNNs
2.4 TRADI Training Algorithm Overview
2.5 TRADI Uncertainty During Testing
3 Related Work
4 Experiments
4.1 Toy Experiments
4.2 Regression Experiments
4.3 Classification Experiments
4.4 Uncertainty Evaluation for Out-of-Distribution (OOD) Test Samples
5 Conclusion
References
Spatiotemporal Attacks for Embodied Agents
1 Introduction
2 Related Work
3 Adversarial Attacks for the Embodiment
3.1 Motivations
3.2 Problem Definition
4 Spatiotemporal Attack Framework
4.1 Temporal Attention Stimulus
4.2 Spatially Contextual Perturbations
4.3 Optimization Formulations
5 Experiments
5.1 Experimental Setting
5.2 Evaluation Metrics
5.3 Implementation Details
5.4 Attack via a Differentiable Renderer
5.5 Transfer Attack onto a Non-differentiable Renderer
5.6 Generalization Ability of the Attack
5.7 Improving Agent Robustness with Adversarial Training
5.8 Ablation Study
6 Conclusion
References
Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model Without Manual Annotation
1 Introduction
2 Related Work
3 Methodology
4 Dataset
5 Experiments
5.1 Experiment Setting
5.2 Comparison to Fully Supervised Training
5.3 Comparison to SSL and MIL Methods
5.4 Ablation Study and Discussion
6 Conclusion
References
Unselfie: Translating Selfies to Neutral-Pose Portraits in the Wild
1 Introduction
2 Related Work
3 Our Method
3.1 Datasets
3.2 Nearest Pose Search
3.3 Coordinate-Based Inpainting
3.4 Composition
4 Experiments
4.1 Comparisons with Existing Methods
4.2 Ablation Study
4.3 Limitations
5 Conclusion
References
Design and Interpretation of Universal Adversarial Patches in Face Detection
1 Introduction
2 Related Work
3 Interpretation of Adversarial Patch as Face
3.1 Preliminaries on Face Detection
3.2 Design of Adversarial Patch
3.3 Generality
3.4 Interpretation of Adversarial Patch
4 Improved Optimization of Adversarial Patch
4.1 Evaluation Metric
4.2 Improved Optimization
4.3 Experimental Results
5 Conclusions
References
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild
1 Introduction
2 Related Work
3 Approach
3.1 Few-Shot Learning Setup
3.2 Network Description
3.3 Learning Procedure
4 Experiments
4.1 Few-Shot Object Detection
4.2 Few-Shot Viewpoint Estimation
4.3 Evaluation of Joint Detection and Viewpoint Estimation
5 Conclusion
References
Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints
1 Introduction
2 Related Work
3 Method
3.1 Biomechanical Constraints
3.2 Zroot Refinement
3.3 Final Loss
4 Implementation
5 Evaluation
5.1 Datasets
5.2 Evaluation Metric
5.3 Effect of Weak-Supervision
5.4 Ablation Study
5.5 Bootstrapping with Synthetic Data
5.6 Bootstrapping with Real Data
6 Conclusion
References
Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-identification
1 Introduction
2 Related Work
3 Proposed Method
3.1 Baseline Cross-modality Re-ID
3.2 Intra-modality Weighted-Part Aggregation
3.3 Cross-modality Graph Structured Attention
3.4 Dynamic Dual Aggregation Learning
4 Experimental Results
4.1 Experimental Settings
4.2 Ablation Study
4.3 Comparison with State-of-the-Art Methods
5 Conclusion
References
Contextual Heterogeneous Graph Network for Human-Object Interaction Detection
1 Introduction
2 Related Work
3 Approach
3.1 Preliminary
3.2 Pipeline
3.3 Contextual Learning
3.4 HOI Prediction
4 Experiments
4.1 Datasets and Metrics
4.2 Implementation Details
4.3 Ablation Studies
4.4 Performance and Comparison
5 Conclusions
References
Zero-Shot Image Super-Resolution with Depth Guided Internal Degradation Learning
1 Introduction
2 Related Work
3 Approach
3.1 Depth Guided Training Data Generation
3.2 Network Structure
3.3 Bi-cycle Training
4 Discussion
5 Experiment
5.1 Dataset and Training Setup
5.2 Comparison with the State of the Arts
5.3 Visual Comparison
5.4 Super-Resolving Image with Estimated Depth
5.5 Ablation Study
6 Conclusion
References
A Closest Point Proposal for MCMC-based Probabilistic Surface Registration
1 Introduction
2 Background
2.1 Gaussian Process Morphable Model (GPMM)
2.2 Analytical Posterior Model
3 Method
3.1 Approximating the Posterior Distribution
3.2 CP-proposal
4 Experiments
4.1 Convergence Comparison
4.2 Posterior Estimation of Missing Data
4.3 Registration Accuracy - ICP vs CPD vs CP-proposal
5 Conclusion
References
Interactive Video Object Segmentation Using Global and Local Transfer Modules
1 Introduction
2 Related Work
3 Proposed Algorithm
3.1 Network Architecture
3.2 Training Phase
3.3 Inference Phase
4 Experimental Results
4.1 Comparative Assessment
4.2 User Study
4.3 Ablation Studies
5 Conclusions
References
End-to-end Interpretable Learning of Non-blind Image Deblurring
1 Introduction
1.1 Related Work
1.2 Main Contributions
2 Proposed Method
2.1 A Convolutional HQS Algorithm
2.2 Convolutional PCR Iterations
2.3 An End-to-end Trainable CHQS Algorithm
3 Implementation and Results
3.1 Implementation Details
3.2 Experimental Validation of CPCR and CHQS
3.3 Uniform Deblurring
3.4 Non-uniform Motion Blur Removal
3.5 Deblurring with Approximated Blur Kernels
4 Conclusion
References
Employing Multi-estimations for Weakly-Supervised Semantic Segmentation
1 Introduction
2 Related Work
2.1 Semantic Segmentation
2.2 Weakly-Supervised Semantic Segmentation
2.3 Learning from Noisy Labels
3 Pilot Experiments
4 Approach
4.1 The Class Activation Map
4.2 Multi-type Seeds
4.3 Multi-scale Seeds
4.4 Multi-architecture Seeds
4.5 The Weighted Selective Training
5 Experiments
5.1 Dataset
5.2 Implementation Details
5.3 The Influence of Multiple Seeds
5.4 The Weighted Selective Training
5.5 Comparison with Related Works
6 Conclusions
References
Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection
1 Introduction
2 Related Work
3 Proposed Framework
3.1 Noise-Aware Encoder-Decoder Network
3.2 Maximum Likelihood via Alternating Back-Propagation
3.3 Comparison with Variational Inference
3.4 Network Architectural Design
4 Experiments
4.1 Experimental Setup
4.2 Comparison with the State-of-the-Art Methods
4.3 Ablation Study
4.4 Model Analysis
5 Conclusion
References
Rethinking Image Deraining via Rain Streaks and Vapors
1 Introduction
2 Related Work
3 Proposed Algorithm
3.1 SNet
3.2 VNet
3.3 ANet
3.4 Network Training
3.5 Visualizations
4 Experiments
4.1 Dataset Constructions
4.2 Ablation Studies
4.3 Evaluations with State-of-the-Art
5 Concluding Remarks
References
Finding Non-uniform Quantization Schemes Using Multi-task Gaussian Processes
1 Introduction
2 Related Work
3 Method
3.1 Constraining the Space
3.2 Exploring the Space
3.3 Sampling the Space
4 Experiments and Results
5 Conclusion
References
Is Sharing of Egocentric Video Giving Away Your Biometric Signature?
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Extracting Gait Signatures from Egocentric Videos
3.2 Recognizing Wearer from First Person Video
3.3 Extracting Gait from Sparse Optical Flow
3.4 Recognizing Wearer from Third Person Video
4 Datasets Used
5 Experiments and Results
5.1 Hyper-parameters and Ablation Study
5.2 Wearer Recognition in Egocentric Videos
5.3 Wearer Recognition in Third Person Videos
5.4 Model Interpretability
6 Conclusion and Future Work
References
Captioning Images Taken by People Who Are Blind
1 Introduction
2 Related Work
3 VizWiz-Captions
3.1 Dataset Creation
3.2 Dataset Analysis
4 Algorithm Benchmarking
5 Conclusions
References
Improving Semantic Segmentation via Decoupled Body and Edge Supervision
1 Introduction
2 Related Work
3 Method
3.1 Decoupled Segmentation Framework
3.2 Body Generation Module
3.3 Edge Preservation Module
3.4 Decoupled Body and Edge Supervision
3.5 Network Architecture
4 Experiment
4.1 Ablation Studies
4.2 Visual Analysis
4.3 Results on Other Datasets
5 Conclusions
References
Conditional Entropy Coding for Efficient Video Compression
1 Introduction
2 Background and Related Work
2.1 Deep Image Compression
2.2 Video Compression
2.3 Internal Learning
3 Entropy-Focused Video Compression
3.1 Single-Image Encoder/Decoder
3.2 Conditional Entropy Model for Video Encoding
3.3 Rate-distortion Loss Function
4 Internal Learning of the Frame Code
5 Experiments
5.1 Datasets, Metrics, and Video Codecs
5.2 Runtime and Rate-distortion on UVG
5.3 Rate-distortion on NorthAmerica
5.4 Varying Framerates on UVG and CDVL
5.5 Qualitative Results
6 Conclusion
References
Differentiable Feature Aggregation Search for Knowledge Distillation
1 Introduction
2 Related Work
3 Method
3.1 Feature Distillation
3.2 Differentiable Group-Wise Search
3.3 Time Complexity Analysis
3.4 Implementation Details
4 Experiments
4.1 CIFAR-100
4.2 CINIC-10
4.3 The Effectiveness of Differentiable Search
5 Conclusion
References
Attention Guided Anomaly Localization in Images
1 Introduction
2 Related Works
3 Proposed Approach: CAVGA
3.1 Unsupervised Approach: CAVGA_u
3.2 Weakly Supervised Approach: CAVGA_w
4 Experimental Setup
5 Experimental Results
6 Ablation Study
7 Conclusion
References
Self-supervised Video Representation Learning by Pace Prediction
1 Introduction
2 Related Work
3 Our Approach
3.1 Pace Prediction
3.2 Contrastive Learning
3.3 Network Architecture and Training
4 Experiments
4.1 Datasets and Implementation Details
4.2 Ablation Studies
4.3 Action Recognition
4.4 Video Retrieval
5 Conclusion
References
Full-Body Awareness from Partial Observations
1 Introduction
2 Related Work
3 Approach
3.1 Base Models
3.2 Iterative Adaptation to Partial Visibility
3.3 Implementation Details
4 Experiments
4.1 Datasets and Annotations
4.2 Experimental Setup
4.3 Results on VLOG
4.4 Generalization Evaluations
4.5 Additional Comparisons
5 Discussion
References
Reinforced Axial Refinement Network for Monocular 3D Object Detection
1 Introduction
2 Related Work
3 Approach
3.1 Baseline and the Curse of Sampling in 3D Space
3.2 Towards Higher Sampling Efficiency
3.3 Refining 3D Detection with Reinforcement Learning
3.4 Parameter-Aware Data Enhancement
3.5 Implementation Details
4 Experiments
4.1 Dataset and Evaluation
4.2 Comparison to the State-of-the-Arts
4.3 Diagnostic Studies
4.4 Computational Costs
5 Conclusions
References
Self-supervised Multi-task Procedure Learning from Instructional Videos
1 Introduction
1.1 Prior Work
1.2 Paper Contributions
2 Self-supervised Procedure Learning
2.1 Proposed Framework
2.2 Proposed Learning Method
3 Experiments
3.1 Experimental Setup
3.2 Experimental Results
4 Conclusions
References
CosyPose: Consistent Multi-view Multi-object 6D Pose Estimation
1 Introduction
2 Related Work
3 Multi-view Multi-object 6D Object Pose Estimation
3.1 Approach Overview
3.2 Stage 1: Object Candidate Generation
3.3 Stage 2: Object Candidate Matching
3.4 Stage 3: Scene Refinement
4 Results
4.1 Single-View Single-Object Experiments
4.2 Multi-view Experiments
5 Conclusion
References
In-Domain GAN Inversion for Real Image Editing
1 Introduction
1.1 Related Work
2 In-Domain GAN Inversion
2.1 Domain-Guided Encoder
2.2 Domain-Regularized Optimization
3 Experiments
3.1 Experimental Settings
3.2 Semantic Analysis of the Inverted Codes
3.3 Inversion Quality and Speed
3.4 Real Image Editing
3.5 Ablation Study
4 Discussion and Conclusion
References
Key Frame Proposal Network for Efficient Pose Estimation in Videos
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Atomic Dynamics-Based Representation of Temporal Data
3.2 Key Frame Selection Unsupervised Loss
3.3 Human Pose Interpolation
3.4 Architecture, Training, and Inference
3.5 Online Key Frame Detection
4 Experiments
4.1 Data Preprocessing and Evaluation Metrics
4.2 Qualitative Examples
4.3 Ablation Studies
4.4 Comparison Against the State-of-Art
4.5 Robustness of Our Approach
5 Conclusion
References
Exchangeable Deep Neural Networks for Set-to-Set Matching and Learning
1 Introduction
2 Preliminaries: Set-to-Set Matching
2.1 Mappings of Exchangeability
3 Matching and Learning for Sets
3.1 Cross-Set Feature Transformation
3.2 Calculating Matching Score for Sets
3.3 Training for Pairs of Sets
4 Related Works
5 Experiments
5.1 Overall Architecture
5.2 Baselines for Comparisons
5.3 Training Settings
5.4 Fashion Set Matching
5.5 Group Re-identification
5.6 Ablation Study
6 Conclusion
References
Making Sense of CNNs: Interpreting Deep Representations and Their Invariances with INNs
1 Introduction
2 Background
3 Approach
3.1 Recovering the Invariances of Deep Models
3.2 Interpreting Representations and Their Invariances
4 Experiments
4.1 Comparison to Existing Methods
4.2 Understanding Models
4.3 Effects of Data Shifts on Models
4.4 Modifying Representations
5 Conclusion
References
Cross-Modal Weighting Network for RGB-D Salient Object Detection
1 Introduction
2 Related Work
3 Proposed Method
3.1 Network Overview and Motivation
3.2 Low- and Middle-Level Cross-Modal Weighting
3.3 High-Level Cross-Modal Weighting
3.4 Implementation Details
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Comparison with State-of-the-Art Methods
4.3 Ablation Studies
5 Conclusion
References
Open-Set Adversarial Defense
1 Introduction
2 Related Work
3 Background
4 Proposed Method
5 Experimental Results
5.1 Datasets
5.2 Baseline Methods
5.3 Quantitative Results
5.4 Ablation Study
5.5 Qualitative Results
6 Conclusion
References
Deep Image Compression Using Decoder Side Information
1 Introduction
2 Related Work
3 Deep Distributed Source Coding for Images
3.1 Architecture
3.2 Using Side Information
4 Experiments
4.1 Implementation Details
4.2 Results
4.3 Ablation Study
5 Conclusions
References
Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation
1 Introduction
2 Related Work
2.1 Synthetic Content Creation
2.2 Graph Generation
3 Methodology
3.1 Representing Synthetic Scenes
3.2 Generative Model
3.3 Training
4 Experiments
4.1 Multi MNIST
4.2 Aerial 2D
4.3 3D Driving Scenes
5 Conclusion
References
A Generic Visualization Approach for Convolutional Neural Networks
1 Introduction
2 Related Work
3 Constrained Attention Filter (CAF)
3.1 Class-Oblivious Variant
3.2 Class-Specific Variant
4 Experiments
4.1 WSOL Using Classification Networks
4.2 WSOL Using Retrieval Networks
4.3 Recurrent Networks' Attention
4.4 Ablation Study
5 Conclusion
References
Interactive Annotation of 3D Object Geometry Using 2D Scribbles
1 Introduction
2 Related Work
3 Interactive 3D Annotation
3.1 Annotation Setup
3.2 Scribble Interaction Module
3.3 Point Interaction Module
4 Experiments
4.1 Experimental Settings
4.2 ShapeNet Annotation
4.3 Annotating Real Scans
4.4 Analysis
4.5 User Study
5 Conclusion
References
Hierarchical Kinematic Human Mesh Recovery
1 Introduction
2 Related Work
3 Approach
3.1 3D Body Representation
3.2 Hierarchical Kinematic Pose and Shape Estimation
3.3 Overall Learning Objective
3.4 In-the-Loop Optimization
4 Experiments and Results
5 Summary
References
Multi-loss Rebalancing Algorithm for Monocular Depth Estimation
1 Introduction
2 Related Work
3 Proposed Algorithm
3.1 Loss Function Space
3.2 Loss Rebalancing Algorithm
4 Experimental Results
4.1 Implementation Details
4.2 Datasets and Evaluation Metrics
4.3 Comparison with Conventional Algorithms
4.4 Ablation Studies
4.5 Different Backbone Networks
4.6 Time Complexity
5 Conclusions
References
Author Index