This book enriches unsupervised outlier detection research by proposing several new distance-based and density-based outlier scores in a k-nearest neighbors’ setting. The respective chapters highlight the latest developments in k-nearest neighbor-based outlier detection research and cover such topics as our present understanding of unsupervised outlier detection in general; distance-based and density-based outlier detection in particular; and the applications of the latest findings to boundary point detection and novel object detection. The book also offers a new perspective on bridging the gap between k-nearest neighbor-based outlier detection and clustering-based outlier detection, laying the groundwork for future advances in unsupervised outlier detection research.
The authors hope the algorithms and applications proposed here will serve as valuable resources for outlier detection researchers for years to come.
Author(s): Xiaochun Wang, Xiali Wang, Mitch Wilkes
Publisher: Springer Singapore
Year: 2021
Language: English
Pages: 277
City: Singapore
Foreword
Preface
Acknowledgements
Contents
About the Authors
Abbreviations
Part IIntroduction
1 Overview and Contributions
1.1 Introduction
1.2 Research Issues on Unsupervised Outlier Detection
1.3 Overview of the Book
1.4 Contributions
1.5 Conclusions
2 Developments in Unsupervised Outlier Detection Research
2.1 Introduction
2.1.1 A Brief Overview of the Early Developments in Outlier Analysis
2.2 Some Standard Unsupervised Outlier Detection Approaches
2.2.1 Probabilistic Model-Based Outlier Detection Approach
2.2.2 Clustering-Based Outlier Detection Approaches
2.2.3 Distance-Based Outlier Detection Approaches
2.2.4 Density-Based Outlier Detection Approaches
2.2.5 Outlier Detection for Time Series
2.3 Performance Evaluation Metrics of Outlier Detection Approaches
2.3.1 Precision, Recall and Rank Power
2.4 Conclusions
References
Part IINew Developments in Unsupervised Outlier Detection Research
3 A Fast Distance-Based Outlier Detection Technique Using a Divisive Hierarchical Clustering Algorithm
3.1 Introduction
3.2 Related Work
3.2.1 Distance-Based Outlier Detection Research
3.2.2 A Divisive Hierarchical Clustering Algorithm for Approximate kNN Search
3.2.3 An Efficiency Analysis of DHCA for Distance-Based Outlier Detection
3.3 The Proposed Fast Distance-Based Outlier Detection Algorithm
3.3.1 A Simple Idea
3.3.2 The Proposed CPU-Efficient DB-Outlier Detection Method
3.3.3 Time Complexity Analysis
3.3.4 Data Structure for Implementing DHCA
3.4 Scale to Very Large Databases with I/O Efficiency
3.5 Performance Evaluation
3.5.1 Data Characteristics
3.5.2 The Impact of Input K on Running Time
3.5.3 Comparison with Other Methods
3.5.4 Effectiveness of DHCA for kNN Search
3.5.5 The Impact of Curse of Dimensionality
3.5.6 Scale to Very Large Databases with I/O Efficiency
3.5.7 Discussion
3.6 Conclusions
References
4 A k-Nearest Neighbor Centroid-Based Outlier Detection Method
4.1 Introduction
4.2 K-means Clustering and Its Application to Outlier Detection
4.2.1 K-means Clustering
4.2.2 K-means Clustering-Based Outlier Detection
4.3 A kNN-Centroid-Based Outlier Detection Algorithm
4.3.1 General Idea
4.3.2 Definition for an Outlier Indicator
4.3.3 Formal Definition of kNN-Based Centroid
4.3.4 Two New Formulations of Outlier Factors
4.3.5 Determination of k
4.3.6 The Complexity Analysis
4.3.7 The Proposed Outlier Detection Algorithm
4.4 A Performance Study
4.4.1 Performance on Synthetic Datasets
4.4.2 Performance on Real Datasets
4.4.3 Performance on High-Dimensional Real Datasets
4.4.4 Discussion
4.5 Conclusions
References
5 A Minimum Spanning Tree Clustering-Inspired Outlier Detection Technique
5.1 Introduction
5.2 Background
5.2.1 Minimum Spanning Tree-Based Clustering
5.2.2 Minimum Spanning Tree Clustering-Based Outlier Detection
5.3 An Improved MST-Clustering-Inspired Outlier Detection Algorithm
5.3.1 A Simple Idea
5.3.2 Two New Outlier Factors
5.3.3 The Proposed MST-Clustering-Inspired Outlier Detection Algorithm
5.3.4 Time Complexity Analysis
5.4 A Performance Study
5.4.1 Performance on Synthetic Datasets
5.4.2 Performance on Multi-dimensional Real Datasets
5.4.3 Performance of the Proposed Algorithm with Varying SOM-TH
5.5 Concluding Remarks
References
6 A k-Nearest Neighbour Spectral Clustering-Based Outlier Detection Technique
6.1 Introduction
6.2 Spectral Clustering and Its Application to Outlier Detection
6.2.1 Preliminaries
6.2.2 Spectral Clustering-Based Outlier Detection
6.3 The Proposed Spectral Clustering-Based Outlier Mining Algorithm
6.3.1 A Simple Idea
6.3.2 The Proposed Outlier Detection Algorithm
6.3.3 Complexity Analysis
6.4 Experimental Results
6.4.1 Performance of Our Algorithm on Synthetic Data
6.4.2 Performance of Our Algorithm on Real Data
6.5 Discussion
6.6 Conclusions
References
7 Enhancing Outlier Detection by Filtering Out Core Points and Border Points
7.1 Introduction
7.2 Related Work
7.2.1 Density-Based Clustering with DBSCAN
7.2.2 Density-Based Clustering for Outlier Detection
7.3 The Proposed Enhancer for Outlier Mining
7.3.1 A Simple Idea
7.3.2 Some Definitions
7.3.3 Our Proposed Outlier Detection Algorithm
7.3.4 The Complexity Analysis
7.4 Experiments and Results
7.4.1 Performance of Our Algorithm on Synthetic Data
7.4.2 Performance of Our Algorithm on Real Data
7.5 Conclusions
References
Part IIIApplications
8 An Effective Boundary Point Detection Algorithm Via k-Nearest Neighbors-Based Centroid
8.1 Introduction
8.2 Related Work
8.2.1 Outlier and Boundary Point Detection
8.2.2 EMST Algorithms
8.3 Boundary Point Detection Based on kNN Centroid
8.3.1 Definitions
8.3.2 The Proposed Boundary Point and Outlier Detection Algorithm
8.3.3 The Complexity Analysis
8.4 The Proposed Fast Approximate EMST Algorithm
8.4.1 Our Clustering-Inspired EMST Algorithm
8.4.2 Time Complexity Analysis
8.5 Experiments and Results
8.5.1 Performance Evaluation of the Proposed Boundary Point Detection Algorithm
8.5.2 Performance Evaluation of the Fast Approximate EMST Algorithm
8.6 Conclusions
References
9 A Nearest Neighbor Classifier-Based Automated On-Line Novel Visual Percept Detection Method
9.1 Introduction
9.2 A Percept Learning System
9.2.1 Feature Generation
9.2.2 Similarity Measure
9.2.3 Percept Formation
9.2.4 A Fast Approximate Nearest Neighbor Classifier
9.3 An On-Line Novelty Detection Method
9.3.1 A Threshold Selection Method
9.3.2 Eight-Connected Structure Element Filter
9.3.3 Tree Update Method
9.4 Experiments and Results
9.4.1 Experiment I: An Indoor Environment
9.4.2 Experiment II: An Outdoor Environment
9.5 Conclusions
References
10 Unsupervised Fraud Detection in Environmental Time Series Data
10.1 Introduction
10.2 Related Work
10.2.1 Point Outliers
10.2.2 Shape Outliers
10.3 Method
10.3.1 A Simple Idea
10.3.2 Selecting an Appropriate Threshold for Fraud Detection
10.3.3 The Complexity Analysis
10.4 Experiments and Results
10.4.1 Fraud Detection on Wastewater Discharge Concentration Data
10.4.2 Fraud Detection on Gas Emission Concentration Data
10.5 Conclusions
References