This book explains the principal techniques of data mining for classification, association rule mining and clustering. Each topic is clearly explained and illustrated by detailed examples, with a focus on algorithms rather than mathematical formalism.
Author(s): Max Bramer
Series: Undergraduate Topics in Computer Science
Edition: 3rd ed. 2016
Publisher: Springer
Year: 2016
Language: English
Pages: 544
Principles of Data Mining
About This Book
Contents
1. Introduction to Data Mining
1.1 The Data Explosion
1.2 Knowledge Discovery
1.3 Applications of Data Mining
1.4 Labelled and Unlabelled Data
1.5 Supervised Learning: Classification
1.6 Supervised Learning: Numerical Prediction
1.7 Unsupervised Learning: Association Rules
1.8 Unsupervised Learning: Clustering
2. Data for Data Mining
2.1 Standard Formulation
2.2 Types of Variable
2.2.1 Categorical and Continuous Attributes
2.3 Data Preparation
2.3.1 Data Cleaning
2.4 Missing Values
2.4.1 Discard Instances
2.4.2 Replace by Most Frequent/Average Value
2.5 Reducing the Number of Attributes
2.6 The UCI Repository of Datasets
2.7 Chapter Summary
2.8 Self-assessment Exercises for Chapter 2
Reference
3. Introduction to Classification: Naïve Bayes and Nearest Neighbour
3.1 What Is Classification?
3.2 Naïve Bayes Classifiers
3.3 Nearest Neighbour Classification
3.3.1 Distance Measures
3.3.2 Normalisation
3.3.3 Dealing with Categorical Attributes
3.4 Eager and Lazy Learning
3.5 Chapter Summary
3.6 Self-assessment Exercises for Chapter 3
4. Using Decision Trees for Classification
4.1 Decision Rules and Decision Trees
4.1.1 Decision Trees: The Golf Example
4.1.2 Terminology
4.1.3 The degrees Dataset
4.2 The TDIDT Algorithm
4.3 Types of Reasoning
4.4 Chapter Summary
4.5 Self-assessment Exercises for Chapter 4
References
5. Decision Tree Induction: Using Entropy for Attribute Selection
5.1 Attribute Selection: An Experiment
5.2 Alternative Decision Trees
5.2.1 The Football/Netball Example
5.2.2 The anonymous Dataset
5.3 Choosing Attributes to Split On: Using Entropy
5.3.1 The lens24 Dataset
5.3.2 Entropy
5.3.3 Using Entropy for Attribute Selection
5.3.4 Maximising Information Gain
5.4 Chapter Summary
5.5 Self-assessment Exercises for Chapter 5
6. Decision Tree Induction: Using Frequency Tables for Attribute Selection
6.1 Calculating Entropy in Practice
6.1.1 Proof of Equivalence
6.1.2 A Note on Zeros
6.2 Other Attribute Selection Criteria: Gini Index of Diversity
6.3 The χ² Attribute Selection Criterion
6.4 Inductive Bias
6.5 Using Gain Ratio for Attribute Selection
6.5.1 Properties of Split Information
6.5.2 Summary
6.6 Number of Rules Generated by Different Attribute Selection Criteria
6.7 Missing Branches
6.8 Chapter Summary
6.9 Self-assessment Exercises for Chapter 6
References
7. Estimating the Predictive Accuracy of a Classifier
7.1 Introduction
7.2 Method 1: Separate Training and Test Sets
7.2.1 Standard Error
7.2.2 Repeated Train and Test
7.3 Method 2: k-fold Cross-validation
7.4 Method 3: N-fold Cross-validation
7.5 Experimental Results I
7.6 Experimental Results II: Datasets with Missing Values
7.6.1 Strategy 1: Discard Instances
7.6.2 Strategy 2: Replace by Most Frequent/Average Value
7.6.3 Missing Classifications
7.7 Confusion Matrix
7.7.1 True and False Positives
7.8 Chapter Summary
7.9 Self-assessment Exercises for Chapter 7
Reference
8. Continuous Attributes
8.1 Introduction
8.2 Local versus Global Discretisation
8.3 Adding Local Discretisation to TDIDT
8.3.1 Calculating the Information Gain of a Set of Pseudo-attributes
8.3.2 Computational Efficiency
8.4 Using the ChiMerge Algorithm for Global Discretisation
8.4.1 Calculating the Expected Values and χ²
8.4.2 Finding the Threshold Value
8.4.3 Setting minIntervals and maxIntervals
8.4.4 The ChiMerge Algorithm: Summary
8.4.5 The ChiMerge Algorithm: Comments
8.5 Comparing Global and Local Discretisation for Tree Induction
8.6 Chapter Summary
8.7 Self-assessment Exercises for Chapter 8
Reference
9. Avoiding Overfitting of Decision Trees
9.1 Dealing with Clashes in a Training Set
9.1.1 Adapting TDIDT to Deal with Clashes
9.2 More About Overfitting Rules to Data
9.3 Pre-pruning Decision Trees
9.4 Post-pruning Decision Trees
9.5 Chapter Summary
9.6 Self-assessment Exercise for Chapter 9
References
10. More About Entropy
10.1 Introduction
10.2 Coding Information Using Bits
10.3 Discriminating Amongst M Values (M Not a Power of 2)
10.4 Encoding Values That Are Not Equally Likely
10.5 Entropy of a Training Set
10.6 Information Gain Must Be Positive or Zero
10.7 Using Information Gain for Feature Reduction for Classification Tasks
10.7.1 Example 1: The genetics Dataset
10.7.2 Example 2: The bcst96 Dataset
10.8 Chapter Summary
10.9 Self-assessment Exercises for Chapter 10
References
11. Inducing Modular Rules for Classification
11.1 Rule Post-pruning
11.2 Conflict Resolution
11.3 Problems with Decision Trees
11.4 The Prism Algorithm
11.4.1 Changes to the Basic Prism Algorithm
11.4.2 Comparing Prism with TDIDT
11.5 Chapter Summary
11.6 Self-assessment Exercise for Chapter 11
References
12. Measuring the Performance of a Classifier
12.1 True and False Positives and Negatives
12.2 Performance Measures
12.3 True and False Positive Rates versus Predictive Accuracy
12.4 ROC Graphs
12.5 ROC Curves
12.6 Finding the Best Classifier
12.7 Chapter Summary
12.8 Self-assessment Exercise for Chapter 12
13. Dealing with Large Volumes of Data
13.1 Introduction
13.2 Distributing Data onto Multiple Processors
13.3 Case Study: PMCRI
13.4 Evaluating the Effectiveness of a Distributed System: PMCRI
13.5 Revising a Classifier Incrementally
13.6 Chapter Summary
13.7 Self-assessment Exercises for Chapter 13
References
14. Ensemble Classification
14.1 Introduction
14.2 Estimating the Performance of a Classifier
14.3 Selecting a Different Training Set for Each Classifier
14.4 Selecting a Different Set of Attributes for Each Classifier
14.5 Combining Classifications: Alternative Voting Systems
14.6 Parallel Ensemble Classifiers
14.7 Chapter Summary
14.8 Self-assessment Exercises for Chapter 14
References
15. Comparing Classifiers
15.1 Introduction
15.2 The Paired t-Test
15.3 Choosing Datasets for Comparative Evaluation
15.3.1 Confidence Intervals
15.4 Sampling
15.5 How Bad Is a 'No Significant Difference' Result?
15.6 Chapter Summary
15.7 Self-assessment Exercises for Chapter 15
References
16. Association Rule Mining I
16.1 Introduction
16.2 Measures of Rule Interestingness
16.2.1 The Piatetsky-Shapiro Criteria and the RI Measure
16.2.2 Rule Interestingness Measures Applied to the chess Dataset
16.2.3 Using Rule Interestingness Measures for Conflict Resolution
16.3 Association Rule Mining Tasks
16.4 Finding the Best N Rules
16.4.1 The J-Measure: Measuring the Information Content of a Rule
16.4.2 Search Strategy
16.5 Chapter Summary
16.6 Self-assessment Exercises for Chapter 16
References
17. Association Rule Mining II
17.1 Introduction
17.2 Transactions and Itemsets
17.3 Support for an Itemset
17.4 Association Rules
17.5 Generating Association Rules
17.6 Apriori
17.7 Generating Supported Itemsets: An Example
17.8 Generating Rules for a Supported Itemset
17.9 Rule Interestingness Measures: Lift and Leverage
17.10 Chapter Summary
17.11 Self-assessment Exercises for Chapter 17
Reference
18. Association Rule Mining III: Frequent Pattern Trees
18.1 Introduction: FP-Growth
18.2 Constructing the FP-tree
18.2.1 Pre-processing the Transaction Database
18.2.2 Initialisation
18.2.3 Processing Transaction 1: f, c, a, m, p
18.2.4 Processing Transaction 2: f, c, a, b, m
18.2.5 Processing Transaction 3: f, b
18.2.6 Processing Transaction 4: c, b, p
18.2.7 Processing Transaction 5: f, c, a, m, p
18.3 Finding the Frequent Itemsets from the FP-tree
18.3.1 Itemsets Ending with Item p
18.3.2 Itemsets Ending with Item m
18.4 Chapter Summary
18.5 Self-assessment Exercises for Chapter 18
Reference
19. Clustering
19.1 Introduction
19.2 k-Means Clustering
19.2.1 Example
19.2.2 Finding the Best Set of Clusters
19.3 Agglomerative Hierarchical Clustering
19.3.1 Recording the Distance Between Clusters
19.3.2 Terminating the Clustering Process
19.4 Chapter Summary
19.5 Self-assessment Exercises for Chapter 19
20. Text Mining
20.1 Multiple Classifications
20.2 Representing Text Documents for Data Mining
20.3 Stop Words and Stemming
20.4 Using Information Gain for Feature Reduction
20.5 Representing Text Documents: Constructing a Vector Space Model
20.6 Normalising the Weights
20.7 Measuring the Distance Between Two Vectors
20.8 Measuring the Performance of a Text Classifier
20.9 Hypertext Categorisation
20.9.1 Classifying Web Pages
20.9.2 Hypertext Classification versus Text Classification
20.10 Chapter Summary
20.11 Self-assessment Exercises for Chapter 20
21. Classifying Streaming Data
21.1 Introduction
21.1.1 Stationary versus Time-dependent Data
21.2 Building an H-Tree: Updating Arrays
21.2.1 Array currentAtts
21.2.2 Array splitAtt
21.2.3 Sorting a Record to the Appropriate Leaf Node
21.2.4 Array hitcount
21.2.5 Array classtotals
21.2.6 Array acvCounts
21.2.7 Array branch
21.3 Building an H-Tree: A Detailed Example
21.3.1 Step (a): Initialise Root Node 0
21.3.2 Step (b): Begin Reading Records
21.3.3 Step (c): Consider Splitting at Node 0
21.3.4 Step (d): Split on Root Node and Initialise New Leaf Nodes
21.3.5 Step (e): Process the Next Set of Records
21.3.6 Step (f): Consider Splitting at Node 2
21.3.7 Step (g): Process the Next Set of Records
21.3.8 Outline of the H-Tree Algorithm
21.4 Splitting on an Attribute: Using Information Gain
21.5 Splitting on an Attribute: Using a Hoeffding Bound
21.6 H-Tree Algorithm: Final Version
21.7 Using an Evolving H-Tree to Make Predictions
21.7.1 Evaluating the Performance of an H-Tree
21.8 Experiments: H-Tree versus TDIDT
21.8.1 The lens24 Dataset
21.8.2 The vote Dataset
21.9 Chapter Summary
21.10 Self-assessment Exercises for Chapter 21
References
22. Classifying Streaming Data II: Time-Dependent Data
22.1 Stationary versus Time-dependent Data
22.2 Summary of the H-Tree Algorithm
22.2.1 Array currentAtts
22.2.2 Array splitAtt
22.2.3 Array hitcount
22.2.4 Array classtotals
22.2.5 Array acvCounts
22.2.6 Array branch
22.2.7 Pseudocode for the H-Tree Algorithm
22.3 From H-Tree to CDH-Tree: Overview
22.4 From H-Tree to CDH-Tree: Incrementing Counts
22.5 The Sliding Window Method
22.6 Resplitting at Nodes
22.7 Identifying Suspect Nodes
22.8 Creating Alternate Nodes
22.9 Growing/Forgetting an Alternate Node and its Descendants
22.10 Replacing an Internal Node by One of its Alternate Nodes
22.11 Experiment: Tracking Concept Drift
22.11.1 lens24 Data: Alternative Mode
22.11.2 Introducing Concept Drift
22.11.3 An Experiment with Alternating lens24 Data
22.11.4 Comments on Experiment
22.12 Chapter Summary
22.13 Self-assessment Exercises for Chapter 22
Reference
A. Essential Mathematics
A.1 Subscript Notation
A.1.1 Sigma Notation for Summation
A.1.2 Double Subscript Notation
A.1.3 Other Uses of Subscripts
A.2 Trees
A.2.1 Terminology
A.2.2 Interpretation
A.2.3 Subtrees
A.3 The Logarithm Function log₂ X
A.3.1 The Function −X log₂ X
A.4 Introduction to Set Theory
A.4.1 Subsets
A.4.2 Summary of Set Notation
B. Datasets
References
C. Sources of Further Information
Websites
Books
Books on Neural Nets
Conferences
Information About Association Rule Mining
D. Glossary and Notation
E. Solutions to Self-assessment Exercises
Index