Model-Based Clustering, Classification, and Density Estimation Using mclust in R

Model-based clustering and classification methods provide a systematic statistical approach to clustering, classification, and density estimation via mixture modeling. The model-based framework allows the problems of choosing or developing an appropriate clustering or classification method to be understood within the context of statistical modeling. The mclust package for the statistical environment R is a widely adopted platform implementing these model-based strategies. The package includes both summary and visual functionality, complementing procedures for estimating and choosing models.

Key features of the book:
An introduction to the model-based approach and the mclust R package
A detailed description of mclust and the underlying modeling strategies
An extensive set of examples, color plots, and figures along with the R code for reproducing them
A companion website with the R code to reproduce the examples and figures presented in the book, errata, and other supplementary material

Model-Based Clustering, Classification, and Density Estimation Using mclust in R is accessible to quantitatively trained students and researchers with a basic understanding of statistical methods, including inference and computing. In addition to serving as a reference manual for mclust, the book will be particularly useful to those wishing to employ these model-based techniques in research or applications in statistics, data science, clinical research, social science, and many other disciplines.

Author(s): Luca Scrucca, Chris Fraley, T. Brendan Murphy, and Adrian E. Raftery
Publisher: CRC Press
Year: 2023

Language: English
Pages: 269

Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Contents
List of Figures
List of Tables
List of Examples
Preface
1. Introduction
1.1. Model-Based Clustering and Finite Mixture Modeling
1.2. mclust
1.3. Overview
1.4. Organization of the Book
2. Finite Mixture Models
2.1. Finite Mixture Models
2.1.1. Maximum Likelihood Estimation and the EM Algorithm
2.1.2. Issues in Maximum Likelihood Estimation
2.2. Gaussian Mixture Models
2.2.1. Parsimonious Covariance Decomposition
2.2.2. EM Algorithm for Gaussian Mixtures
2.2.3. Initialization of EM Algorithm
2.2.4. Maximum A Posteriori (MAP) Classification
2.3. Model Selection
2.3.1. Information Criteria
2.3.2. Likelihood Ratio Testing
2.4. Resampling-Based Inference
3. Model-Based Clustering
3.1. Gaussian Mixture Models for Cluster Analysis
3.2. Clustering in mclust
3.3. Model Selection
3.3.1. BIC
3.3.2. ICL
3.3.3. Bootstrap Likelihood Ratio Testing
3.4. Resampling-Based Inference in mclust
3.5. Clustering Univariate Data
3.6. Model-Based Agglomerative Hierarchical Clustering
3.6.1. Agglomerative Clustering for Large Datasets
3.7. Initialization in mclust
3.8. EM Algorithm in mclust
3.9. Further Considerations
4. Mixture-Based Classification
4.1. Classification as Supervised Learning
4.2. Gaussian Mixture Models for Classification
4.2.1. Prediction
4.2.2. Estimation
4.3. Classification in mclust
4.4. Evaluating Classifier Performance
4.4.1. Evaluating Predicted Classes: Classification Error
4.4.2. Evaluating Class Probabilities: Brier Score
4.4.3. Estimating Classifier Performance: Test Set and Resampling-Based Validation
4.4.4. Cross-Validation in mclust
4.5. Classification with Unequal Costs of Misclassification
4.6. Classification with Unbalanced Classes
4.7. Classification of Univariate Data
4.8. Semi-Supervised Classification
5. Model-Based Density Estimation
5.1. Density Estimation
5.2. Finite Mixture Modeling for Density Estimation with mclust
5.3. Univariate Density Estimation
5.3.1. Diagnostics for Univariate Density Estimation
5.4. Density Estimation in Higher Dimensions
5.5. Density Estimation for Bounded Data
5.6. Highest Density Regions
6. Visualizing Gaussian Mixture Models
6.1. Displays for Univariate Data
6.2. Displays for Bivariate Data
6.3. Displays for Higher Dimensional Data
6.3.1. Coordinate Projections
6.3.2. Random Projections
6.3.3. Discriminant Coordinate Projections
6.4. Visualizing Model-Based Clustering and Classification on Projection Subspaces
6.4.1. Projection Subspaces for Visualizing Cluster Separation
6.4.2. Incorporating Variation in Covariances
6.4.3. Projection Subspaces for Classification
6.4.4. Relationship to Other Methods
6.5. Using ggplot2 with mclust
6.6. Using Color-Blind-Friendly Palettes
7. Miscellanea
7.1. Accounting for Noise and Outliers
7.2. Using a Prior for Regularization
7.2.1. Adding a Prior in mclust
7.3. Non-Gaussian Clusters from GMMs
7.3.1. Combining Gaussian Mixture Components for Clustering
7.3.2. Identifying Connected Components in GMMs
7.4. Simulation from Mixture Densities
7.5. Large Datasets
7.6. High-Dimensional Data
7.7. Missing Data
Bibliography
Index