Core Concepts in Data Analysis: Summarization, Correlation and Visualization

Abstract: Core Concepts in Data Analysis: Summarization, Correlation and Visualization provides in-depth descriptions of those data analysis approaches that either summarize data (principal component analysis and clustering, including hierarchical and network clustering) or correlate different aspects of data (decision trees, linear rules, neuron networks, and Bayes rule). Boris Mirkin takes an unconventional approach and introduces the concept of multivariate data summarization as a counterpart to conventional machine learning prediction schemes, utilizing techniques from statistics, data analysis, data mining, machine learning, computational intelligence, and information retrieval. Innovations following from his in-depth analysis of the models underlying summarization techniques are introduced and applied to challenging issues such as the number of clusters, mixed scale data standardization, interpretation of the solutions, as well as relations between seemingly unrelated concepts: goodness-of-fit functions for classification trees and data standardization, spectral clustering and additive clustering, correlation and visualization of contingency data. The mathematical detail is encapsulated in the so-called "formulation" parts, whereas most material is delivered through "presentation" parts that explain the methods by applying them to small real-world data sets; concise "computation" parts cover the algorithmic and coding issues. Four layers of active learning and self-study exercises are provided: worked examples, case studies, projects and questions.
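
A minimal sketch of the two goals named above, using only the Python standard library: summarizing a single quantitative feature (a centre and a spread) and correlating two features (a least-squares line, the correlation coefficient and the determination coefficient), in the spirit of the 1D and 2D chapters listed in the contents. It is not code from the book; the function names `summarize` and `correlate` and the toy numbers are hypothetical, and the spread is the population standard deviation (dividing by N rather than N - 1).

```python
import statistics
from math import sqrt


def summarize(values):
    """Summarization of one feature: its centre (mean, median) and spread."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Population standard deviation: divides the sum of squares by N.
        "std": statistics.pstdev(values),
    }


def correlate(x, y):
    """Correlation of two features: fit y ~ slope * x + intercept by least
    squares and report Pearson's r and the determination coefficient r**2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    # Centred cross-product and sums of squares.
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant feature has no defined correlation")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return {"slope": slope, "intercept": intercept, "r": r, "r2": r * r}


# Hypothetical toy data, for illustration only.
print(summarize([2, 4, 5, 4, 5]))
print(correlate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

On these toy numbers the fitted line has slope 0.6 and intercept 2.2, and the determination coefficient is 0.6, meaning the line accounts for 60% of the variance of y. Python 3.10+ also provides `statistics.correlation` and `statistics.linear_regression`; the explicit sums are kept here so the formulas stay visible.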

Author(s): Mirkin, Boris
Series: Undergraduate topics in computer science
Publisher: Springer London (imprint: Springer)
Year: 2011

Language: English
Pages: 401
Tags: Data structures (Computer science) -- Mathematical models; Data structures (Computer science) -- Statistical methods; Data structures (Computer science)

Content:
Preface
Acknowledgments
Contents
List of Projects
List of Case Studies
List of Worked Examples
1 Introduction: What Is Core
1.1 Summarization and Correlation: Two Main Goals of Data Analysis
1.2 Case Study Problems
Case 1.2.1: Company
Case 1.2.2: Iris
Case 1.2.3: Market Towns
Case 1.2.4: Student
Case 1.2.5: Intrusion
Case 1.2.6: Confusion
Case 1.2.7: Amino Acid Substitution Rates
1.3 An Account of Data Visualization
1.3.1 General
1.3.2 Highlighting
1.3.3 Integrating Different Aspects
1.3.4 Narrating a Story
1.4 Summary
References
2 1D Analysis: Summarization and Visualization of a Single Feature
2.1 Quantitative Feature: Distribution and Histogram
P2.1.1 Presentation
F2.1.2 Formulation
C2.1.3 Computation
2.2 Further Summarization: Centers and Spreads
P2.2.1 Centers and Spreads: Presentation
Worked example 2.1. Mean
Worked example 2.2. Median
Worked example 2.3. P-quantile (percentile)
Worked example 2.4. Mode
F2.2.2 Centers and Spreads: Formulation
F2.2.2.1 Data Analysis Perspective
F2.2.2.2 Probabilistic Statistics Perspective
C2.2.3 Centers and Spreads: Computation
2.3 Binary and Categorical Features
P2.3.1 Presentation
Worked example 2.5. Entropy and Gini index of a distribution
F2.3.2 Formulation
C2.3.3 Computation
2.4 Modeling Uncertainty: Intervals and Fuzzy Sets
2.4.1 Individual Membership Functions
2.4.2 Central Fuzzy Set
Project 2.1. Computing Minkowski metric's center
Project 2.2. Analysis of a multimodal distribution
Project 2.3. Computational validation of the mean by bootstrapping
Project 2.4. K-fold cross validation
2.5 Summary
References
3 2D Analysis: Correlation and Visualization of Two Features
3.1 General
3.2 Two Quantitative Features Case
P3.2.1 Scatter-Plot, Linear Regression and Correlation Coefficients
P3.2.2 Validity of the Regression
Worked example 3.1. Determination coefficient
Worked example 3.2. Bootstrap validity testing
Worked example 3.3. Prediction error of the regression equation
F3.2.3 Linear Regression: Formulation
F3.2.3.1 Fitting Linear Regression
F3.2.3.2 Correlation Coefficient and Its Properties
F3.2.3.3 Linearization of Non-linear Regression
C3.2.4 Linear Regression: Computation
Project 3.1. 2D analysis, linear regression and bootstrapping
Project 3.2. Non-linear and linearized regression: a nature-inspired algorithm
Case-study 3.1. Growth of Investment
Case-study 3.2. Correlation Between Iris Sepal Length and Width
3.3 Mixed Scale Case: Nominal Feature Versus a Quantitative One
P3.3.1 Box-Plot, Tabular Regression and Correlation Ratio