Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data

Abstract: Each passing year bears witness to the development of ever more powerful computers, increasingly fast and cheap storage media, and even higher bandwidth data connections. This makes it easy to believe that we can now - at least in principle - solve any problem we are faced with so long as we only have enough data. Yet this is not the case. Although large databases allow us to retrieve many different single pieces of information and to compute simple aggregations, general patterns and regularities often go undetected. Furthermore, it is exactly these patterns, regularities and trends that are often most valuable. To avoid the danger of "drowning in information, but starving for knowledge" the branch of research known as data analysis has emerged, and a considerable number of methods and software tools have been developed. However, it is not these tools alone but the intelligent application of human intuition in combination with computational power, of sound background knowledge with computer-aided modeling, and of critical reflection with convenient automatic model construction, that results in successful intelligent data analysis projects. Guide to Intelligent Data Analysis provides a hands-on instructional approach to many basic data analysis techniques, and explains how these are used to solve data analysis problems. 
Topics and features:
Guides the reader through the process of data analysis, following the interdependent steps of project understanding, data understanding, data preparation, modeling, and deployment and monitoring
Equips the reader with the necessary information in order to obtain hands-on experience of the topics under discussion
Provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms
Includes numerous examples using R and KNIME, together with appendices introducing the open source software
Integrates illustrations and case-study-style examples to support pedagogical exposition
Supplies further tools and information at the associated website: http://www.idaguide.net/

This practical and systematic textbook/reference for graduate and advanced undergraduate students is also essential reading for all professionals who face data analysis problems. Moreover, it is a book to be used following one's exploration of it.

Dr. Michael R. Berthold is Nycomed-Professor of Bioinformatics and Information Mining at the University of Konstanz, Germany.
Dr. Christian Borgelt is Principal Researcher at the Intelligent Data Analysis and Graphical Models Research Unit of the European Centre for Soft Computing, Spain.
Dr. Frank Höppner is Professor of Information Systems at Ostfalia University of Applied Sciences, Germany.
Dr. Frank Klawonn is a Professor in the Department of Computer Science and Head of the Data Analysis and Pattern Recognition Laboratory at Ostfalia University of Applied Sciences, Germany. He is also Head of the Bioinformatics and Statistics group at the Helmholtz Centre for Infection Research, Braunschweig, Germany.

Author(s): Berthold, Michael R.; Borgelt, Christian; Höppner, Frank et al.
Series: Texts in computer science
Publisher: Springer London (imprint: Springer)
Year: 2010

Language: English
Pages: 398
Tags: Mathematical statistics -- Data processing.;Mathematical statistics.;Artificial intelligence.;Computer Science.;Artificial Intelligence (incl. Robotics).

Content:
Preface
Contents
Symbols
Introduction
Motivation
Data and Knowledge
Tycho Brahe and Johannes Kepler
Intelligent Data Analysis
The Data Analysis Process
Methods, Tasks, and Tools
Problem Categories
Catalog of Methods
Available Tools
How to Read This Book
References
Practical Data Analysis: An Example
The Setup
Disclaimer
The Data
The Analysts
Data Understanding and Pattern Finding
The Naive Approach
The Sound Approach
Explanation Finding
The Naive Approach
The Sound Approach
Predicting the Future
The Naive Approach
The Sound Approach
Concluding Remarks
Project Understanding
Determine the Project Objective
Assess the Situation
Determine Analysis Goals
Further Reading
References
Data Understanding
Attribute Understanding
Data Quality
Data Visualization
Methods for One and Two Attributes
Methods for Higher-Dimensional Data
Principal Component Analysis
Projection Pursuit
Multidimensional Scaling
Variations of PCA and MDS
Parallel Coordinates
Radar and Star Plots
Correlation Analysis
Outlier Detection
Outlier Detection for Single Attributes
Outlier Detection for Multidimensional Data
Missing Values
A Checklist for Data Understanding
Data Understanding in Practice
Data Understanding in KNIME
Data Loading
Data Types
Visualization
Data Understanding in R
Histograms
Boxplots
Scatter Plots
Principal Component Analysis
Multidimensional Scaling
Parallel Coordinates, Radar, and Star Plots
Correlation Coefficients
Grubbs' Test for Outlier Detection
References
Principles of Modeling
Model Classes
Fitting Criteria and Score Functions
Error Functions for Classification Problems
Measures of Interestingness
Algorithms for Model Fitting
Closed Form Solutions
Gradient Method
Combinatorial Optimization
Random Search, Greedy Strategies, and Other Heuristics
Types of Errors
Experimental Error
Bayes Error
ROC Curves and Confusion Matrices
Sample Error
Model Error
Algorithmic Error
Machine Learning Bias and Variance
Learning Without Bias?
Model Validation
Training and Test Data
Cross-Validation
Bootstrapping
Measures for Model Complexity
The Minimum Description Length Principle
Akaike's and the Bayesian Information Criterion
Model Errors and Validation in Practice
Errors and Validation in KNIME
Validation in R
Further Reading
References
Data Preparation
Select Data
Feature Selection
Selecting the k Top-Ranked Features
Selecting the Top-Ranked Subset
Dimensionality Reduction
Record Selection
Clean Data
Improve Data Quality
Missing Values
Ignorance/Deletion
Imputation
Explicit Value or Variable
Construct Data
Provide Operability
Scale Conversion
Dynamic Domains
Problem Reformulation
Assure Impartiality
Maximize Efficiency
Complex Data Types
Text Data Analysis
Graph Data Analysis
Image Data Analysis
Other Data Types
Data Integration
Vertical Data Integration
Horizontal Data Integration