The chapters are organized for a typical one-semester course. Every chapter opens with a caselet drawn from a real-world story, and a running case study spans the chapters as a series of exercises. The book is designed to give students the intuition behind this evolving area, along with a solid toolset of the major data mining techniques and platforms. It also includes a tutorial for R. The 2019 edition contained expanded primers on Big Data, Artificial Intelligence, and Data Science careers, and a full tutorial on Python. The 2020 edition contains a new chapter on Data Ownership and Privacy, as these issues have become increasingly important.
Author(s): Anil Maheshwari
Year: 2020
Language: English
Commentary: Introduction to analytics, adapted as a textbook for graduate courses in Business Intelligence and Data Mining
Pages: 314
Preface to 2020 edition
Chapter 1: Wholeness of Data Analytics
Introduction
Business Intelligence
Caselet: MoneyBall - Data Mining in Sports
Pattern Recognition
Types of Patterns
Finding a Pattern
Uses of Patterns
Data Processing Chain
Data
Database
Data Warehouse
Data Mining
Data Visualization
Terminology and Careers
Organization of the book
Review Questions
Section 1
Chapter 2: Business Intelligence Concepts and Applications
Introduction
Caselet: Khan Academy – BI in Education
BI for better decisions
Decision types
BI Tools
BI Skills
BI Applications
Customer Relationship Management
Healthcare and Wellness
Education
Retail
Banking
Financial Services
Insurance
Manufacturing
Telecom
Public Sector
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 1
Chapter 3: Data Warehousing
Introduction
Caselet: University Health System – BI in Healthcare
Design Considerations for DW
DW Development Approaches
DW Architecture
Data Sources
Data Loading Processes
Data Warehouse Design
DW Access
DW Best Practices
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 2
Chapter 4: Data Mining
Introduction
Caselet: Target Corp – Data Mining in Retail
Gathering and selecting data
Data cleansing and preparation
Outputs of Data Mining
Evaluating Data Mining Results
Data Mining Techniques
Tools and Platforms for Data Mining
Data Mining Best Practices
Myths about data mining
Data Mining Mistakes
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 3
Chapter 5: Data Visualization
Introduction
Caselet: Dr Hans Rosling - Visualizing Global Public Health
Excellence in Visualization
Types of Charts
Visualization Example
Visualization Example Phase 2
Tips for Data Visualization
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 4
Section 2 – Popular Data Mining Techniques
Chapter 6: Decision Trees
Introduction
Caselet: Predicting Heart Attacks using Decision Trees
Decision Tree problem
Decision Tree Construction
Lessons from constructing trees
Decision Tree Algorithms
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 5
Chapter 7: Regression
Introduction
Caselet: Data driven Prediction Markets
Correlations and Relationships
Visual look at relationships
Regression Exercise
Non-linear regression exercise
Logistic Regression
Advantages and Disadvantages of Regression Models
Conclusion
Review Exercises
Liberty Stores Case Exercise: Step 6
Chapter 8: Artificial Neural Networks
Introduction
Caselet: IBM Watson - Analytics in Medicine
Business Applications of ANN
Design Principles of an Artificial Neural Network
Representation of a Neural Network
Architecting a Neural Network
Developing an ANN
Advantages and Disadvantages of using ANNs
Conclusion
Review Exercises
Chapter 9: Cluster Analysis
Introduction
Caselet: Cluster Analysis
Applications of Cluster Analysis
Definition of a Cluster
Representing clusters
Clustering techniques
Clustering Exercise
K-Means Algorithm for clustering
Selecting the number of clusters
Advantages and Disadvantages of K-Means algorithm
Conclusion
Review Exercises
Liberty Stores Case Exercise: Step 7
Chapter 10: Association Rule Mining
Introduction
Caselet: Netflix: Data Mining in Entertainment
Business Applications of Association Rules
Representing Association Rules
Algorithms for Association Rule
Apriori Algorithm
Association rules exercise
Creating Association Rules
Conclusion
Review Exercises
Liberty Stores Case Exercise: Step 8
Section 3 – Advanced Mining
Chapter 11: Text Mining
Introduction
Caselet: WhatsApp and Private Security
Text Mining Applications
Text Mining Process
Term Document Matrix
Mining the TDM
Comparing Text Mining and Data Mining
Text Mining Best Practices
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 9
Chapter 12: Naïve Bayes Analysis
Introduction
CASELET: Fraud detection in government contracts
Probability
Naïve-Bayes model
Simple classification example
Text Classification Example
Advantages and Disadvantages of Naïve Bayes
Summary
Review Questions
Chapter 13: Support Vector Machines
Introduction
SVM model
The Kernel Method
Advantages and disadvantages
Summary
Review Questions
Chapter 14: Web Mining
Introduction
Web content mining
Web structure mining
Web usage mining
Web Mining Algorithms
Conclusion
Review Questions
Chapter 15: Social Network Analysis
Introduction
Caselet: The Social Life of Books
Applications of SNA
Network topologies
Techniques and algorithms
Finding Sub-networks
Computing importance of nodes
PageRank
Practical considerations
Comparing SNA with Data Analytics
Conclusion
Review Questions
Section 4 - Primers
Chapter 16: Big Data Primer
Introduction
Understanding Big Data
CASELET: IBM Watson: A Big Data system
Capturing Big Data
Volume of Data
Velocity of Data
Variety of Data
Veracity of Data
Benefitting from Big Data
Management of Big Data
Organizing Big Data
Analyzing Big Data
Technology Challenges for Big Data
Storing Huge Volumes
Ingesting streams at an extremely fast pace
Handling a variety of forms and functions of data
Processing data at huge speeds
Conclusion and Summary
Review Questions
Liberty Stores Case Exercise: Step P1
Chapter 17: Data Modeling Primer
Introduction
Evolution of data management systems
Relational Data Model
Implementing the Relational Data Model
Database management systems (DBMS)
Structured Query Language
Conclusion
Review Questions
Chapter 18: Statistics Primer
Introduction
Descriptive Statistics
Example data set
Computing Mean, Median, Mode
Computing the range and variance
Histograms
Normal Distribution and Bell Curve
Inferential Statistics
Random sampling
Confidence Interval
Predictive Statistics
Summary
Review Questions
Chapter 19: Artificial Intelligence Primer
CASELET: Apple Siri Voice-activated personal assistant
AI, Machine Learning, and Deep Learning
The Industrial Revolution
The Information Revolution
The Cognitive (or AI) revolution
Job Losses and Gains
AI and Existential Threat
Conclusion
Review Questions
Chapter 20: Data Ownership and Privacy
Data Ownership
Data Privacy
Data Privacy Models
Chinese Model
US Model
European Model
Summary
Chapter 21: Data Science Careers
Data Scientist
Data Engineer
Data Science aptitude
Popular Skills
Appendix: R Tutorial for Data Mining
Getting Started with R
Installing R
Working on R
Import a Dataset in R
Data visualization
Plotting a Histogram
Plotting a Bar Chart
Plotting charts side by side
Data Mining Techniques
Decision Tree
Correlation
Regression
Clustering – Kmeans (Unsupervised Learning)
Big Data Mining
WordCloud
Twitter Mining
Steps on Twitter side
R Script
Page Rank
Additional Documentation
Appendix: Python Tutorial for Data Mining
1 About this Tutorial
2 Getting Started
3 Installation
4 Working on Python
4.1 Windows 7
4.2 Windows 10
4.3 Python Help and Tutorial
4.4 Import a Dataset in Python
4.5 Data visualization
4.5.1 Plotting a Histogram
4.5.2 Plotting a Bar Chart
4.5.3 Plotting charts side by side
5 Data Mining Techniques
5.1 Decision Tree (Supervised Learning)
5.2 Regression (Supervised Learning)
5.3 Correlation (Supervised Learning)
5.4 Clustering – Kmeans (Unsupervised Learning)
6 Big Data Mining
6.1 WordCloud - directory FWordCloud and look at code module WordCloud.py.
6.2 Twitter Mining
6.2.1 Steps (Twitter side)
6.2.2 Python code
6.3 Page Rank
7 Additional Documentation
Additional Resources
About the Author