Data Analytics Made Accessible

The chapters in the book are organized for a typical one-semester course. Every chapter opens with a caselet drawn from a real-world story, and a running case study spans the chapters as exercises. The book is designed to give students the intuition behind this evolving area, along with a solid toolset of the major data mining techniques and platforms. It also includes a tutorial for R. The 2019 edition added expanded primers on Big Data, Artificial Intelligence, and Data Science careers, as well as a full tutorial on Python. The 2020 edition adds a new chapter on Data Ownership and Privacy, as these issues have become increasingly important.

Author(s): Anil Maheshwari
Year: 2020

Language: English
Commentary: Introduction to analytics, adapted as a textbook for graduate courses in Business Intelligence and Data Mining
Pages: 314

Preface to 2020 edition
Chapter 1: Wholeness of Data Analytics
Introduction
Business Intelligence
Caselet: MoneyBall - Data Mining in Sports
Pattern Recognition
Types of Patterns
Finding a Pattern
Uses of Patterns
Data Processing Chain
Data
Database
Data Warehouse
Data Mining
Data Visualization
Terminology and Careers
Organization of the book
Review Questions
Section 1
Chapter 2: Business Intelligence Concepts and Applications
Introduction
Caselet: Khan Academy – BI in Education
BI for better decisions
Decision types
BI Tools
BI Skills
BI Applications
Customer Relationship Management
Healthcare and Wellness
Education
Retail
Banking
Financial Services
Insurance
Manufacturing
Telecom
Public Sector
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 1
Chapter 3: Data Warehousing
Introduction
Caselet: University Health System – BI in Healthcare
Design Considerations for DW
DW Development Approaches
DW Architecture
Data Sources
Data Loading Processes
Data Warehouse Design
DW Access
DW Best Practices
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 2
Chapter 4: Data Mining
Introduction
Caselet: Target Corp – Data Mining in Retail
Gathering and selecting data
Data cleansing and preparation
Outputs of Data Mining
Evaluating Data Mining Results
Data Mining Techniques
Tools and Platforms for Data Mining
Data Mining Best Practices
Myths about data mining
Data Mining Mistakes
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 3
Chapter 5: Data Visualization
Introduction
Caselet: Dr Hans Rosling - Visualizing Global Public Health
Excellence in Visualization
Types of Charts
Visualization Example
Visualization Example, Phase 2
Tips for Data Visualization
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 4
Section 2 – Popular Data Mining Techniques
Chapter 6: Decision Trees
Introduction
Caselet: Predicting Heart Attacks using Decision Trees
Decision Tree problem
Decision Tree Construction
Lessons from constructing trees
Decision Tree Algorithms
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 5
Chapter 7: Regression
Introduction
Caselet: Data driven Prediction Markets
Correlations and Relationships
Visual look at relationships
Regression Exercise
Non-linear regression exercise
Logistic Regression
Advantages and Disadvantages of Regression Models
Conclusion
Review Exercises
Liberty Stores Case Exercise: Step 6
Chapter 8: Artificial Neural Networks
Introduction
Caselet: IBM Watson - Analytics in Medicine
Business Applications of ANN
Design Principles of an Artificial Neural Network
Representation of a Neural Network
Architecting a Neural Network
Developing an ANN
Advantages and Disadvantages of using ANNs
Conclusion
Review Exercises
Chapter 9: Cluster Analysis
Introduction
Caselet: Cluster Analysis
Applications of Cluster Analysis
Definition of a Cluster
Representing clusters
Clustering techniques
Clustering Exercise
K-Means Algorithm for clustering
Selecting the number of clusters
Advantages and Disadvantages of K-Means algorithm
Conclusion
Review Exercises
Liberty Stores Case Exercise: Step 7
Chapter 10: Association Rule Mining
Introduction
Caselet: Netflix: Data Mining in Entertainment
Business Applications of Association Rules
Representing Association Rules
Algorithms for Association Rule
Apriori Algorithm
Association rules exercise
Creating Association Rules
Conclusion
Review Exercises
Liberty Stores Case Exercise: Step 8
Section 3 – Advanced Mining
Chapter 11: Text Mining
Introduction
Caselet: WhatsApp and Private Security
Text Mining Applications
Text Mining Process
Term Document Matrix
Mining the TDM
Comparing Text Mining and Data Mining
Text Mining Best Practices
Conclusion
Review Questions
Liberty Stores Case Exercise: Step 9
Chapter 12: Naïve Bayes Analysis
Introduction
Caselet: Fraud detection in government contracts
Probability
Naïve-Bayes model
Simple classification example
Text Classification Example
Advantages and Disadvantages of Naïve Bayes
Summary
Review Questions
Chapter 13: Support Vector Machines
Introduction
SVM model
The Kernel Method
Advantages and disadvantages
Summary
Review Questions
Chapter 14: Web Mining
Introduction
Web content mining
Web structure mining
Web usage mining
Web Mining Algorithms
Conclusion
Review Questions
Chapter 15: Social Network Analysis
Introduction
Caselet: The Social Life of Books
Applications of SNA
Network topologies
Techniques and algorithms
Finding Sub-networks
Computing importance of nodes
PageRank
Practical considerations
Comparing SNA with Data Analytics
Conclusion
Review Questions
Section 4 - Primers
Chapter 16: Big Data Primer
Introduction
Understanding Big Data
Caselet: IBM Watson: A Big Data system
Capturing Big Data
Volume of Data
Velocity of Data
Variety of Data
Veracity of Data
Benefitting from Big Data
Management of Big Data
Organizing Big Data
Analyzing Big Data
Technology Challenges for Big Data
Storing Huge Volumes
Ingesting streams at an extremely fast pace
Handling a variety of forms and functions of data
Processing data at huge speeds
Conclusion and Summary
Review Questions
Liberty Stores Case Exercise: Step P1
Chapter 17: Data Modeling Primer
Introduction
Evolution of data management systems
Relational Data Model
Implementing the Relational Data Model
Database management systems (DBMS)
Structured Query Language
Conclusion
Review Questions
Chapter 18: Statistics Primer
Introduction
Descriptive Statistics
Example data set
Computing Mean, Median, Mode
Computing the range and variance
Histograms
Normal Distribution and Bell Curve
Inferential Statistics
Random sampling
Confidence Interval
Predictive Statistics
Summary
Review Questions
Chapter 19: Artificial Intelligence Primer
Caselet: Apple Siri Voice-activated personal assistant
AI, Machine Learning, and Deep Learning
The Industrial Revolution
The Information Revolution
The Cognitive (or AI) revolution
Job Losses and Gains
AI and Existential Threat
Conclusion
Review Questions
Chapter 20: Data Ownership and Privacy
Data Ownership
Data Privacy
Data Privacy Models
Chinese Model
US Model
European Model
Summary
Chapter 21: Data Science Careers
Data Scientist
Data Engineer
Data Science aptitude
Popular Skills
Appendix: R Tutorial for Data Mining
Getting Started with R
Installing R
Working on R
Import a Dataset in R
Data visualization
Plotting a Histogram
Plotting a Bar Chart
Plotting charts side by side
Data Mining Techniques
Decision Tree
Correlation
Regression
Clustering – Kmeans (Unsupervised Learning)
Big Data Mining
WordCloud
Twitter Mining
Steps on Twitter side
R Script
Page Rank
Additional Documentation
Appendix: Python Tutorial for Data Mining
1 About this Tutorial
2 Getting Started
3 Installation
4 Working on Python
4.1 Windows 7
4.2 Windows 10
4.3 Python Help and Tutorial
4.4 Import a Dataset in Python
4.5 Data visualization
4.5.1 Plotting a Histogram
4.5.2 Plotting a Bar Chart
4.5.3 Plotting charts side by side
5 Data Mining Techniques
5.1 Decision Tree (Supervised Learning)
5.2 Regression (Supervised Learning)
5.3 Correlation (Supervised Learning)
5.4 Clustering – Kmeans (Unsupervised Learning)
6 Big Data Mining
6.1 WordCloud - directory FWordCloud and look at code module WordCloud.py.
6.2 Twitter Mining
6.2.1 Steps (Twitter side)
6.2.2 Python code
6.3 Page Rank
7 Additional Documentation
Additional Resources
About the Author