Practical Data Science Cookbook: Data Pre-Processing, Analysis and Visualization Using R and Python

Over 85 recipes to help you complete real-world data science projects in R and Python

About This Book
- Tackle every step in the data science pipeline and use it to acquire, clean, analyze, and visualize your data
- Get beyond the theory and implement real-world data science projects using R and Python
- Easy-to-follow recipes will help you understand and implement numerical computing concepts

Who This Book Is For
If you are an aspiring data scientist who wants to learn data science and numerical programming concepts through hands-on, real-world project examples, this is the book for you. Whether you are brand new to data science or a seasoned expert, you will benefit from learning about the structure of real-world data science projects and from the programming examples in R and Python.

What You Will Learn
- Understand the installation procedure and the environment required for R and Python on various platforms
- Prepare data for analysis by applying data science concepts such as acquisition, cleaning, and munging in R and Python
- Build a predictive model and an exploratory model
- Analyze the results of your models and create reports on the acquired data
- Build various tree-based methods, including random forests

In Detail
As increasing amounts of data are generated each year, the need to analyze that data and create value from it is more important than ever. Companies that know what to do with their data, and how to do it well, will have a competitive advantage over companies that don't. As a result, there will be increasing demand for people who possess both the analytical and the technical abilities to extract valuable insights from data and to build solutions that put those insights to use.

Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By working sequentially through the steps in each chapter, you will quickly become familiar with the process and learn how to apply it to a variety of situations, with examples in the two most popular programming languages for data analysis: R and Python.

Style and approach
This step-by-step guide to data science is full of hands-on examples of real-world data science tasks. Each recipe focuses on a particular task in the data science pipeline, ranging from preparing the dataset to analytics and visualization.

Author(s): Prabhanjan Narayanachar Tattar; Tony Ojeda; Sean Patrick Murphy; Benjamin Bengfort; Abhijit Dasgupta
Edition: 2
Publisher: Packt Publishing
Year: 2017

Language: English

Cover
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Preparing Your Data Science Environment
Understanding the data science pipeline
How to do it...
How it works...
Installing R on Windows, Mac OS X, and Linux
How to do it...
How it works...
See also
Installing libraries in R and RStudio
Getting ready
How to do it...
How it works...
There's more...
See also
Installing Python on Linux and Mac OS X
Getting ready
How to do it...
How it works...
See also
Installing Python on Windows
How to do it...
How it works...
See also
Installing the Python data stack on Mac OS X and Linux
Getting ready
How to do it...
How it works...
There's more...
See also
Installing extra Python packages
Getting ready
How to do it...
How it works...
There's more...
See also
Installing and using virtualenv
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 2: Driving Visual Analysis with Automobile Data with R
Introduction
Acquiring automobile fuel efficiency data
Getting ready
How to do it...
How it works...
Preparing R for your first project
Getting ready
How to do it...
There's more...
See also
Importing automobile fuel efficiency data into R
Getting ready
How to do it...
How it works...
There's more...
See also
Exploring and describing fuel efficiency data
Getting ready
How to do it...
How it works...
There's more...
Analyzing automobile fuel efficiency over time
Getting ready
How to do it...
How it works...
There's more...
See also
Investigating the makes and models of automobiles
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 3: Creating Application-Oriented Analyses Using Tax Data and Python
Introduction
An introduction to application-oriented approaches
Preparing for the analysis of top incomes
Getting ready
How to do it...
How it works...
Importing and exploring the world's top incomes dataset
Getting ready
How to do it...
How it works...
There's more...
See also
Analyzing and visualizing the top income data of the US
Getting ready
How to do it...
How it works...
Furthering the analysis of the top income groups of the US
Getting ready
How to do it...
How it works...
Reporting with Jinja2
Getting ready
How to do it...
How it works...
There's more...
See also
Repeating the analysis in R
Getting ready
How to do it...
There's more...
Chapter 4: Modeling Stock Market Data
Introduction
Requirements
Acquiring stock market data
How to do it...
Summarizing the data
Getting ready
How to do it...
How it works...
There's more...
Cleaning and exploring the data
Getting ready
How to do it...
How it works...
See also
Generating relative valuations
Getting ready
How to do it...
How it works...
Screening stocks and analyzing historical prices
Getting ready
How to do it...
How it works...
Chapter 5: Visually Exploring Employment Data
Introduction
Preparing for analysis
Getting ready
How to do it...
How it works...
See also
Importing employment data into R
Getting ready
How to do it...
How it works...
There's more...
See also
Exploring the employment data
Getting ready
How to do it...
How it works...
See also
Obtaining and merging additional data
Getting ready
How to do it...
How it works...
Adding geographical information
Getting ready
How to do it...
How it works...
See also
Extracting state- and county-level wage and employment information
Getting ready
How to do it...
How it works...
See also
Visualizing geographical distributions of pay
Getting ready
How to do it...
How it works...
See also
Exploring where the jobs are, by industry
How to do it...
How it works...
There's more...
See also
Animating maps for a geospatial time series
Getting ready
How to do it...
How it works...
There's more...
Benchmarking performance for some common tasks
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 6: Driving Visual Analyses with Automobile Data
Introduction
Getting started with IPython
Getting ready
How to do it...
How it works...
See also
Exploring Jupyter Notebook
Getting ready
How to do it...
How it works...
There's more...
See also
Preparing to analyze automobile fuel efficiencies
Getting ready
How to do it...
How it works...
There's more...
See also
Exploring and describing fuel efficiency data with Python
Getting ready
How to do it...
How it works...
There's more...
See also
Analyzing automobile fuel efficiency over time with Python
Getting ready
How to do it...
How it works...
There's more...
See also
Investigating the makes and models of automobiles with Python
Getting ready
How to do it...
How it works...
See also
Chapter 7: Working with Social Graphs
Introduction
Understanding graphs and networks
Preparing to work with social networks in Python
Getting ready
How to do it...
How it works...
There's more...
Importing networks
Getting ready
How to do it...
How it works...
Exploring subgraphs within a heroic network
Getting ready
How to do it...
How it works...
There's more...
Finding strong ties
Getting ready
How to do it...
How it works...
There's more...
Finding key players
Getting ready
How to do it...
How it works...
There's more...
The betweenness centrality
The closeness centrality
The eigenvector centrality
Deciding on centrality algorithm
Exploring the characteristics of entire networks
Getting ready
How to do it...
How it works...
Clustering and community detection in social networks
Getting ready
How to do it...
How it works...
There's more...
Visualizing graphs
Getting ready
How to do it...
How it works...
Social networks in R
Getting ready
How to do it...
How it works...
Chapter 8: Recommending Movies at Scale (Python)
Introduction
Modeling preference expressions
How to do it...
How it works...
Understanding the data
Getting ready
How to do it...
How it works...
There's more...
Ingesting the movie review data
Getting ready
How to do it...
How it works...
Finding the highest-scoring movies
Getting ready
How to do it...
How it works...
There's more...
See also
Improving the movie-rating system
Getting ready
How to do it...
How it works...
There's more...
See also
Measuring the distance between users in the preference space
Getting ready
How to do it...
How it works...
There's more...
See also
Computing the correlation between users
Getting ready
How to do it...
How it works...
There's more...
Finding the best critic for a user
Getting ready
How to do it...
How it works...
Predicting movie ratings for users
Getting ready
How to do it...
How it works...
Collaboratively filtering item by item
Getting ready
How to do it...
How it works...
Building a non-negative matrix factorization model
How to do it...
How it works...
See also
Loading the entire dataset into the memory
Getting ready
How to do it...
How it works...
There's more...
Dumping the SVD-based model to the disk
How to do it...
How it works...
Training the SVD-based model
How to do it...
How it works...
There's more...
Testing the SVD-based model
How to do it...
How it works...
There's more...
Chapter 9: Harvesting and Geolocating Twitter Data (Python)
Introduction
Creating a Twitter application
Getting ready
How to do it...
How it works...
See also
Understanding the Twitter API v1.1
Getting ready
How to do it...
How it works...
There's more...
See also
Determining your Twitter followers and friends
Getting ready
How to do it...
How it works...
There's more...
See also
Pulling Twitter user profiles
Getting ready
How to do it...
How it works...
There's more...
See also
Making requests without running afoul of Twitter's rate limits
Getting ready
How to do it...
How it works...
Storing JSON data to disk
Getting ready
How to do it...
How it works...
Setting up MongoDB for storing Twitter data
Getting ready
How to do it...
How it works...
There's more...
See also
Storing user profiles in MongoDB using PyMongo
Getting ready
How to do it...
How it works...
Exploring the geographic information available in profiles
Getting ready
How to do it...
How it works...
There's more...
See also
Plotting geospatial data in Python
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 10: Forecasting New Zealand Overseas Visitors
Introduction
The ts object
Getting ready
How to do it...
How it works...
Visualizing time series data
Getting ready
How to do it...
How it works...
Simple linear regression models
Getting ready
How to do it...
How it works...
See also
ACF and PACF
Getting ready
How to do it...
How it works...
ARIMA models
Getting ready
How to do it...
How it works...
Accuracy measurements
Getting ready
How to do it...
How it works...
Fitting seasonal ARIMA models
Getting ready
How to do it...
How it works...
There's more...
Chapter 11: German Credit Data Analysis
Introduction
Simple data transformations
Getting ready
How to do it...
How it works...
There's more...
Visualizing categorical data
Getting ready
How to do it...
How it works...
Discriminant analysis
Getting ready
How to do it...
How it works...
See also
Dividing the data and the ROC
Getting ready
How to do it...
Fitting the logistic regression model
Getting ready
How to do it...
How it works...
See also
Decision trees and rules
Getting ready
How to do it...
How it works...
See also
Decision tree for German data
Getting ready
How to do it...
How it works...
Index