Python: End-to-end Data Analysis: Leverage the Power of Python to Clean, Scrape, Analyze, and Visualize Your Data: A Course in Three Modules

Author(s): Phuong Vo. T. H; Phuong Vothihong
Year: 2016

Language: English
Pages: 931

Cover
Copyright
Credits
Preface
Module 1 & 2: Table of contents
Module 3: Table of contents
Module 1: Getting Started with Python Data Analysis
Chapter 1: Introducing Data Analysis and Libraries
Data analysis and processing
An overview of the libraries in data analysis
Python libraries in data analysis
NumPy
Pandas
Matplotlib
PyMongo
The scikit-learn library
Summary
Chapter 2: NumPy Arrays and Vectorized Computation
NumPy arrays
Data types
Array creation
Indexing and slicing
Fancy indexing
Numerical operations on arrays
Array functions
Data processing using arrays
Loading and saving data
Saving an array
Loading an array
Linear algebra with NumPy
NumPy random numbers
Summary
Chapter 3: Data Analysis with Pandas
An overview of the Pandas package
The Pandas data structure
Series
The DataFrame
The essential basic functionality
Reindexing and altering labels
Head and tail
Binary operations
Functional statistics
Function application
Sorting
Indexing and selecting data
Computational tools
Working with missing data
Advanced uses of Pandas for data analysis
Hierarchical indexing
The Panel data
Summary
Chapter 4: Data Visualization
The matplotlib API primer
Line properties
Figures and subplots
Exploring plot types
Scatter plots
Bar plots
Contour plots
Histogram plots
Legends and annotations
Plotting functions with Pandas
Additional Python data visualization tools
Bokeh
MayaVi
Summary
Chapter 5: Time Series
Time series primer
Working with date and time objects
Resampling time series
Downsampling time series data
Upsampling time series data
Time zone handling
Timedeltas
Time series plotting
Summary
Chapter 6: Interacting with Databases
Interacting with data in text format
Reading data from text format
Writing data to text format
Interacting with data in binary format
HDF5
Interacting with data in MongoDB
Interacting with data in Redis
The simple value
List
Set
Ordered set
Summary
Chapter 7: Data Analysis Application Examples
Data munging
Cleaning data
Filtering
Merging data
Reshaping data
Data aggregation
Grouping data
Summary
Chapter 8: Machine Learning Models with scikit-learn
An overview of machine learning models
The scikit-learn modules for different models
Data representation in scikit-learn
Supervised learning – classification and regression
Unsupervised learning – clustering and dimensionality reduction
Measuring prediction performance
Summary
Module 2: Python Data Analysis Cookbook
Chapter 1: Laying the Foundation for Reproducible Data Analysis
Introduction
Setting up Anaconda
Getting ready
How to do it...
There's more...
See also
Installing the Data Science Toolbox
Getting ready
How to do it...
How it works...
See also
Creating a virtual environment with virtualenv and virtualenvwrapper
Getting ready
How to do it...
See also
Sandboxing Python applications with Docker images
Getting ready
How to do it...
How it works...
See also
Keeping track of package versions and history in IPython Notebook
Getting ready
How to do it...
How it works...
See also
Configuring IPython
Getting ready
How to do it...
See also
Learning to log for robust error checking
Getting ready
How to do it...
How it works...
See also
Unit testing your code
Getting ready
How to do it...
How it works...
See also
Configuring pandas
Getting ready
How to do it...
Configuring matplotlib
Getting ready
How to do it...
How it works...
See also
Seeding random number generators and NumPy print options
Getting ready
How to do it...
See also
Standardizing reports, code style, and data access
Getting ready
How to do it...
See also
Chapter 2: Creating Attractive Data Visualizations
Introduction
Graphing Anscombe's quartet
How to do it...
See also
Choosing seaborn color palettes
How to do it...
See also
Choosing matplotlib color maps
How to do it...
See also
Interacting with IPython Notebook widgets
How to do it...
See also
Viewing a matrix of scatterplots
How to do it...
Visualizing with d3.js via mpld3
Getting ready
How to do it...
Creating heatmaps
Getting ready
How to do it...
See also
Combining box plots and kernel density plots with violin plots
How to do it...
See also
Visualizing network graphs with hive plots
Getting ready
How to do it...
Displaying geographical maps
Getting ready
How to do it...
Using ggplot2-like plots
Getting ready
How to do it...
Highlighting data points with influence plots
How to do it...
See also
Chapter 3: Statistical Data Analysis and Probability
Introduction
Fitting data to the exponential distribution
How to do it...
How it works…
See also
Fitting aggregated data to the gamma distribution
How to do it...
See also
Fitting aggregated counts to the Poisson distribution
How to do it...
See also
Determining bias
How to do it...
See also
Estimating kernel density
How to do it...
See also
Determining confidence intervals for mean, variance, and standard deviation
How to do it...
See also
Sampling with probability weights
How to do it...
See also
Exploring extreme values
How to do it...
See also
Correlating variables with Pearson's correlation
How to do it...
See also
Correlating variables with the Spearman rank correlation
How to do it...
See also
Correlating a binary and a continuous variable with the point biserial correlation
How to do it...
See also
Evaluating relations between variables with ANOVA
How to do it...
See also
Chapter 4: Dealing with Data and Numerical Issues
Introduction
Clipping and filtering outliers
How to do it...
See also
Winsorizing data
How to do it...
See also
Measuring central tendency of noisy data
How to do it...
See also
Normalizing with the Box-Cox transformation
How to do it...
How it works
See also
Transforming data with the power ladder
How to do it...
Transforming data with logarithms
How to do it...
Rebinning data
How to do it...
Applying logit() to transform proportions
How to do it...
Fitting a robust linear model
How to do it...
See also
Taking variance into account with weighted least squares
How to do it...
See also
Using arbitrary precision for optimization
Getting ready
How to do it...
See also
Using arbitrary precision for linear algebra
Getting ready
How to do it...
See also
Chapter 5: Web Mining, Databases, and Big Data
Introduction
Simulating web browsing
Getting ready
How to do it…
See also
Scraping the Web
Getting ready
How to do it…
Dealing with non-ASCII text and HTML entities
Getting ready
How to do it…
See also
Implementing association tables
Getting ready
How to do it…
Setting up database migration scripts
Getting ready
How to do it…
See also
Adding a table column to an existing table
Getting ready
How to do it…
Adding indices after table creation
Getting ready
How to do it…
How it works…
See also
Setting up a test web server
Getting ready
How to do it…
Implementing a star schema with fact and dimension tables
How to do it…
See also
Using HDFS
Getting ready
How to do it…
See also
Setting up Spark
Getting ready
How to do it…
See also
Clustering data with Spark
Getting ready
How to do it…
How it works…
There's more…
See also
Chapter 6: Signal Processing and Timeseries
Introduction
Spectral analysis with periodograms
How to do it...
See also
Estimating power spectral density with the Welch method
How to do it...
See also
Analyzing peaks
How to do it...
See also
Measuring phase synchronization
How to do it...
See also
Exponential smoothing
How to do it...
See also
Evaluating smoothing
How to do it...
See also
Using the Lomb-Scargle periodogram
How to do it...
See also
Analyzing the frequency spectrum of audio
How to do it...
See also
Analyzing signals with the discrete cosine transform
How to do it...
See also
Block bootstrapping time series data
How to do it...
See also
Moving block bootstrapping time series data
How to do it...
See also
Applying the discrete wavelet transform
Getting started
How to do it...
See also
Chapter 7: Selecting Stocks with Financial Data Analysis
Introduction
Computing simple and log returns
How to do it...
See also
Ranking stocks with the Sharpe ratio and liquidity
How to do it...
See also
Ranking stocks with the Calmar and Sortino ratios
How to do it...
See also
Analyzing returns statistics
How to do it...
Correlating individual stocks with the broader market
How to do it...
Exploring risk and return
How to do it...
See also
Examining the market with the non-parametric runs test
How to do it...
See also
Testing for random walks
How to do it...
See also
Determining market efficiency with autoregressive models
How to do it...
See also
Creating tables for a stock prices database
How to do it...
Populating the stock prices database
How to do it...
Optimizing an equal weights two-asset portfolio
How to do it...
See also
Chapter 8: Text Mining and Social Network Analysis
Introduction
Creating a categorized corpus
Getting ready
How to do it...
See also
Tokenizing news articles in sentences and words
Getting ready
How to do it...
See also
Stemming, lemmatizing, filtering, and TF-IDF scores
Getting ready
How to do it...
How it works
See also
Recognizing named entities
Getting ready
How to do it...
How it works
See also
Extracting topics with non-negative matrix factorization
How to do it...
How it works
See also
Implementing a basic terms database
How to do it...
How it works
See also
Computing social network density
Getting ready
How to do it...
See also
Calculating social network closeness centrality
Getting ready
How to do it...
See also
Determining the betweenness centrality
Getting ready
How to do it...
See also
Estimating the average clustering coefficient
Getting ready
How to do it...
See also
Calculating the assortativity coefficient of a graph
Getting ready
How to do it...
See also
Getting the clique number of a graph
Getting ready
How to do it...
See also
Creating a document graph with cosine similarity
How to do it...
See also
Chapter 9: Ensemble Learning and Dimensionality Reduction
Introduction
Recursively eliminating features
How to do it...
How it works
See also
Applying principal component analysis for dimension reduction
How to do it...
See also
Applying linear discriminant analysis for dimension reduction
How to do it...
See also
Stacking and majority voting for multiple models
How to do it...
See also
Learning with random forests
How to do it...
There's more…
See also
Fitting noisy data with the RANSAC algorithm
How to do it...
See also
Bagging to improve results
How to do it...
See also
Boosting for better learning
How to do it...
See also
Nesting cross-validation
How to do it...
See also
Reusing models with joblib
How to do it...
See also
Hierarchically clustering data
How to do it...
See also
Taking a Theano tour
Getting ready
How to do it...
See also
Chapter 10: Evaluating Classifiers, Regressors, and Clusters
Introduction
Getting classification straight with the confusion matrix
How to do it...
How it works
See also
Computing precision, recall, and F1-score
How to do it...
See also
Examining a receiver operating characteristic and the area under a curve
How to do it...
See also
Visualizing the goodness of fit
How to do it...
See also
Computing MSE and median absolute error
How to do it...
See also
Evaluating clusters with the mean silhouette coefficient
How to do it...
See also
Comparing results with a dummy classifier
How to do it...
See also
Determining MAPE and MPE
How to do it...
See also
Comparing with a dummy regressor
How to do it...
See also
Calculating the mean absolute error and the residual sum of squares
How to do it...
See also
Examining the kappa of classification
How to do it...
How it works
See also
Taking a look at the Matthews correlation coefficient
How to do it...
See also
Chapter 11: Analyzing Images
Introduction
Setting up OpenCV
Getting ready
How to do it...
How it works
There's more
Applying Scale-Invariant Feature Transform (SIFT)
Getting ready
How to do it...
See also
Detecting features with SURF
Getting ready
How to do it...
See also
Quantizing colors
Getting ready
How to do it...
See also
Denoising images
Getting ready
How to do it...
See also
Extracting patches from an image
Getting ready
How to do it...
See also
Detecting faces with Haar cascades
Getting ready
How to do it...
See also
Searching for bright stars
Getting ready
How to do it...
See also
Extracting metadata from images
Getting ready
How to do it...
See also
Extracting texture features from images
Getting ready
How to do it...
See also
Applying hierarchical clustering on images
How to do it...
See also
Segmenting images with spectral clustering
How to do it...
See also
Chapter 12: Parallelism and Performance
Introduction
Just-in-time compiling with Numba
Getting ready
How to do it...
How it works
See also
Speeding up numerical expressions with Numexpr
How to do it...
How it works
See also
Running multiple threads with the threading module
How to do it...
See also
Launching multiple tasks with the concurrent.futures module
How to do it...
See also
Accessing resources asynchronously with the asyncio module
How to do it...
See also
Distributed processing with execnet
Getting ready
How to do it...
See also
Profiling memory usage
Getting ready
How to do it...
See also
Calculating the mean, variance, skewness, and kurtosis on the fly
Getting ready
How to do it...
See also
Caching with a least recently used cache
Getting ready
How to do it...
See also
Caching HTTP requests
Getting ready
How to do it...
See also
Streaming counting with the Count-min sketch
How to do it...
See also
Harnessing the power of the GPU with OpenCL
Getting ready
How to do it...
See also
Appendix A: Glossary
Appendix B: Function Reference
IPython
Matplotlib
NumPy
pandas
Scikit-learn
SciPy
Seaborn
Statsmodels
Appendix C: Online Resources
IPython notebooks and open data
Mathematics and statistics
Presentations
Appendix D: Tips and Tricks for Command-Line and Miscellaneous Tools
IPython notebooks
Command-line tools
The alias command
Command-line history
Reproducible sessions
Docker tips
Module 3: Mastering Python Data Analysis
Chapter 1: Tools of the Trade
Chapter 2: Exploring Data
Chapter 3: Learning About Models
Chapter 4: Regression
Chapter 5: Clustering
Chapter 6: Bayesian Methods
Chapter 7: Supervised and Unsupervised Learning
Chapter 8: Time Series Analysis
Appendix: More on Jupyter Notebook and matplotlib Styles
Bibliography
Index