Author(s): Magnus Vilhelm Persson; Luiz Felipe Martins
Publisher: Packt Publishing
Year: 2016
Language: English
Pages: 284
Mastering Python Data Analysis
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Tools of the Trade
Before you start
Using the notebook interface
Imports
An example using the Pandas library
Summary
2. Exploring Data
The General Social Survey
Obtaining the data
Reading the data
Univariate data
Histograms
Making things pretty
Characterization
Concept of statistical inference
Numeric summaries and boxplots
Relationships between variables – scatterplots
Summary
3. Learning About Models
Models and experiments
The cumulative distribution function
Working with distributions
The probability density function
Where do models come from?
Multivariate distributions
Summary
4. Regression
Introducing linear regression
Getting the dataset
Testing with linear regression
Multivariate regression
Adding economic indicators
Taking a step back
Logistic regression
Some notes
Summary
5. Clustering
Introduction to cluster finding
Starting out simple – John Snow on cholera
K-means clustering
Suicide rate versus GDP versus absolute latitude
Hierarchical clustering analysis
Reading in and reducing the data
Hierarchical cluster algorithm
Summary
6. Bayesian Methods
The Bayesian method
Credible versus confidence intervals
Bayes formula
Python packages
U.S. air travel safety record
Getting the NTSB database
Binning the data
Bayesian analysis of the data
Binning by month
Plotting coordinates
Cartopy
Mpl toolkits – basemap
Climate change – CO2 in the atmosphere
Getting the data
Creating and sampling the model
Summary
7. Supervised and Unsupervised Learning
Introduction to machine learning
Scikit-learn
Linear regression
Climate data
Checking with Bayesian analysis and OLS
Clustering
Seeds classification
Visualizing the data
Feature selection
Classifying the data
The SVC linear kernel
The SVC Radial Basis Function
The SVC polynomial
K-Nearest Neighbour
Random Forest
Choosing your classifier
Summary
8. Time Series Analysis
Introduction
Pandas and time series data
Indexing and slicing
Resampling, smoothing, and other estimates
Stationarity
Patterns and components
Decomposing components
Differencing
Time series models
Autoregressive – AR
Moving average – MA
Selecting p and q
Automatic function
The (Partial) AutoCorrelation Function
Autoregressive Integrated Moving Average – ARIMA
Summary
A. More on Jupyter Notebook and matplotlib Styles
Jupyter Notebook
Useful keyboard shortcuts
Command mode shortcuts
Edit mode shortcuts
Markdown cells
Notebook Python extensions
Installing the extensions
Codefolding
Collapsible headings
Help panel
Initialization cells
NbExtensions menu item
Ruler
Skip-traceback
Table of contents
Other Jupyter Notebook tips
External connections
Export
Additional file types
Matplotlib styles
Useful resources
General resources
Packages
Data repositories
Visualization of data
Summary