Python for Probability, Statistics, and Machine Learning

This book, fully updated for Python version 3.6+, covers the key ideas that link probability, statistics, and machine learning, illustrated using Python modules in these areas. All the figures and numerical results are reproducible using the Python code provided. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python code, thereby connecting theoretical concepts to concrete implementations. Detailed proofs for certain important results are also provided. Modern Python modules like Pandas, Sympy, Scikit-learn, Tensorflow, and Keras are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples.

This updated edition now includes the Fisher Exact Test and the Mann-Whitney-Wilcoxon Test. A new section on survival analysis has been added, along with a substantially expanded treatment of Generalized Linear Models. The new deep learning section for image processing includes an in-depth discussion of the gradient descent methods that underpin all deep learning algorithms. As with the prior edition, there are new and updated *Programming Tips* that illustrate effective Python modules and methods for scientific programming and machine learning.

There are 445 runnable code blocks with corresponding outputs that have been tested for accuracy. Over 158 graphical visualizations (almost all generated using Python) illustrate the concepts developed both in code and in mathematics. Key Python modules such as Numpy, Scikit-learn, Sympy, Scipy, Lifelines, CvxPy, Theano, Matplotlib, Pandas, Tensorflow, Statsmodels, and Keras are discussed and used throughout. The book is suitable for anyone with undergraduate-level exposure to probability, statistics, or machine learning and rudimentary knowledge of Python programming.
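To give a flavor of the kind of reproducible example the description refers to, here is a minimal sketch (not taken from the book) that scores a regularized linear model with k-fold cross-validation in Scikit-learn, touching two of the concepts named above (regularization and cross-validation). The synthetic dataset, estimator, and parameter choices are illustrative assumptions.

```python
# Minimal sketch (not from the book): k-fold cross-validation of a
# ridge-regression model in scikit-learn. The data and parameters below
# are illustrative assumptions, not examples taken from the text.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data with additive noise
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Regularized linear model (cf. Sections 4.6 Regularization, 4.6.1 Ridge Regression)
model = Ridge(alpha=1.0)

# 5-fold cross-validation (cf. Section 4.3.4 Cross-Validation)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("R^2 per fold:", np.round(scores, 3))
print("Mean R^2:", np.round(scores.mean(), 3))
```

Any of the other modules the description lists (e.g., Statsmodels or Lifelines) could be slotted into the same fit-and-score pattern.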

Author(s): José Unpingco
Publisher: Springer
Year: 2016

Language: English
Pages: 276

Preface
Acknowledgments
Contents
Notation
About the Author
1 Getting Started with Scientific Python
1.1 Installation and Setup
1.2 Numpy
1.2.1 Numpy Arrays and Memory
1.2.2 Numpy Matrices
1.2.3 Numpy Broadcasting
1.2.4 Numpy Masked Arrays
1.2.5 Numpy Optimizations and Prospectus
1.3 Matplotlib
1.3.1 Alternatives to Matplotlib
1.3.2 Extensions to Matplotlib
1.4 IPython
1.4.1 IPython Notebook
1.5 Scipy
1.6 Pandas
1.6.1 Series
1.6.2 Dataframe
1.7 Sympy
1.8 Interfacing with Compiled Libraries
1.9 Integrated Development Environments
1.10 Quick Guide to Performance and Parallel Programming
1.11 Other Resources
References
2 Probability
2.1 Introduction
2.1.1 Understanding Probability Density
2.1.2 Random Variables
2.1.3 Continuous Random Variables
2.1.4 Transformation of Variables Beyond Calculus
2.1.5 Independent Random Variables
2.1.6 Classic Broken Rod Example
2.2 Projection Methods
2.2.1 Weighted Distance
2.3 Conditional Expectation as Projection
2.3.1 Appendix
2.4 Conditional Expectation and Mean Squared Error
2.5 Worked Examples of Conditional Expectation and Mean Square Error Optimization
2.5.1 Example
2.5.2 Example
2.5.3 Example
2.5.4 Example
2.5.5 Example
2.5.6 Example
2.6 Information Entropy
2.6.1 Information Theory Concepts
2.6.2 Properties of Information Entropy
2.6.3 Kullback-Leibler Divergence
2.7 Moment Generating Functions
2.8 Monte Carlo Sampling Methods
2.8.1 Inverse CDF Method for Discrete Variables
2.8.2 Inverse CDF Method for Continuous Variables
2.8.3 Rejection Method
2.9 Useful Inequalities
2.9.1 Markov's Inequality
2.9.2 Chebyshev's Inequality
2.9.3 Hoeffding's Inequality
References
3 Statistics
3.1 Introduction
3.2 Python Modules for Statistics
3.2.1 Scipy Statistics Module
3.2.2 Sympy Statistics Module
3.2.3 Other Python Modules for Statistics
3.3 Types of Convergence
3.3.1 Almost Sure Convergence
3.3.2 Convergence in Probability
3.3.3 Convergence in Distribution
3.3.4 Limit Theorems
3.4 Estimation Using Maximum Likelihood
3.4.1 Setting Up the Coin Flipping Experiment
3.4.2 Delta Method
3.5 Hypothesis Testing and P-Values
3.5.1 Back to the Coin Flipping Example
3.5.2 Receiver Operating Characteristic
3.5.3 P-Values
3.5.4 Test Statistics
3.5.5 Testing Multiple Hypotheses
3.6 Confidence Intervals
3.7 Linear Regression
3.7.1 Extensions to Multiple Covariates
3.8 Maximum A-Posteriori
3.9 Robust Statistics
3.10 Bootstrapping
3.10.1 Parametric Bootstrap
3.11 Gauss Markov
3.12 Nonparametric Methods
3.12.1 Kernel Density Estimation
3.12.2 Kernel Smoothing
3.12.3 Nonparametric Regression Estimators
3.12.4 Nearest Neighbors Regression
3.12.5 Kernel Regression
3.12.6 Curse of Dimensionality
References
4 Machine Learning
4.1 Introduction
4.2 Python Machine Learning Modules
4.3 Theory of Learning
4.3.1 Introduction to Theory of Machine Learning
4.3.2 Theory of Generalization
4.3.3 Worked Example for Generalization/Approximation Complexity
4.3.4 Cross-Validation
4.3.5 Bias and Variance
4.3.6 Learning Noise
4.4 Decision Trees
4.4.1 Random Forests
4.5 Logistic Regression
4.5.1 Generalized Linear Models
4.6 Regularization
4.6.1 Ridge Regression
4.6.2 Lasso
4.7 Support Vector Machines
4.7.1 Kernel Tricks
4.8 Dimensionality Reduction
4.8.1 Independent Component Analysis
4.9 Clustering
4.10 Ensemble Methods
4.10.1 Bagging
4.10.2 Boosting
References
Index