Essential Math for Data Science: Take Control of Your Data with Fundamental Linear Algebra, Probability, and Statistics

Master the math needed to excel in data science, machine learning, and statistics. In this book author Thomas Nield guides you through areas like calculus, probability, linear algebra, and statistics and how they apply to techniques like linear regression, logistic regression, and neural networks. Along the way you'll also gain practical insights into the state of data science and how to use those insights to maximize your career.

Learn how to:
• Use Python code and libraries like SymPy, NumPy, and scikit-learn to explore essential mathematical concepts like calculus, linear algebra, statistics, and machine learning
• Understand techniques like linear regression, logistic regression, and neural networks in plain English, with minimal mathematical notation and jargon
• Perform descriptive statistics and hypothesis testing on a dataset to interpret p-values and statistical significance
• Manipulate vectors and matrices and perform matrix decomposition
• Integrate and build upon incremental knowledge of calculus, probability, statistics, and linear algebra, and apply it to regression models including neural networks
• Navigate practically through a data science career and avoid common pitfalls, assumptions, and biases while tuning your skill set to stand out in the job market

Author(s): Thomas Nield
Edition: 1
Publisher: O'Reilly Media
Year: 2022

Language: English
Commentary: Vector PDF
Pages: 348
City: Sebastopol, CA
Tags: Data Science; Linear Algebra; Probability; Statistics; Calculus; Linear Regression; Logistic Regression; Neural Networks; Python; SymPy; NumPy; scikit

Cover
Copyright
Table of Contents
Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Basic Math and Calculus Review
Number Theory
Order of Operations
Variables
Functions
Summations
Exponents
Logarithms
Euler’s Number and Natural Logarithms
Euler’s Number
Natural Logarithms
Limits
Derivatives
Partial Derivatives
The Chain Rule
Integrals
Conclusion
Exercises
Chapter 2. Probability
Understanding Probability
Probability Versus Statistics
Probability Math
Joint Probabilities
Union Probabilities
Conditional Probability and Bayes’ Theorem
Joint and Union Conditional Probabilities
Binomial Distribution
Beta Distribution
Conclusion
Exercises
Chapter 3. Descriptive and Inferential Statistics
What Is Data?
Descriptive Versus Inferential Statistics
Populations, Samples, and Bias
Descriptive Statistics
Mean and Weighted Mean
Median
Mode
Variance and Standard Deviation
The Normal Distribution
The Inverse CDF
Z-Scores
Inferential Statistics
The Central Limit Theorem
Confidence Intervals
Understanding P-Values
Hypothesis Testing
The T-Distribution: Dealing with Small Samples
Big Data Considerations and the Texas Sharpshooter Fallacy
Conclusion
Exercises
Chapter 4. Linear Algebra
What Is a Vector?
Adding and Combining Vectors
Scaling Vectors
Span and Linear Dependence
Linear Transformations
Basis Vectors
Matrix Vector Multiplication
Matrix Multiplication
Determinants
Special Types of Matrices
Square Matrix
Identity Matrix
Inverse Matrix
Diagonal Matrix
Triangular Matrix
Sparse Matrix
Systems of Equations and Inverse Matrices
Eigenvectors and Eigenvalues
Conclusion
Exercises
Chapter 5. Linear Regression
A Basic Linear Regression
Residuals and Squared Errors
Finding the Best Fit Line
Closed Form Equation
Inverse Matrix Techniques
Gradient Descent
Overfitting and Variance
Stochastic Gradient Descent
The Correlation Coefficient
Statistical Significance
Coefficient of Determination
Standard Error of the Estimate
Prediction Intervals
Train/Test Splits
Multiple Linear Regression
Conclusion
Exercises
Chapter 6. Logistic Regression and Classification
Understanding Logistic Regression
Performing a Logistic Regression
Logistic Function
Fitting the Logistic Curve
Multivariable Logistic Regression
Understanding the Log-Odds
R-Squared
P-Values
Train/Test Splits
Confusion Matrices
Bayes’ Theorem and Classification
Receiver Operator Characteristics/Area Under Curve
Class Imbalance
Conclusion
Exercises
Chapter 7. Neural Networks
When to Use Neural Networks and Deep Learning
A Simple Neural Network
Activation Functions
Forward Propagation
Backpropagation
Calculating the Weight and Bias Derivatives
Stochastic Gradient Descent
Using scikit-learn
Limitations of Neural Networks and Deep Learning
Conclusion
Exercise
Chapter 8. Career Advice and the Path Forward
Redefining Data Science
A Brief History of Data Science
Finding Your Edge
SQL Proficiency
Programming Proficiency
Data Visualization
Knowing Your Industry
Productive Learning
Practitioner Versus Advisor
What to Watch Out For in Data Science Jobs
Role Definition
Organizational Focus and Buy-In
Adequate Resources
Reasonable Objectives
Competing with Existing Systems
A Role Is Not What You Expected
Does Your Dream Job Not Exist?
Where Do I Go Now?
Conclusion
Appendix A. Supplemental Topics
Using LaTeX Rendering with SymPy
Binomial Distribution from Scratch
Beta Distribution from Scratch
Deriving Bayes’ Theorem
CDF and Inverse CDF from Scratch
Use e to Predict Event Probability Over Time
Hill Climbing and Linear Regression
Hill Climbing and Logistic Regression
A Brief Intro to Linear Programming
MNIST Classifier Using scikit-learn
Appendix B. Exercise Answers
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Index
About the Author
Colophon