Essential Statistics for Data Science: A Concise Crash Course

Essential Statistics for Data Science: A Concise Crash Course is for students entering a serious graduate program or advanced undergraduate teaching in data science without knowing enough statistics. The three-part text introduces readers to the basics of probability and random variables and guides them towards relatively advanced topics in both frequentist and Bayesian statistics in a matter of weeks.

Part I, Talking Probability, explains the statistical approach to analysing data with a probability model to describe the data-generating process. Part II, Doing Statistics, demonstrates how statistical inference applies to the unknown quantities of that model, i.e. its parameters. Part III, Facing Uncertainty, explains the importance of explicitly describing how much uncertainty is caused by parameters with intrinsic scientific meaning and how to take that into account when making decisions.

Essential Statistics for Data Science: A Concise Crash Course provides an in-depth introduction for beginners, while being more focused than a typical undergraduate text but still lighter and more accessible than an average graduate text.

At the frontier of statistics, data science, or machine learning, the probability models used to describe the data-generating process can be pretty complex. Most of those which we will encounter in this book will, of course, be much simpler. However, whether the models are complex or simple, this particular characterization of what statistics is about is very important, and it is also why, in order to study statistics at any reasonable depth, it is necessary to become reasonably proficient in the language of probability.

Author(s): Mu Zhu
Publisher: Oxford University Press
Year: 2023

Language: English
Pages: 177

cover
titlepage
copyright
dedication
Contents
Prologue
Part I. Talking Probability
1 Eminence of Models
Appendix 1.A For brave eyes only
2 Building Vocabulary
2.1 Probability
2.1.1 Basic rules
2.2 Conditional probability
2.2.1 Independence
2.2.2 Law of total probability
2.2.3 Bayes law
2.3 Random variables
2.3.1 Summation and integration
2.3.2 Expectations and variances
2.3.3 Two simple distributions
2.4 The bell curve
3 Gaining Fluency
3.1 Multiple random quantities
3.1.1 Higher-dimensional problems
3.2 Two "hard" problems
3.2.1 Functions of random variables
3.2.2 Compound distributions
Appendix 3.A Sums of independent random variables
3.A.1 Convolutions
3.A.2 Moment-generating functions
3.A.3 Formulae for expectations and variances
Part II. Doing Statistics
4 Overview of Statistics
4.1 Frequentist approach
4.1.1 Functions of random variables
4.2 Bayesian approach
4.2.1 Compound distributions
4.3 Two more distributions
4.3.1 Poisson distribution
4.3.2 Gamma distribution
Appendix 4.A Expectation and variance of the Poisson
Appendix 4.B Waiting time in Poisson process
5 Frequentist Approach
5.1 Maximum likelihood estimation
5.1.1 Random variables that are i.i.d.
5.1.2 Problems with covariates
5.2 Statistical properties of estimators
5.3 Some advanced techniques
5.3.1 EM algorithm
5.3.2 Latent variables
Appendix 5.A Finite mixture models
6 Bayesian Approach
6.1 Basics
6.2 Empirical Bayes
6.3 Hierarchical Bayes
Appendix 6.A General sampling algorithms
6.A.1 Metropolis algorithm
6.A.2 Some theory
6.A.3 Metropolis–Hastings algorithm
Part III. Facing Uncertainty
7 Interval Estimation
7.1 Uncertainty quantification
7.1.1 Bayesian version
7.1.2 Frequentist version
7.2 Main difficulty
7.3 Two useful methods
7.3.1 Likelihood ratio
7.3.2 Bootstrap
8 Tests of Significance
8.1 Basics
8.1.1 Relation to interval estimation
8.1.2 The p-value
8.2 Some challenges
8.2.1 Multiple testing
8.2.2 Six degrees of separation
Appendix 8.A Intuition of Benjamini-Hochberg
Part IV. Appendix
Appendix: Some Further Topics
A.1 Graphical models
A.2 Regression models
A.3 Data collection
Epilogue
Bibliography
Index