An Introduction to Statistics with Python: With Applications in the Life Sciences

Now in its second edition, this textbook provides an introduction to Python and its use for statistical data analysis. It covers common statistical tests for continuous, discrete and categorical data, as well as linear regression analysis and topics from survival analysis and Bayesian statistics.

For this new edition, the introductory chapters on Python, data input and visualization have been reworked and updated. The chapter on experimental design has been expanded, and programs for the determination of confidence intervals commonly used in quality control have been introduced. The book also features a new chapter on finding patterns in data, including time series. A new appendix describes useful programming tools, such as testing tools, code repositories, and GUIs.

The provided working code for Python solutions, together with easy-to-follow examples, will reinforce the reader’s immediate understanding of the topic. Accompanying data sets and Python programs are also available online. With recent advances in the Python ecosystem, Python has become a popular language for scientific computing, offering a powerful environment for statistical data analysis.

With examples drawn mainly from the life and medical sciences, this book is intended primarily for master's and PhD students. As it provides the required statistics background, the book can also be used by anyone who wants to perform a statistical data analysis.


Author(s): Thomas Haslwanter
Series: Statistics and Computing
Edition: 2
Publisher: Springer
Year: 2022

Language: English
Pages: 340
City: Cham

Preface
Preface to the First Edition
Preface to the Second Edition
For Whom This Book Is
Acknowledgments
Contents
Abbreviations
Part I Python and Statistics
1 Introduction
1.1 Why Statistics?
1.2 Conventions
1.3 Accompanying Material
2 Python
2.1 Getting Started
2.1.1 Distributions and Packages
2.1.2 Installation of Python
2.1.3 Installation of R and rpy2
2.1.4 Python Resources
2.1.5 A Simple Python Program
2.2 Elements of Scientific Python Programming
2.2.1 Python Datatypes
2.2.2 Indexing and Slicing
2.2.3 Numpy Vectors and Arrays
2.2.4 pandas DataFrames
2.2.5 Functions, Modules, and Packages
2.3 Interactive Programming—IPython/Jupyter
2.3.1 Workflow
2.3.2 Jupyter Interfaces
2.3.3 Personalizing IPython/Jupyter
2.3.4 Sample Interactive Session
2.3.5 Converting Interactive Commands into a Python Program
2.4 Statistics Packages for Python
2.4.1 Seaborn—Data Visualization
2.4.2 Pingouin
2.4.3 Statsmodels—Tools for Statistical Modeling
2.5 Programming Tips
2.5.1 General Programming Tips
2.5.2 Python Tips
2.5.3 IPython/Jupyter Tips
2.6 Exercises
3 Data Input
3.1 Text
3.1.1 Visual Inspection
3.1.2 Reading ASCII-Data
3.1.3 Regular Expressions
3.2 Excel
3.3 Matlab
3.4 Binary Data: NPZ Format
3.5 Other Formats
3.6 Exercises
4 Data Display
4.1 Introductory Example
4.2 Plotting in Python
4.2.1 Functional and Object-Oriented Approaches
4.2.2 Interactive Plots
4.3 Saving a Figure
4.4 Preparing Figures for Presentation
4.4.1 General Considerations
4.4.2 Modifying SVG Figures
4.5 Display of Statistical Data Sets
4.5.1 Plots of Data with One Variable
4.5.2 Plots of Data with Two or More Variables
4.6 Exercises
Part II Distributions and Hypothesis Tests
5 Basic Statistical Concepts
5.1 Populations and Samples
5.2 Data Types
5.2.1 Categorical
5.2.2 Numerical
5.2.3 Data with One, Two, or More Variables
5.3 Probability Distributions
5.3.1 Definitions
5.3.2 Discrete Distributions
5.3.3 Continuous Distributions
5.3.4 Expected Value and Variance
5.4 Degrees of Freedom
5.5 Study Design
5.5.1 Terminology
5.5.2 Overview
5.5.3 Types of Studies
5.5.4 Design of Experiments
5.5.5 Recommendations for Researchers
5.5.6 Personal Advice
5.5.7 Good Study Design: Clinical Investigation Plan
6 Distributions of One Variable
6.1 Characterizing a Distribution
6.1.1 Distribution Center
6.1.2 Quantifying Variability
6.1.3 Parameters Describing the Form of a Distribution
6.1.4 Important Methods of Probability Density Functions
6.2 Discrete Distributions
6.2.1 Bernoulli Distribution
6.2.2 Binomial Distribution
6.2.3 Poisson Distribution
6.2.4 Hypergeometric Distribution
6.3 Normal Distribution
6.3.1 Examples of Normal Distributions
6.3.2 Central Limit Theorem
6.3.3 Distributions and Hypothesis Tests
6.4 Continuous Distributions Derived from the Normal Distribution
6.4.1 T-Distribution
6.4.2 Chi-Square Distribution
6.4.3 F-Distribution
6.5 Other Continuous Distributions
6.5.1 Lognormal Distribution
6.5.2 Weibull Distribution
6.5.3 Exponential Distribution
6.5.4 Uniform Distribution
6.6 Confidence Intervals of Selected Statistical Parameters
6.7 Exercises
7 Hypothesis Tests
7.1 Typical Analysis Procedure
7.1.1 Data Screening and Outliers
7.1.2 Normality Check
7.1.3 Transformation
7.2 Hypothesis Tests and Power Analyses
7.2.1 An Example
7.2.2 Generalization and Applications
7.2.3 The Interpretation of the P-Value
7.2.4 Types of Errors
7.2.5 Sample Size
7.3 Sensitivity and Specificity
7.3.1 Related Calculations
7.3.2 Example: Mammogram
7.4 Receiver-Operating-Characteristic (ROC) Curve
7.5 Exercises
8 Tests of Means of Numerical Data
8.1 Distribution of a Sample Mean
8.1.1 One Sample T-Test for a Mean Value
8.1.2 Wilcoxon Signed Rank Sum Test
8.2 Comparison of Two Groups
8.2.1 Paired T-Test
8.2.2 T-Test Between Independent Groups
8.2.3 T-Tests with Pingouin
8.2.4 Non-parametric Comparison of Two Groups: Mann-Whitney Test
8.3 Comparison of Multiple Groups
8.3.1 Analysis of Variance (ANOVA)
8.3.2 Multiple Comparisons
8.3.3 Kruskal–Wallis Test
8.3.4 Two-Way ANOVA
8.3.5 Three-Way ANOVA
8.3.6 Friedman Test
8.4 Summary: Selecting the Right Test for Comparing Groups
8.5 Exercises
9 Tests on Categorical Data
9.1 Proportions and Confidence Intervals
9.1.1 Explanation
9.1.2 Example
9.2 Tests Using Frequency Tables
9.2.1 One-Way Chi-Square Test
9.2.2 Chi-Square Contingency Test
9.2.3 Fisher's Exact Test
9.2.4 McNemar's Test
9.2.5 Cochran's Q Test
9.3 Exercises
10 Analysis of Survival Times
10.1 Survival Distributions
10.2 Survival Probabilities
10.2.1 Censorship
10.2.2 Kaplan–Meier Survival Curve
10.3 Comparing Survival Curves in Two Groups
Part III Statistical Modeling
11 Finding Patterns in Signals
11.1 Cross Correlation
11.2 Correlation Coefficient
11.2.1 Covariance
11.2.2 Pearson Correlation Coefficient
11.2.3 Rank Correlation
11.3 Coefficient of Determination
11.3.1 General Linear Regression Model
11.3.2 Interpretation
11.4 Scatterplot Matrix
11.5 Correlation Matrix
11.6 Autocorrelation
11.7 Time-Series Analysis
11.7.1 Data Decomposition
11.7.2 Analysis of Residuals
11.7.3 ARMA Models
11.7.4 Integrated ARMA (or ARIMA) Models
11.7.5 Examples of Simple ARIMA Models
12 Linear Regression Models
12.1 Simple Fits
12.2 Design Matrix and Formulas
12.2.1 Example 1: Simple Linear Regression
12.2.2 Example 2: Quadratic Fit
12.2.3 Multilinear Regression
12.2.4 Patsy—The Formula Language
12.2.5 Design Matrix
12.3 Linear Regression Analysis with Python
12.3.1 Example 1: Line Fit with Confidence Intervals
12.3.2 Example 2: Noisy Quadratic Polynomial
12.4 Model Results of Linear Regression Models
12.4.1 Example: Tobacco and Alcohol in the UK
12.4.2 Model Characteristics
12.4.3 Model Coefficients and Their Interpretation
12.4.4 Analysis of Residuals
12.4.5 Comparison to Model With Outlier
12.4.6 Regression Using Sklearn
12.4.7 Conclusion
12.5 Assumptions and Interpretations of Linear Regression
12.5.1 Assumptions
12.5.2 Interpreting Multilinear Regression Models
12.6 Bootstrapping
12.7 Exercises
13 Generalized Linear Models
13.1 Comparing and Modeling Ranked Data
13.2 Elements of GLMs
13.2.1 Exponential Family of Distributions
13.2.2 Linear Predictor and Link Function
13.3 GLM 1: Logistic Regression
13.4 GLM 2: Ordinal Logistic Regression
13.4.1 Model
13.4.2 Optimization
13.4.3 Performance
13.5 Exercises
14 Bayesian Statistics
14.1 Bayesian Versus Frequentist Interpretation
14.1.1 Bayes' Theorem
14.1.2 Bayesian Example
14.2 The Bayesian Approach in the Age of Computers
14.3 Example: Markov-Chain-Monte-Carlo Simulation
14.4 Summing Up
Appendix A Useful Programming Tools
A.1 Debugger
A.2 Test Tools
A.3 Code Versioning with git
A.3.1 Overview
A.3.2 Installation and Interfaces
A.3.3 Examples
A.4 Graphical User Interfaces (GUIs)
A.4.1 PySimpleGUI—Examples
A.4.2 PyQtGraph
A.4.3 Tips for User Interface
A.5 Exercises
Appendix B Solutions
Appendix C Equations for Confidence Intervals
Appendix D Web Resources
Glossary
Bibliography
Index