Mathematicians have skills that, if deepened in the right ways, would enable them to use data to answer questions important to them and others, and to report those answers in compelling ways. Data science combines parts of mathematics, statistics, and computer science. Gaining such power and the ability to teach it has reinvigorated the careers of many mathematicians. This handbook will help mathematicians better understand the opportunities presented by data science, a fast-growing field as it applies to curricula, research, and careers. Contributors from both academia and industry present their views on these opportunities and how to take advantage of them.
Editor(s): Nathan Carter
Series: Handbooks in Mathematics
Edition: 1
Publisher: CRC Press
Year: 2020
Language: English
Pages: 544
Tags: Linear Algebra, Statistics, Clustering, Operations Research, Machine Learning, Neural Networks
Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
Foreword
1. Introduction
1.1 Who should read this book?
1.2 What is data science?
1.3 Is data science new?
1.4 What can I expect from this book?
1.5 What will this book expect from me?
2. Programming with Data
2.1 Introduction
2.2 The computing environment
2.2.1 Hardware
2.2.2 The command line
2.2.3 Programming languages
2.2.4 Integrated development environments (IDEs)
2.2.5 Notebooks
2.2.6 Version control
2.3 Best practices
2.3.1 Write readable code
2.3.2 Don't repeat yourself
2.3.3 Set seeds for random processes
2.3.4 Profile, benchmark, and optimize judiciously
2.3.5 Test your code
2.3.6 Don't rely on black boxes
2.4 Data-centric coding
2.4.1 Obtaining data
2.4.1.1 Files
2.4.1.2 The web
2.4.1.3 Databases
2.4.1.4 Other sources and concerns
2.4.2 Data structures
2.4.3 Cleaning data
2.4.3.1 Missing data
2.4.3.2 Data values
2.4.3.3 Outliers
2.4.3.4 Other issues
2.4.4 Exploratory data analysis (EDA)
2.5 Getting help
2.6 Conclusion
3. Linear Algebra
3.1 Data and matrices
3.1.1 Data, vectors, and matrices
3.1.2 Term-by-document matrices
3.1.3 Matrix storage and manipulation issues
3.2 Matrix decompositions
3.2.1 Matrix decompositions and data science
3.2.2 The LU decomposition
3.2.2.1 Gaussian elimination
3.2.2.2 The matrices L and U
3.2.2.3 Permuting rows
3.2.2.4 Computational notes
3.2.3 The Cholesky decomposition
3.2.4 Least-squares curve-fitting
3.2.5 Recommender systems and the QR decomposition
3.2.5.1 A motivating example
3.2.5.2 The QR decomposition
3.2.5.3 Applications of the QR decomposition
3.2.6 The singular value decomposition
3.2.6.1 SVD in our recommender system
3.2.6.2 Further reading on the SVD
3.3 Eigenvalues and eigenvectors
3.3.1 Eigenproblems
3.3.2 Finding eigenvalues
3.3.3 The power method
3.3.4 PageRank
3.4 Numerical computing
3.4.1 Floating point computing
3.4.2 Floating point arithmetic
3.4.3 Further reading
3.5 Projects
3.5.1 Creating a database
3.5.2 The QR decomposition and query-matching
3.5.3 The SVD and latent semantic indexing
3.5.4 Searching a web
4. Basic Statistics
4.1 Introduction
4.2 Exploratory data analysis and visualizations
4.2.1 Descriptive statistics
4.2.2 Sampling and bias
4.3 Modeling
4.3.1 Linear regression
4.3.2 Polynomial regression
4.3.3 Group-wise models and clustering
4.3.4 Probability models
4.3.5 Maximum likelihood estimation
4.4 Confidence intervals
4.4.1 The sampling distribution
4.4.2 Confidence intervals from the sampling distribution
4.4.3 Bootstrap resampling
4.5 Inference
4.5.1 Hypothesis testing
4.5.1.1 First example
4.5.1.2 General strategy for hypothesis testing
4.5.1.3 Inference to compare two populations
4.5.1.4 Other types of hypothesis tests
4.5.2 Randomization-based inference
4.5.3 Type I and Type II error
4.5.4 Power and effect size
4.5.5 The trouble with p-hacking
4.5.6 Bias and scope of inference
4.6 Advanced regression
4.6.1 Transformations
4.6.2 Outliers and high leverage points
4.6.3 Multiple regression, interaction
4.6.4 What to do when the regression assumptions fail
4.6.5 Indicator variables and ANOVA
4.7 The linear algebra approach to statistics
4.7.1 The general linear model
4.7.2 Ridge regression and penalized regression
4.7.3 Logistic regression
4.7.4 The generalized linear model
4.7.5 Categorical data analysis
4.8 Causality
4.8.1 Experimental design
4.8.2 Quasi-experiments
4.9 Bayesian statistics
4.9.1 Bayes' formula
4.9.2 Prior and posterior distributions
4.10 A word on curricula
4.10.1 Data wrangling
4.10.2 Cleaning data
4.11 Conclusion
4.12 Sample projects
5. Clustering
5.1 Introduction
5.1.1 What is clustering?
5.1.2 Example applications
5.1.3 Clustering observations
5.2 Visualization
5.3 Distances
5.4 Partitioning and the k-means algorithm
5.4.1 The k-means algorithm
5.4.2 Issues with k-means
5.4.3 Example with wine data
5.4.4 Validation
5.4.5 Other partitioning algorithms
5.5 Hierarchical clustering
5.5.1 Linkages
5.5.2 Algorithm
5.5.3 Hierarchical simple example
5.5.4 Dendrograms and wine example
5.5.5 Other hierarchical algorithms
5.6 Case study
5.6.1 k-means results
5.6.2 Hierarchical results
5.6.3 Case study conclusions
5.7 Model-based methods
5.7.1 Model development
5.7.2 Model estimation
5.7.3 mclust and model selection
5.7.4 Example with wine data
5.7.5 Model-based versus k-means
5.8 Density-based methods
5.8.1 Example with iris data
5.9 Dealing with network data
5.9.1 Network clustering example
5.10 Challenges
5.10.1 Feature selection
5.10.2 Hierarchical clusters
5.10.3 Overlapping clusters, or fuzzy clustering
5.11 Exercises
6. Operations Research
6.1 History and background
6.1.1 How does OR connect to data science?
6.1.2 The OR process
6.1.3 Balance between efficiency and complexity
6.2 Optimization
6.2.1 Complexity-tractability trade-off
6.2.2 Linear optimization
6.2.2.1 Duality and optimality conditions
6.2.2.2 Extension to integer programming
6.2.3 Convex optimization
6.2.3.1 Duality and optimality conditions
6.2.4 Non-convex optimization
6.3 Simulation
6.3.1 Probability principles of simulation
6.3.2 Generating random variables
6.3.2.1 Simulation from a known distribution
6.3.2.2 Simulation from an empirical distribution: bootstrapping
6.3.2.3 Markov Chain Monte Carlo (MCMC) methods
6.3.3 Simulation techniques for statistical and machine learning model assessment
6.3.3.1 Bootstrapping confidence intervals
6.3.3.2 Cross-validation
6.3.4 Simulation techniques for prescriptive analytics
6.3.4.1 Discrete-event simulation
6.3.4.2 Agent-based modeling
6.3.4.3 Using these tools for prescriptive analytics
6.4 Stochastic optimization
6.4.1 Dynamic programming formulation
6.4.2 Solution techniques
6.5 Putting the methods to use: prescriptive analytics
6.5.1 Bike-sharing systems
6.5.2 A customer choice model for online retail
6.5.3 HIV treatment and prevention
6.6 Tools
6.6.1 Optimization solvers
6.6.2 Simulation software and packages
6.6.3 Stochastic optimization software and packages
6.7 Looking to the future
6.8 Projects
6.8.1 The vehicle routing problem
6.8.2 The unit commitment problem for power systems
6.8.3 Modeling project
6.8.4 Data project
7. Dimensionality Reduction
7.1 Introduction
7.2 The geometry of data and dimension
7.3 Principal Component Analysis
7.3.1 Derivation and properties
7.3.2 Connection to SVD
7.3.3 How PCA is used for dimension estimation and data reduction
7.3.4 Topological dimension
7.3.5 Multidimensional scaling
7.4 Good projections
7.5 Non-integer dimensions
7.5.1 Background on dynamical systems
7.5.2 Fractal dimension
7.5.3 The correlation dimension
7.5.4 Correlation dimension of the Lorenz attractor
7.6 Dimension reduction on the Grassmannian
7.7 Dimensionality reduction in the presence of symmetry
7.8 Category theory applied to data visualization
7.9 Other methods
7.9.1 Nonlinear Principal Component Analysis
7.9.2 Whitney's reduction network
7.9.3 The generalized singular value decomposition
7.9.4 False nearest neighbors
7.9.5 Additional methods
7.10 Interesting theorems on dimension
7.10.1 Whitney's theorem
7.10.2 Takens' theorem
7.10.3 Nash embedding theorems
7.10.4 Johnson-Lindenstrauss lemma
7.11 Conclusions
7.11.1 Summary and method of application
7.11.2 Suggested exercises
8. Machine Learning
8.1 Introduction
8.1.1 Core concepts of supervised learning
8.1.2 Types of supervised learning
8.2 Training dataset and test dataset
8.2.1 Constraints
8.2.2 Methods for data separation
8.3 Machine learning workflow
8.3.1 Step 1: obtaining the initial dataset
8.3.2 Step 2: preprocessing
8.3.2.1 Missing values and outliers
8.3.2.2 Feature engineering
8.3.3 Step 3: creating training and test datasets
8.3.4 Step 4: model creation
8.3.4.1 Scaling and normalization
8.3.4.2 Feature selection
8.3.5 Step 5: prediction and evaluation
8.3.6 Iterative model building
8.4 Implementing the ML workflow
8.4.1 Using scikit-learn
8.4.2 Transformer objects
8.5 Gradient descent
8.5.1 Loss functions
8.5.2 A powerful optimization tool
8.5.3 Application to regression
8.5.4 Support for regularization
8.6 Logistic regression
8.6.1 Logistic regression framework
8.6.2 Parameter estimation for logistic regression
8.6.3 Evaluating the performance of a classifier
8.7 Naïve Bayes classifier
8.7.1 Using Bayes' rule
8.7.1.1 Estimating the probabilities
8.7.1.2 Laplace smoothing
8.7.2 Health care example
8.8 Support vector machines
8.8.1 Linear SVMs in the case of linear separability
8.8.2 Linear SVMs without linear separability
8.8.3 Nonlinear SVMs
8.9 Decision trees
8.9.1 Classification trees
8.9.2 Regression decision trees
8.9.3 Pruning
8.10 Ensemble methods
8.10.1 Bagging
8.10.2 Random forests
8.10.3 Boosting
8.11 Next steps
9. Deep Learning
9.1 Introduction
9.1.1 Overview
9.1.2 History of neural networks
9.2 Multilayer perceptrons
9.2.1 Backpropagation
9.2.2 Neurons
9.2.3 Neural networks for classification
9.3 Training techniques
9.3.1 Initialization
9.3.2 Optimization algorithms
9.3.3 Dropout
9.3.4 Batch normalization
9.3.5 Weight regularization
9.3.6 Early stopping
9.4 Convolutional neural networks
9.4.1 Convnet layers
9.4.2 Convolutional architectures for ImageNet
9.5 Recurrent neural networks
9.5.1 LSTM cells
9.6 Transformers
9.6.1 Overview
9.6.2 Attention layers
9.6.3 Self-attention layers
9.6.4 Word order
9.6.5 Using transformers
9.7 Deep learning frameworks
9.7.1 Hardware acceleration
9.7.2 History of deep learning frameworks
9.7.3 TensorFlow with Keras
9.8 Open questions
9.9 Exercises and solutions
10. Topological Data Analysis
10.1 Introduction
10.2 Example applications
10.2.1 Image processing
10.2.2 Molecule configurations
10.2.3 Agent-based modeling
10.2.4 Dynamical systems
10.3 Topology
10.4 Simplicial complexes
10.5 Homology
10.5.1 Simplicial homology
10.5.2 Homology definitions
10.5.3 Homology example
10.5.4 Homology computation using linear algebra
10.6 Persistent homology
10.7 Sublevelset persistence
10.8 Software and exercises
10.9 References
10.10 Appendix: stability of persistent homology
10.10.1 Distances between datasets
10.10.2 Bottleneck distance and visualization
10.10.3 Stability results
Bibliography
Index