Applied statistical modeling and data analytics: a practical guide for the petroleum geosciences

Author(s): Srikanta Mishra, Akhil Datta-Gupta
Publisher: Elsevier
Year: 2018

Language: English
Pages: xi, 237

Cover
Half-Title Page
APPLIED STATISTICAL MODELING AND DATA ANALYTICS: A Practical Guide for the Petroleum Geosciences
Copyright
Dedication
Contents
Preface
Acknowledgments
1: Basic Concepts
Background and Scope
What Is Statistics?
What Is Big Data Analytics?
Data Analysis Cycle
Some Applications in the Petroleum Geosciences
Data, Statistics, and Probability
Outcomes and Events
Probability
Conditional Probability and Bayes Rule
Random Variables
Discrete Case
Continuous Case
Indicator Transform
Summary
Exercises
References
2: Exploratory Data Analysis
Univariate Data
Measures of Center
Measures of Spread
Measures of Asymmetry
Graphing Univariate Data
Bivariate Data
Covariance
Correlation and Rank Correlation
Graphing Bivariate Data
Multivariate Data
Summary
Exercises
References
3: Distributions and Models Thereof
Empirical Distributions
Histogram
Quantile Plot
Parametric Models
Uniform Distribution
Triangular Distribution
Normal Distribution
Lognormal Distribution
Poisson Distribution
Exponential Distribution
Binomial Distribution
Weibull Distribution
Beta Distribution
Working With Normal and Log-Normal Distributions
Normal Distribution
Normal Score Transformation
Log-Normal Distribution
Fitting Distributions to Data
Probability Plots
Parameter Estimation Techniques
Linear Regression Analysis
Method of Moments
Nonlinear Least-Squares Analysis
Other Properties of Distributions and Their Evaluation
Central Limit Theorem and Confidence Limits
Bootstrap Sampling
Comparing Two Distributions
Q-Q Plot
Testing for Difference in Mean
Testing for Difference in Distributions
Other Methods for Comparing Distributions
Summary
Exercises
References
4: Regression Modeling and Analysis
Introduction
Simple Linear Regression
Formulating and Solving the Linear Regression Problem
Evaluating the Linear Regression Model
Properties of the Regression Parameters and Confidence Limits
Estimating Confidence Intervals for the Mean Response and Forecast
An Illustrative Example of Linear Regression Modeling and Analysis
Multiple Regression
Formulating and Solving the Multiple Regression Model
Evaluating the Multiple Regression Model
How Many Terms in the Regression Model?
Analysis of Variance (ANOVA) Table
An Illustrative Example of Multiple Regression Modeling and Analysis
Nonparametric Transformation and Regression
Conditional Expectation and Scatterplot Smoothers
Generalized Additive Models
Response Transformation Models: ACE Algorithm and Its Variations
Data Correlation via Nonparametric Transformation
Field Application for Nonparametric Regression: The Salt Creek Data Set
Dataset Description
Variable Selection
Optimal Transformations and Optimal Correlation
Summary
Exercises
References
5: Multivariate Data Analysis
Introduction
Principal Component Analysis
Computing the Principal Components
An Illustrative Example of the Principal Component Analysis
Cluster Analysis
k-Means Clustering
An Illustrative Example of k-Means Clustering
Hierarchical Clustering
An Illustrative Example of Hierarchical Clustering
Model-Based Clustering
Discriminant Analysis
An Illustrative Example of Discriminant Analysis
Field Application: The Salt Creek Data Set
Dataset Description
PCA
Cluster Analysis
Data Correlation and Prediction
Summary
Exercises
References
Further Reading
6: Uncertainty Quantification
Introduction
Deterministic Versus Probabilistic Approach
Elements of a Systematic Framework
Role of Monte Carlo Simulation
Uncertainty Characterization
Screening for Key Uncertain Inputs
Fitting Distributions to Data
Maximum Entropy Distribution Selection
Generation of Subjective Probability Distributions
Problem of Scale
Uncertainty Propagation
Sampling Methods
Random Sampling
Latin Hypercube Sampling
Correlation Control in LHS
Computational Considerations
Number of Samples
Visualization of Results
Uncertainty Importance Assessment
Basic Concepts in Uncertainty Importance
Scatter Plots and Rank Correlation Analysis
Stepwise Regression and Partial Rank Correlation Analysis
Other Measures of Variable Importance
Entropy (Mutual Information) Analysis
Classification Tree Analysis
Moving Beyond Monte Carlo Simulation
First-Order Second-Moment Method (FOSM)
General Expressions for Mean and Variance
Error Analysis in Additive and Multiplicative Models
Point Estimate Method (PEM)
Logic Tree Analysis (LTA)
Treatment of Model Uncertainty
Basic Concepts
Moment-Matching Weighting Method for Geostatistical Models
Example Field Application
Elements of a Good Uncertainty Analysis Study
Summary
Exercises
References
7: Experimental Design and Response Surface Analysis
General Concepts
Experimental Design
Factorial Designs
Plackett-Burman
Central Composite and Box-Behnken
Augmented Pairs
Comparison of Factorial Designs
Sampling Designs
Purely Random Design
Latin Hypercube Sampling
Maximin LHS
Maximum Entropy Design
Comparison of Sampling Designs
Metamodeling Techniques
Quadratic Model
Quadratic Model With LASSO Variable Selection
Kriging Model
Radial Basis Functions
Metamodel Performance Evaluation Metric
An Illustration of Experimental Design and Response Surface Modeling
Field Application of Experimental Design and Response Surface Modeling
Problem of Interest
Proxy Construction and Application Strategy
Field Case Study
Summary
Exercises
References
Further Reading
8: Data-Driven Modeling
Introduction
Preliminaries
Data-Driven Models-What and Why?
Our Philosophy
Modeling Approaches
Classification and Regression Trees
Random Forest
Gradient Boosting Machine
Support Vector Machine
Artificial Neural Network
Model Strengths and Weaknesses
Computational Considerations
Model Evaluation
Automatic Tuning of Model Parameters
Variable Importance
Model Aggregation
Field Example
Dataset Description
Predictive Model Building
Variable Importance and Conditional Sensitivity
Classification Tree Analysis
Summary
Exercises
References
9: Concluding Remarks
The Path We Have Taken
Recapitulation of Topics
Style and Intended Use
Resources
Key Takeaways
Which Variables?
Simple Model, or Complex?
One Model, or Many?
Is Past Always Prolog?
To Fit, or Overfit?
Final Thoughts
References
Index
Back Cover