This textbook introduces the reader to Machine Learning (ML) applications in Earth Sciences. It starts by describing the basics of machine learning and its potential to solve geological problems, presents the main Python tools devoted to ML and the typical workflow of ML applications in Earth Sciences, and then explains how ML algorithms work. The book provides numerous examples of ML applications to Earth Science problems across many fields, such as clustering and dimensionality reduction in petro-volcanological studies, clustering of multi-spectral data, well-log facies classification, and ML regression in petrology. It also introduces the basics of parallel computing and how to scale ML models in the cloud. The book is intended for Earth Scientists at any level, from students to academics and professionals.
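As a taste of the typical workflow the book develops in Chapter 3 (pre-processing, training, validation), the sketch below runs a minimal scikit-learn pipeline on synthetic data. The four-feature data set, the binary labels, and the random-forest model are illustrative assumptions of this listing, not an excerpt from the book.

# Minimal sketch of a supervised ML workflow of the kind covered in
# Chapter 3; the data below are synthetic placeholders, not book data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))             # e.g., four geochemical features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary labels

# Split into train/test sets, scale, train, and evaluate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(random_state=42)
model.fit(scaler.transform(X_train), y_train)
print(accuracy_score(y_test, model.predict(scaler.transform(X_test))))

The same split-scale-train-evaluate skeleton underlies the book's worked examples, with real geochemical or well-log data in place of the synthetic arrays.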
Author(s): Maurizio Petrelli
Publisher: Springer
Year: 2023
Language: English
Pages: 225
Preface
Acknowledgments
Overview
Let Me Introduce Myself
Styling Conventions
Shared Code
Involvement and Collaborations
Contents
Part I Basic Concepts of Machine Learning for Earth Scientists
1 Introduction to Machine Learning
1.1 Machine Learning: Definitions and Terminology
1.2 The Learning Process
1.3 Supervised Learning
1.4 Unsupervised Learning
1.5 Semisupervised Learning
References
2 Setting Up Your Python Environments for Machine Learning
2.1 Python Modules for Machine Learning
2.2 A Local Python Environment for Machine Learning
2.3 ML Python Environments on Remote Linux Machines
2.4 Working with Your Remote Instance
2.5 Preparing Isolated Deep Learning Environments
2.6 Cloud-Based Machine Learning Environments
2.7 Speed Up Your ML Python Environment
References
3 Machine Learning Workflow
3.1 Machine Learning Step-by-Step
3.2 Get Your Data
3.3 Data Pre-processing
3.3.1 Data Inspection
3.3.2 Data Cleaning and Imputation
3.3.3 Encoding Categorical Features
3.3.4 Data Augmentation
3.3.5 Data Scaling and Transformation
3.3.6 Compositional Data Analysis (CoDA)
3.3.7 A Working Example of Data Pre-processing
3.4 Training a Model
3.5 Model Validation and Testing
3.5.1 Splitting the Investigated Data Set into Three Parts
3.5.2 Cross-Validation
3.5.3 Leave-One-Out Cross-Validation
3.5.4 Metrics
3.5.5 Overfitting and Underfitting
3.6 Model Deployment and Persistence
References
Part II Unsupervised Learning
4 Unsupervised Machine Learning Methods
4.1 Unsupervised Algorithms
4.2 Principal Component Analysis
4.3 Manifold Learning
4.3.1 Isometric Feature Mapping
4.3.2 Locally Linear Embedding
4.3.3 Laplacian Eigenmaps
4.3.4 Hessian Eigenmaps
4.4 Hierarchical Clustering
4.5 Density-Based Spatial Clustering of Applications with Noise
4.6 Mean Shift
4.7 K-Means
4.8 Spectral Clustering
4.9 Gaussian Mixture Models
References
5 Clustering and Dimensionality Reduction in Petrology
5.1 Unveil the Chemical Record of a Volcanic Eruption
5.2 Geological Setting
5.3 The Investigated Data Set
5.4 Data Pre-processing
5.4.1 Data Cleaning
5.4.2 Compositional Data Analysis (CoDA)
5.5 Clustering Analyses
5.6 Dimensionality Reduction
References
6 Clustering of Multi-Spectral Data
6.1 Spectral Data from Earth-Observing Satellites
6.2 Import Multi-Spectral Data to Python
6.3 Descriptive Statistics
6.4 Pre-processing and Clustering
References
Part III Supervised Learning
7 Supervised Machine Learning Methods
7.1 Supervised Algorithms
7.2 Naive Bayes
7.3 Quadratic and Linear Discriminant Analysis
7.4 Linear and Nonlinear Models
7.5 Loss Functions, Cost Functions, and Gradient Descent
7.6 Ridge Regression
7.7 Least Absolute Shrinkage and Selection Operator
7.8 Elastic Net
7.9 Support Vector Machines
7.10 Supervised Nearest Neighbors
7.11 Tree-Based Methods
References
8 Classification of Well Log Data Facies by Machine Learning
8.1 Motivation
8.2 Inspection of the Data Sets and Pre-processing
8.3 Model Selection and Training
8.4 Final Evaluation
References
9 Machine Learning Regression in Petrology
9.1 Motivation
9.2 LEPR Data Set and Data Pre-processing
9.3 Compositional Data Analysis
9.4 Model Training and Error Assessment
9.5 Evaluation of Results
References
Part IV Scaling Machine Learning Models
10 Parallel Computing and Scaling with Dask
10.1 Warming Up: Basic Definitions
10.2 Basics of Dask
10.3 Eager Computation Versus Lazy Evaluation
10.4 Diagnostics and Feedback
References
11 Scale Your Models in the Cloud
11.1 Scaling Your Environment in the Cloud
11.2 Scaling in the Cloud: The Hard Way
11.3 Scaling in the Cloud: The Easy Way
Reference
Part V Next Step: Deep Learning
12 Introduction to Deep Learning
12.1 What Does Deep Learning Mean?
12.2 PyTorch
12.3 PyTorch Tensors
12.4 Structuring a Feedforward Network in PyTorch
12.5 How to Train a Feedforward Network
12.5.1 The Universal Approximation Theorem
12.5.2 Loss Functions in PyTorch
12.5.3 Back-Propagation and Its Implementation in PyTorch
12.5.4 Optimization
12.5.5 Network Architectures
12.6 Example Application
References