Artificial Intelligence in Earth Science: Best Practices and Fundamental Challenges

Artificial Intelligence in Earth Science: Best Practices and Fundamental Challenges provides a comprehensive, step-by-step guide to AI workflows for solving problems in Earth science. The book focuses on the most challenging aspects of applying AI in the Earth system sciences, including training data preparation, model selection, hyperparameter tuning, model structure optimization, spatiotemporal generalization, transforming model results into products, and explaining trained models. In addition, it provides full-stack workflow tutorials that walk readers through the whole process, regardless of prior AI experience.

The book tackles the complexity that Earth system problems bring to AI engineering, offering complete guidance for geoscientists who plan to apply AI in their daily work.

Author(s): Ziheng Sun, Nicoleta Cristea, Pablo Rivas
Publisher: Elsevier
Year: 2023

Language: English
Pages: 428
City: Amsterdam

Front Cover
Artificial Intelligence in Earth Science: Best Practices and Fundamental Challenges
Copyright
Contents
Contributors
Chapter 1: Introduction of artificial intelligence in Earth sciences
1. Background and motivation
2. AI evolution in Earth sciences
3. Latest developments and challenges
4. Short-term and long-term expectations for AI
5. Future developments and how to adapt
6. Practical AI: From prototype to operation
7. Why do we write this book?
8. Learning goals and tasks
9. Assignments & open questions
References
Chapter 2: Machine learning for snow cover mapping
1. Introduction
2. Machine learning tools and model
2.1. What is "scikit-learn"
2.2. Why do we use random forest
2.3. Other supporting packages used in the chapter
3. Data preparation
4. Model parameter tuning
4.1. Number of samples
4.2. Number of features
4.3. Number of trees
4.4. Tree depth
5. Model training
5.1. Splitting data into training and testing subsets
5.2. Defining the random forest model
5.3. Feature importance
5.4. Save the model
6. Model performance evaluation
6.1. Testing subset model performance
6.2. Image-wide model performance
6.3. Model performance in open areas versus forested areas
7. Conclusion
8. Assignment
9. Open questions
References
Chapter 3: AI for sea ice forecasting
1. Introduction
1.1. Sea ice
1.2. Arctic sea ice and global climate patterns
2. Sea ice seasonal forecast
3. Sea ice data exploration
3.1. Dataset description
4. ML approaches for sea ice forecasting
4.1. ML-based sea ice forecasting
4.1.1. Data preprocessing
4.1.2. Fitting the model
4.1.3. Model evaluation
4.2. Deep learning-based sea ice forecasting
4.2.1. Data preprocessing
4.2.2. Model training
4.2.3. Model evaluation
4.3. Ensemble learning-based sea ice forecasting
4.3.1. Data concatenation
4.3.2. Model evaluation
5. Results and analysis
6. Discussion
7. Open questions
8. Assignments
References
Chapter 4: Deep learning for ocean mesoscale eddy detection
1. Introduction
2. Chapter layout
3. Data preparation
3.1. AVISO-SSH data product
3.2. Training and testing sets
3.3. SSH map preprocessing
3.4. Generate ground truth eddy masks using the py-eddy-tracker algorithm
3.5. Use multiprocessing to generate segmentation masks in parallel
3.6. Take a subset of the masks and SSH map, and save to a compressed numpy (.npz) file
4. Training and evaluating an eddy detection model
4.1. Load data
4.1.1. Specify NPZ file paths
4.1.2. Load NPZ and convert into PyTorch DataLoader
4.1.3. Examine distribution of class frequencies to identify class imbalances
4.1.4. Example visualization
4.1.5. Example visualization (animate validation data)
4.2. Defining the training components
4.2.1. Segmentation model
4.2.2. Loss function, L(f_θ(x), y)
4.2.3. Optimizer
4.2.4. One-cycle learning rate scheduler
4.3. Metrics
4.3.1. Precision and recall
4.3.2. Tensorboard logger (SummaryWriter)
4.4. Train the model
4.4.1. Define training loop
4.4.2. Analyze training curves in TensorBoard
4.4.3. Run the training loop for prescribed num_epochs
4.5. Evaluate model on training and validation sets
5. Discussion
6. Summary
7. Assignments
8. Open questions
Acknowledgments
References
Chapter 5: Artificial intelligence for plant disease recognition
1. Introduction
1.1. Plant disease challenge
1.2. Promising AI technique for plant disease detection and classification
2. Data retrieval and preparation
2.1. Data variability
2.2. Protocols for image capture
2.3. Image annotation
3. Step-by-step implementation
4. Experimental results and how to select a model
5. Discussion
6. Conclusion
7. Assignment
8. Open questions
References
Chapter 6: Spatiotemporal attention ConvLSTM networks for predicting and physically interpreting wildfire spread
1. Introduction
1.1. Technical contributions
2. Methodology
2.1. ConvLSTM network
2.2. Attention-based methods for ConvLSTM networks
3. Earth AI workflow
3.1. Dataset acquisition and preparation
3.1.1. Input-output sequence generation
3.1.2. Data normalization
3.2. Modeling workflow demonstration
3.2.1. Attention ConvLSTM networks architecture
Imports
Model configuration
Convolutional block attention module (CBAM)
Nonattention ConvLSTM block
CSA-ConvLSTM block
SCA-ConvLSTM block
Encoder-decoder block
Train and test network
3.2.2. Execute the model
3.3. Physical interpretability of the trained model: integrated gradients-based feature importance
4. Results
4.1. Prediction performance
4.2. Physical interpretation
5. Conclusions
6. Assignment
7. Open questions
References
Chapter 7: AI for physics-inspired hydrology modeling
1. Introduction and background
2. PyTorch and autodifferentiation
2.1. Getting started with PyTorch
2.2. Autodifferentiation theory
2.3. Practical use of autodifferentiation in PyTorch
3. Extremely brief background on numerical optimization
3.1. First-order methods: Gradient descent and other flavors for training neural networks
3.2. Second-order methods: Standards for numerical solutions to differential equations
3.3. Brief detour on numerically solving ODEs
The hydrologist's favorite: The linear reservoir model
4. Bringing things together: Solving ODEs inside of neural networks
The nonlinear reservoir model
Learning the reservoir conductivity function with neural networks
4.1. Split out the input/output data
4.1.1. Let us train!
4.2. What did the network actually learn though?
4.3. Introducing torchdiffeq
5. Scaling up to a conceptual hydrologic model
5.1. The system of equations
5.2. Data
5.3. The model training functions
5.4. Setting up our training/testing data
5.5. Defining the model setup
5.6. Training the model
5.7. Model analysis
6. Conclusions
6.1. Exercises
6.2. Open questions
References
Further reading
Chapter 8: Theory of spatiotemporal deep analogs and their application to solar forecasting
1. Introduction
1.1. A brief history of weather analogs
1.2. Machine learning and its integration with analog ensemble
1.3. What you will learn in this chapter
2. Research data
2.1. Surface radiation budget network
2.2. Numerical weather prediction models
3. Methodology
3.1. Analog forecasting
3.1.1. Quantification of similarity between weather patterns
3.1.2. Generation of future predictions
3.2. Analog ensemble and the spatial extension
3.3. Spatial-temporal similarity metric with machine learning
4. Results and discussion
4.1. Verification at a single location
4.2. Search space extension
4.3. Weather analog identification
4.4. Machine learning interpretability via attribution
5. Final remarks
6. Assignment
7. Open questions
Appendix A. Deep learning layers and operators
A.1. Convolution
A.2. Nonlinear activation
A.3. Pooling
A.4. Convolutional long short-term memory network
Appendix B. Verification of extended analog search with GFS
Appendix C. Weather analog identification under a high irradiance regime
Appendix D. Model attribution
References
Chapter 9: AI for improving ozone forecasting
1. Introduction
1.1. What you will learn in this chapter
1.2. Prerequisites
2. Background
3. Data collection
3.1. AirNow O3 concentration
3.1.1. TROPOMI O3
3.2. CMAQ simulation data
4. Dataset preparation
5. Machine learning
5.1. Extreme gradient boosting model
5.2. Accuracy assessment
5.3. Comparison with other ML models
6. ML workflow management
7. Discussion
7.1. Accuracy improvement
7.2. Stability and reliability
8. Conclusion
9. Assignment
10. Open questions
11. Lessons learned
References
Chapter 10: AI for monitoring power plant emissions from space
1. Introduction
1.1. What you will learn in this chapter
1.2. Credentials
1.3. Prerequisites
2. Background
3. Data collection
3.1. TROPOMI tropospheric NO2 data
3.2. MERRA-2 meteorology data
3.3. EPA eGRID data
3.4. MODIS MCD19A2 product
4. Preprocessing
4.1. TROPOMI NO2
4.2. MERRA-2
4.3. MCD19A2
4.4. Merging training data
5. Machine learning
5.1. Support vector regression (SVR)
5.2. Utility functions
6. Managing emission AI workflow in Geoweaver
7. Discussion
8. Summary
9. Assignment
10. Open questions
11. Lessons learned
References
Chapter 11: AI for shrubland identification and mapping
1. Introduction
2. What you'll learn
3. Background
4. Prerequisites
5. Model building
5.1. Preprocessing
5.2. Model fitting
5.3. Model evaluation
6. Discussion
7. Summary
8. Assignment
9. Open questions
References
Chapter 12: Explainable AI for understanding ML-derived vegetation products
1. Introduction
2. Background
3. Prerequisites
4. Method & technique
4.1. Choosing a machine learning model
4.2. Explainable artificial intelligence (XAI)
4.3. Local and global interpretability
5. Experiment & results
5.1. ELI5
5.2. Implementation
5.2.1. Conclusion
5.3. SHAP
5.3.1. Implementation
5.3.2. Conclusion
5.4. Accumulated local effects (ALE)
5.4.1. Implementation
5.4.2. Conclusion
5.5. Anchor
5.5.1. Implementation
5.5.2. Conclusion
6. Summary
7. Assignment
8. Open questions
9. Lessons learned
Acknowledgments
References
Further reading
Chapter 13: Satellite image classification using quantum machine learning
1. Introduction
1.1. Machine learning
1.2. Quantum computer and informatics
1.3. Quantum machine learning
1.4. Remote sensing (RS) and land cover classification
1.5. Vegetation and nonvegetation cover
2. Data
2.1. Satellite data retrieval
2.2. Split images into batches for annotation
3. Applying QML on MODIS hyperspectral images
3.1. Quantum neural network
3.2. Land cover (binary) classification
3.3. Setup of TensorFlow, TensorFlow Quantum, and Cirq
3.3.1. TensorFlow (TF)
3.3.2. TensorFlow Quantum (TFQ)
3.3.3. Cirq
3.4. Setup
3.5. Loading and preprocessing data
3.6. Quantum circuit data encoding
3.7. Quantum neural network: Building and compiling the model
3.8. Training the QNN model
3.9. Classification performance
4. Conclusions
5. Assignments
6. Open questions
Acknowledgment
References
Chapter 14: Provenance in Earth AI
1. Introduction
2. Overview of relevant concepts in provenance, XAI, and TAI
2.1. Guidelines for building trustworthy AI
2.2. Understanding explainable AI
2.3. Provenance and documentation
3. Need for provenance in Earth AI
3.1. Use of AI in the Earth science domain
3.2. Related work in provenance and Earth science
4. Technical approaches
4.1. METACLIP (METAdata for CLImate Products)
4.2. Kepler scientific workflow system
4.3. Geoweaver
5. Discussion
6. Conclusions
7. Assignment
8. Open questions
Acknowledgments
References
Chapter 15: AI ethics for Earth sciences
1. Introduction
2. Prior work
3. Addressing ethical concerns during system design
4. Considering algorithmic bias
5. Designing ethically driven automated systems
6. Assessing the impact of autonomous and intelligent systems on human well-being
7. Developing AI literacy, skills, and readiness
8. On documenting datasets for AI
9. On documenting AI models
10. Carbon emissions of Earth AI models
11. Conclusions
12. Assignments
13. Open questions
References
Index
Back Cover