This book constitutes the revised selected papers of the 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020, held in Oak Ridge, TN, USA*, in August 2020. The 36 full papers and 1 short paper presented were carefully reviewed and selected from a total of 94 submissions. The papers are organized in the following topical sections: computational applications: converged HPC and artificial intelligence; system software: data infrastructure and life cycle; experimental/observational applications: use cases that drive requirements for AI and HPC convergence; deploying computation: on the road to a converged ecosystem; scientific data challenges. *The conference was held virtually due to the COVID-19 pandemic.
Author(s): Jeffrey Nichols; Becky Verastegui; Arthur ‘Barney’ Maccabe; Oscar Hernandez; Suzanne Parete-Koon; Theresa Ahearn
Series: Communications in Computer and Information Science, 1315
Publisher: Springer
Year: 2021
Language: English
Pages: 564
Preface
Organization
Contents
Computational Applications: Converged HPC and Artificial Intelligence
Improving Seismic Wave Simulation and Inversion Using Deep Learning
1 Introduction
2 Wave Equations and RNN
2.1 Wave Equations
2.2 Recurrent Neural Network
2.3 PyTorch RNN Implementation
2.4 Seismic Wave Simulation
3 Differentiable Programming
3.1 Automatic Differentiation and Adjoint-State Method
3.2 Extended Automatic Differentiation
4 Seismic Inversion
4.1 Seismic Inversion
4.2 AutoEncoder for Dimensionality Reduction
4.3 Results
5 Discussion
6 Conclusion and Future Work
References
Large-Scale Neural Solvers for Partial Differential Equations
1 Introduction
2 Related Works
3 Methods
3.1 Physics-Informed Quantum Harmonic Oscillator
3.2 GatedPINN
4 Results
4.1 Approximation Quality
4.2 Domain Decomposition
4.3 Scalability and Power Draw
4.4 Discussion
5 Conclusion
References
Integrating Deep Learning in Domain Sciences at Exascale
1 Background
2 Deep Learning Software on Modern HPC Systems
2.1 Towards a Deep Learning Framework for HPC
2.2 Workflow Software for Modern HPC Systems
3 Algorithmic Improvements for DNN AI in HPC
3.1 Asynchronous Methods
3.2 Reduced and Mixed Precision
4 Applications
4.1 Materials Science and Microscopy
4.2 Super-Resolution for HPC Simulations
5 Meeting Exascale
6 Conclusion
References
Improving the Performance of the GMRES Method Using Mixed-Precision Techniques
1 Introduction
2 Numerics of Mixed Precision GMRES
3 Restart Strategies
4 Experimental Results
4.1 Measurement of the Rate of Convergence
4.2 Performance
5 Conclusion
References
On the Use of BLAS Libraries in Modern Scientific Codes at Scale
1 Introduction
2 Related Work
3 Methodology
3.1 Nektar++
3.2 QuantumESPRESSO
3.3 CASTEP
3.4 CP2K
3.5 LAMMPS
3.6 AlexNet
3.7 Library Tracing Tools
4 Results
4.1 Interpreting Matrix Distribution Figures Using HPLinpack
4.2 Nektar++
4.3 QuantumESPRESSO
4.4 CASTEP
4.5 CP2K
4.6 AlexNet
5 Conclusion
References
System Software: Data Infrastructure and Life Cycle
A Systemic Approach to Facilitating Reproducibility via Federated, End-to-End Data Management
1 Introduction
2 Systemic Approach to Reproducibility
2.1 Development and Deployment
3 Data Ingest
4 Data Management
4.1 DataFed Overview
4.2 FAIR Compliance
4.3 Data Organization, Sharing, and Dissemination
5 Data Analytics
6 Scientific Applications
6.1 Modelling and Simulations
6.2 Observations and Experiments
6.3 Data Analytics
7 Conclusions
References
Fulfilling the Promises of Lossy Compression for Scientific Applications
1 Promises of Lossy Compression for Scientific Data
2 Understanding the Effect of Lossy Compression on Scientific Data
2.1 Methodologies, Tools and Benchmarks
2.2 Understanding and Mitigating Lossy Compression Error Effects on Applications
3 Sophisticated Error Controls to Preserve Derived Quantities and Features
4 Customizable Compression Frameworks
5 Conclusion
References
DataStates: Towards Lightweight Data Models for Deep Learning
1 Introduction
2 Background
3 DataStates: An Overview
4 Related Work and Positioning
5 Conclusions
References
Scalable Data-Intensive Geocomputation: A Design for Real-Time Continental Flood Inundation Mapping
1 Introduction
2 A Geocomputation Use Case
3 Data and Computing Challenges
4 Data-Driven Geocomputation on HDA+HPC
5 Preliminary Results
6 Concluding Discussion
References
Enabling Scientific Discovery at Next-Generation Light Sources with Advanced AI and HPC
1 Introduction
2 Scale of the Challenge
3 A Transformative Data Architecture
4 The Role of AI/ML
5 First Steps
6 Future Directions
References
Visualization as a Service for Scientific Data
1 Introduction
2 Motivating Workflows
2.1 Fusion Simulation Workflow
2.2 KSTAR
3 On the Shoulders of Giants
3.1 Tier 1 Related Works
3.2 Tier 2 Related Works
4 Visualization as a Service Abstractions
4.1 Visualization as a Service Abstractions
4.2 Tier 1 Abstractions
4.3 Tier 2 Abstractions
5 Connecting Abstractions to Applications
6 Conclusion and Vision for the Future
References
Performance Improvements on SNS and HFIR Instrument Data Reduction Workflows Using Mantid
1 Introduction
2 Neutrons Data at ORNL Facilities
2.1 The NeXus Format
2.2 Mantid Processing of NeXus Datasets
3 Short-Term Performance Improvements
4 Long-Term View: NCIO
4.1 The NCIO Framework
4.2 NCIO Risks
5 Conclusions
References
Experimental/Observational Applications: Use Cases That Drive Requirements for AI and HPC Convergence
Software Framework for Federated Science Instruments
1 Introduction
2 Science Use-Case
3 Framework Design
3.1 Overview
3.2 Roles
3.3 Software Architecture
4 Virtual Beamlines
4.1 EPICS
4.2 FedScI EPICS Bridge
5 Conclusion
References
Automated Integration of Continental-Scale Observations in Near-Real Time for Simulation and Analysis of Biosphere–Atmosphere Interactions
1 Introduction
1.1 Improving Scientific Understanding Through Data-Model Integration
1.2 Earth System Models and Benchmarking
1.3 Network-Scale Observations
2 Visions to Improve Model Performance with Network-Scale Observations
2.1 Scale-Aware Observational Data Products for ESM Evaluation
2.2 Near-Real Time Data Accessibility for ESM and Benchmarking
3 Roadmap to Scientific Understanding
References
Toward Real-Time Analysis of Synchrotron Micro-Tomography Data: Accelerating Experimental Workflows with AI and HPC
1 Introduction
2 Data Acquisition
3 Summary of Computational Stages
3.1 Tomographic Reconstruction
3.2 Conventional Image Processing
3.3 Denoising with Deep Learning
4 Performance Benchmarking
5 Summary
References
Unsupervised Anomaly Detection in Daily WAN Traffic Patterns
1 Introduction
2 Related Work
3 Key Points and Motivation
3.1 Assumptions
3.2 Intuition Behind Our Methods
3.3 Unsupervised Clustering Algorithms
4 Methodology
4.1 Trace Collection: Building Streaming Data Pipelines
4.2 Offline Learning in Classifiers
4.3 Online Anomaly Finding
5 Preliminary Analysis
6 Experimental Results and Discussions
6.1 Silhouette Analysis for Optimal Clustering
6.2 Clustering Weekdays and Weekends in Training Data
6.3 Identifying Outliers in Test Data
6.4 Impact of Selected Feature Discretization Using Domain Knowledge
7 Conclusions
References
From Smart Homes to Smart Laboratories: Connected Instruments for Materials Science
1 Introduction
2 The Experiment Life Cycle: An Example from Electrochemistry
3 A Data Infrastructure to Map the Scientific Method
3.1 Capturing Sample Provenance and Custody Chain
3.2 Complete Recording of Experimental Data and Processing
3.3 Semantic Processing to Capture Intent and Results to Inform Interpretation
4 AI-Enabled Smart Beamlines
4.1 Access to Flexible Workflows
4.2 Smart Laboratory as a Data Hub
5 Conclusion
References
Machine Learning for the Complex, Multi-scale Datasets in Fusion Energy
1 Introduction
2 ML/AI for Fusion Use Cases
2.1 Deep Neural Networks Architectures for Multi-scale Data
2.2 Working with Multi-modalities
2.3 Working with Small Labelled Training Sets
2.4 Working with Streaming Data
3 Working with Simulations
4 Conclusion
References
Data Federation Challenges in Remote Near-Real-Time Fusion Experiment Data Processing
1 Introduction
1.1 Related Work
2 Remote Fusion Experiment
2.1 KSTAR Fusion Experiment and Workflows
2.2 NSTX-U Fusion Experiment and Workflows
3 Delta: Supporting Federated Data Today
3.1 ECEI Analysis with Delta
3.2 Adaptable Data Transfers Using Data Compression and Filtering
3.3 Remote Data Federation Services with ADIOS
4 Toward Plasma Science of the Future
5 Conclusion
References
Deploying Computation: On the Road to a Converged Ecosystem
Software Defined Infrastructure for Operational Numerical Weather Prediction
1 Introduction
2 Background
2.1 HPC in Operational Workflows for NWP
2.2 Convergence of Cloud and High Performance Computing
3 Implementation Details
3.1 Functional Specifications-System Architecture
3.2 Operational Specifications-Software Defined Infrastructure
3.3 COSMO Application Development
4 Results
4.1 Resiliency Expectations
4.2 Performance Expectations
5 Future Work
References
OpenSHMEM I/O Extensions for Fine-Grained Access to Persistent Memory Storage
1 Introduction
2 Background
3 Design
3.1 Client-Side Interface
3.2 Server Daemon
3.3 Server Subspaces
3.4 Client-Server Mechanisms for Remote Access of Fspace Data
3.5 Software Implementation Details
4 Implementation Results
4.1 Graph Update Workflow Benchmark
4.2 Benchmarking Requirements and Baseline for Data Persistence
4.3 Performance Evaluation
5 Conclusions
References
Distributed Transaction and Self-healing System of DAOS
1 DAOS Introduction
1.1 DAOS System Architecture
1.2 Data Protection and Distributed I/O
1.3 Algorithmic Object Placement and Redundancy Group
1.4 Self-healing System
2 Distributed Transaction of DAOS
2.1 Two-Phase Commit
2.2 Asynchronous Two-Phase Commit and Batch Commit
2.3 Read Protocol
2.4 Transaction Conflict
2.5 Non-blocking Two-Phase Commit and Transaction Resync
2.6 Transaction Coordinator Selection and Transaction Resync
3 Self-healing System of DAOS
3.1 Health Monitoring System
3.2 Rebuild Protocol
3.3 Cascading Failure Rebuild
4 Asynchronous 2-Phase Commit Performance Results
5 Conclusion
6 Future Work
References
Truly Heterogeneous HPC: Co-design to Achieve What Science Needs from HPC
1 Overview
2 Algorithmic Approach
2.1 Deep Graph Decomposition
2.2 Neuromorphic Scaling of 3D Convolutional Neural Networks
3 Hardware Architecture
3.1 Analog Neuromorphic Computing
3.2 Digital Neuromorphic Computing
3.3 Integrating Neuromorphic Computing with Conventional HPC: Optimizing System Architecture
3.4 Novel Approaches in Fabrication
4 Co-Design of Heterogeneous Architectures
4.1 Analytical Modeling
4.2 Joint Neural Hardware and Architecture Search
4.3 Learning Algorithms for Neuromorphic Hardware
5 Future of HPC: Truly Heterogeneous Architectures
References
Performance Evaluation of Python Based Data Analytics Frameworks in Summit: Early Experiences
1 Introduction
2 Technical Overview
2.1 OLCF Summit
2.2 NVIDIA RAPIDS
3 Performance Evaluation
3.1 cuDF
3.2 cuML
3.3 cuGraph
3.4 CuPy
4 Conclusions
References
Navigating the Road to Successfully Manage a Large-Scale Research and Development Project: The Exascale Computing Project (ECP) Experience
1 Introduction
2 Background
3 Implementing a Hybrid Approach in an Earned Value Environment
4 Case Study: Implementing a Hybrid Approach for ECP
4.1 Application Development (AD)
4.2 Software Technology (ST)
4.3 Hardware and Integration (HI)
4.4 Assessing Performance Measurement
5 Tools
6 Related Work
7 Conclusion
References
Memory vs. Storage Software and Hardware: The Shifting Landscape
1 Introduction
2 The Shifting Memory Landscape
2.1 How is "Memory" Used?
3 The Shifting Storage Landscape
3.1 How is "Storage" Used?
4 Persistent Memory Characteristics
4.1 Considering PMEM (Scale-Up)
5 The Hybrid Machine
5.1 Consider Persistent Memory
5.2 Pop Quiz! Memory or Storage?
5.3 What Do Apps People See?
5.4 What Do Storage People See?
5.5 Who Wins?
6 Memory vs. Storage: Bottom Line
6.1 What Do We Need to Do?
6.2 Why Do We Care?
References
ALAMO: Autonomous Lightweight Allocation, Management, and Optimization
1 Introduction
2 Autonomous Operating System Design
2.1 Lightweight Node OS
2.2 Global OS
2.3 Resource Management Usability
3 Autonomous Allocation of Lightweight Threads
4 Autonomous Allocation of Network Resources
5 Autonomous Allocation of Storage Resources
6 Autonomous Allocation of Power and Energy
7 Autonomous Management of Resilience
7.1 Failure Prediction
7.2 Scheduling Resilience Activities
7.3 Heterogeneous Architectures
7.4 Programming Models
8 Conclusion
References
Scientific Data Challenges
Smoky Mountain Data Challenge 2020: An Open Call to Solve Data Problems in the Areas of Neutron Science, Material Science, Urban Modeling and Dynamics, Geophysics, and Biomedical Informatics
1 Introduction
2 Challenge 1: Understanding Rapid Cycling Temperature Logs from the Vulcan Diffractometer
2.1 Background
2.2 Dataset
2.3 Challenges of Interest
3 Challenge 2: Towards a Universal Classifier for Crystallographic Space Groups
3.1 Background
3.2 Dataset
3.3 Challenges of Interest
4 Challenge 3: Impacts of Urban Weather on Building Energy Use
4.1 Background
4.2 Dataset
4.3 Challenges of Interest
5 Challenge 4: Computational Urban Data Analytics
5.1 Background
5.2 Dataset
5.3 Challenges of Interest
6 Challenge 5: Using Machine Learning to Understand Uncertainty in Subsurface Exploration
6.1 Background
6.2 Challenges of Interest
7 Challenge 6: Using Artificial Intelligence Techniques to Match Patients with Their Best Clinical Trial Options
7.1 Background
7.2 Dataset
7.3 Challenges of Interest
8 Challenge 7: The Kaggle CORD-19 Data Challenge
8.1 Background
8.2 Dataset
8.3 Challenges of Interest
9 Conclusion
References
Examining and Presenting Cycles in Temperature Logs from the Vulcan Diffractometer
1 Introduction
2 Tools
2.1 Software
2.2 Hardware
3 Data
4 Technical Approach
5 Results
6 Improvements
7 Conclusions
References
Probability Flow for Classifying Crystallographic Space Groups
1 Exploratory Data Analysis
1.1 Image Scaling Function
2 ML Algorithm for Space Group Classification
2.1 Transfer Learning
2.2 Tabula Rasa Learning
3 Overcoming Label Imbalance
3.1 Probability Flow
4 Conclusion
4.1 Future Directions
A Appendix
References
Towards a Universal Classifier for Crystallographic Space Groups: A Trickle-Down Approach to Handle Data Imbalance
1 Introduction
1.1 Problem Definition
1.2 Proposed Approaches
1.3 Outline of the Paper
2 Exploratory Data Analysis
2.1 Class Frequencies for All Non-Zero Classes
2.2 A Closer Inspection into Better Represented 20 Classes
3 Universal Function Approximator to Address Non-Geometric Mapping of CBED Images
3.1 Top Five-Class Classifiers
3.2 Top Ten-Class Classifiers
3.3 Top Twenty-Class Classifiers
3.4 Summary of the Classification Performances
4 Trickle-Down Classifier (TDC) to Mitigate Data Imbalance
5 Scaling Out the Classifiers and Hyper-Parameter Selection
6 Future Work
7 Conclusion
References
The Macro Impacts of Micro-Climates on the Energy Consumption of Urban Buildings
1 Introduction
2 Literature Review
3 Initial Data Collection and Processing
3.1 Data Visualization Pipeline
4 Research Questions
4.1 Are There Interesting Variations in the Weather and Building Energy Use Data for the Geographic Area?
4.2 Which Buildings in the Study Are Most Sensitive to Weather Effects?
4.3 How Can the Data Best Be Divided into Subsets for Meaningful Analysis and Visualization?
4.4 How Does Energy Use in Each Building Change Throughout the Year?
4.5 How Is Energy Use Different During the Coldest/Hottest Months as Compared to During Those of Less Extreme Temperature?
4.6 Are There Any Interesting Visualizations that Illustrate the Changing Dynamics of the Simulated Urban Environment?
5 Conclusion
References
A Framework for Linking Urban Traffic and Vehicle Emissions in Smart Cities
1 Introduction
2 Characterization of Original Data
2.1 Traffic Data
2.2 Emissions Data
2.3 Road Network
2.4 Building Footprints
3 Data Preparation
4 Methodology
4.1 Simplification of Building Data
4.2 Vehicle-Building Mapping
4.3 Dispersion of Traffic Emissions
4.4 Association of Emissions to Buildings
5 Results and Discussion
5.1 Qualitative Analysis of Emission Heatmaps
5.2 Quantitative Analysis of Per-Building Emission Concentrations
6 Conclusions
References
A Data-Integration Analysis on Road Emissions and Traffic Patterns
1 Introduction
2 Methodology
2.1 Challenge 1: Algorithms to Assign Vehicle Occupants to Buildings
2.2 Challenge 2: Vehicle Emissions and Correlation Analysis
2.3 Challenge 3: Traffic Patterns Characterization
3 Results
3.1 Challenge 1: Performance Comparison of NNS Algorithms
3.2 Challenge 2: Area-Wide Correlation Analysis of Vehicle Emissions
3.3 Challenge 3: Characterize Traffic Patterns
4 Conclusions
References
Data Analysis and Visualization of Traffic in Chicago with Size and Landuse-Aware Vehicle to Buildings Assignment
1 Introduction
2 Literature Review
2.1 Energy Consumption of Buildings
2.2 Urban Mobility
3 Datasets
4 Methodology
4.1 Vehicle to Buildings Assignment
4.2 Traffic Analysis
4.3 Emission Versus Traffic and Weather
5 Results
5.1 Vehicle to Buildings Assignment
5.2 Traffic, Emission and Weather Analysis
6 Conclusion and Future Work
References
Using Statistical Analysis and Computer Vision to Understand Uncertainty in Subsurface Exploration
1 Background
2 Related Work
3 Contributions
4 Methods
4.1 Dataset
4.2 Standard Deviation
4.3 Kullback-Leibler Divergence
4.4 Structural Similarity
4.5 Canny Filtering/Gather Image Quality
5 Results
5.1 Standard Deviation
5.2 K-L Divergence
5.3 Structural Similarity
5.4 Canny-Filtered Dataset
6 Discussion and Conclusions
References
The Heavy Lifting Treatment Helper (HeaLTH) Algorithm: Streamlining the Clinical Trial Selection Process
1 Introduction
2 Related Works
3 Methodologies
3.1 Data
3.2 Logical Comparison
3.3 Clustering
4 Results
5 Discussion
6 Conclusion
6.1 Limitations
6.2 Future Work
References
Author Index