This book comprises chapters written by professors and researchers at internationally recognized universities and research institutions. It presents research results and descriptions of real-world systems, services, and technologies. Researchers, professional practitioners, and graduate students who read it will gain a clear view of the state of the art in research and real-world practice on system dependability and analytics.
The book is published in honor of Professor Ravishankar K. Iyer, the George and Ann Fisher Distinguished Professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign (UIUC), Urbana, Illinois. Professor Iyer is an ACM Fellow, an IEEE Fellow, and an AAAS Fellow, and served as Interim Vice Chancellor for Research at UIUC from 2008 to 2011. The book contains chapters written by many of his former students.
Author(s): Long Wang, Karthik Pattabiraman, Catello Di Martino, Arjun Athreya, Saurabh Bagchi
Series: Springer Series in Reliability Engineering
Publisher: Springer
Year: 2022
Language: English
Pages: 428
City: Cham
Introduction
Contents
Software Dependability
Introduction: Software Dependability
Intelligent Software Engineering for Reliable Cloud Operations
1 Introduction
2 Anomaly Detection of Key Performance Indicators
2.1 Background
2.2 Preprocessing
2.3 Multivariate KPIs Interactions
2.4 Collaborative Machine for Anomaly Detection
2.5 Experiments
3 Service Dependency Mining for Failure Diagnosis
3.1 Background
3.2 Tracing Analysis
3.3 Intensity of Service Dependency
3.4 Dependency Strength Mining
3.5 Experiments
4 Incident Aggregation for Root Cause Analysis
4.1 Background
4.2 Root Cause Analysis of System Incident
4.3 Experimental Results on Root Cause Analysis
5 Conclusions
References
Data Analytics: Predicting Software Bugs in Industrial Products
1 Introduction
1.1 The Problem
1.2 Our Contributions
2 Review of ML Applications
3 Review of Software Testing
3.1 Dynamic Modules
3.2 The “fix on fix” Problem
3.3 Complexity Metrics
4 Case Study
4.1 Data Collection and Processing
4.2 Models
5 Learnings, Thoughts, Musings
References
From Dependability to Security—A Path in the Trustworthy Computing Research
1 About Trustworthiness
2 The Evolution of the Bit-Flip Adversary Model
2.1 Security Consequences Caused by Bit-Flips
2.2 Fault Injection as a Weapon
2.3 Software Memory Bugs as a Weapon
2.4 Rowhammer—A Bit-Flip Security Threat in DRAM
3 Formal Methods
3.1 Formal Methods for Browser Security
3.2 Formal Methods for Authentication Protocols
4 Distributed Consensus
5 Summary
References
Assessment of Security Defense of Native Programs Against Software Faults
1 Introduction
2 Related Work
3 Design
3.1 Software Fault Injection to Assess Fuzzing Coverage
3.2 Classification of Fuzzing Failure Types
3.3 Quantitative Evaluation
3.4 Framework
4 Optimization
5 Experimental Methodology
6 Result
6.1 Effectiveness of Fuzzing and Testing
6.2 Coverages of Error Detectors
6.3 Detection Latency of Injected Faults
6.4 Evaluation of Fault Selection Algorithm
7 Discussion
8 Conclusion
References
Multi-layered Monitoring for Virtual Machines
1 Motivation
2 Target System Model
3 Limitations of State-of-the-Art VM Monitoring
3.1 Polling-and-Scanning Monitoring Paradigm
3.2 Untrustworthy Input
3.3 Inflexible Monitor Placement
3.4 Incompatible Reliability and Security Monitoring
4 HyperTap: Virtual Machine Monitoring Using Hardware Architectural Invariants
4.1 Monitoring Principles
4.2 Framework and Implementation
4.3 Performance Evaluation
5 Hprobes: Dynamic Virtual Machine Monitoring Using Hypervisor Probes
5.1 Introduction
5.2 Design
5.3 Prototype Implementation
6 hShield: Monitoring Hypervisor Integrity
6.1 Introduction
6.2 Assumptions and Threat Model
6.3 hShield Approach Overview
6.4 Execution Hashing
6.5 hShield Architectural Design
7 Conclusion
7.1 Continuous Monitoring of Guest OS and Applications
7.2 Continuous Monitoring of Hypervisor
References
Security for Software on Tiny Devices
1 Introduction
1.1 Ravi’s Contributions on This Topic
1.2 Why can’t We “just” Adopt Defenses from the Server World to the Embedded World?
2 Background and Related Work
2.1 Embedded System Development
2.2 Threat Model
2.3 Lack of Defenses on Embedded Systems
3 Guided IoT Exploration
4 Runtime Enforcement Techniques
4.1 Task II.1: Automatic Least Privilege Separation
4.2 Task II.2: Enforcing Isolation Among Compartments
5 Evaluating Security
5.1 IoT Metrics
5.2 IoT Benchmarks
5.3 BenchIoT: Our Contribution
6 Conclusion
References
Large-Scale Systems and Data Analytics
Introduction: Large-Scale Systems and Data Analytics
1 Rise of Data Analytics for System Dependability
2 Preview of Articles in This Section
On the Reliability of Computing-in-Memory Accelerators for Deep Neural Networks
1 Introduction
2 Non-volatile Devices
2.1 RRAM
2.2 Spintronics Devices
2.3 FeFET
3 CiM DNN Accelerators
3.1 Computing-in-Memory
3.2 Crossbar-Based Vector-Matrix Multiplication Engine
3.3 General Architecture of nvCiM DNN Accelerators
4 Device and Circuit Non-idealities
4.1 Thermal Noise
4.2 Shot Noise
4.3 Random Telegraph Noise
4.4 Programming Errors
4.5 Endurance and Retention
5 Impact of Device Variation on DNN Acceleration
5.1 Model of Device Variation
5.2 Impact of Device Variation on DNN Outputs
6 Dealing with Device Non-idealities
6.1 Error Correction
6.2 Identifying Robust Neural Architectures
6.3 Training Robust DNNs
7 Conclusions
References
Providing Compliance in Critical Computing Systems
1 Introduction
2 Compliance in Critical Computing Systems
2.1 High-Level Studies on Compliance
2.2 Compliance Rules
2.3 Compliance Audit
3 Reference Architecture for Compliance Validation/Enforcement
4 Technologies and Practices for Compliance Validation/Enforcement
4.1 Compliance Text Analysis
4.2 Behavior Capturing and Analysis
4.3 Evidence Collection and Extraction
4.4 Compliance Validation
5 Conclusions
References
Application-Aware Reliability and Security: The Trusted Illiac Experience
1 Introduction
2 Background and Beginnings
3 Early Years (Detector Placement)
4 Middle Years (Detector Derivation)
5 Later Years (Detector Implementation and Validation)
6 Other Directions
7 Lessons Learned
8 Aftermath: How Trusted Illiac Shaped My Subsequent Research
References
Mining Dependability Properties from System Logs: What We Learned in the Last 40 Years
1 Introduction
2 Overview of Data Collection Tools and Products
3 Log Selection and Pre-processing
3.1 Log Pre-processing
3.2 Coalescence
4 Analysis and Relevant Applications
4.1 Error and Failure Classification
4.2 Dependability Modeling
4.3 Failure Correlation and Error Propagation Analysis
4.4 Improvement of the Logging Practice
4.5 Security Analysis
5 Conclusion and Final Remarks
References
Critical Infrastructure Protection: Where Convergence of Logical and Physical Security Technologies is a Must
1 Introduction, Problem Statement, and Contributions
2 Proposed Approach and Conceptual Architecture
3 Technology Pillars
4 Case Studies
4.1 Dam and Water Supply Network
4.2 Public Administration and eHealth
4.3 Sensitive Industrial Plant for Chemical Storage
5 Conclusion
References
Health Care and CPS
Introduction: Cyber Physical Systems and Healthcare Analytics
On Improving the Reliability of Power Grids for Multiple Power Line Outages and Anomaly Detection
1 Introduction
2 Problem Formulation
3 Algorithms Design
3.1 ReTAD Algorithm Design
3.2 LIS Algorithm Design
3.3 Optimal Strategy of PMUs Placement Algorithm Design
4 Numerical Experiment
4.1 Experimental Setup
4.2 Real-Time Anomaly Detection Experiments
4.3 LIS Algorithm Experiments
4.4 Optimal PMU Placement Experiments
5 Conclusions
References
Domain-Specific Security Approaches for Cyber-Physical Systems
1 What Are Cyber-Physical Systems?
2 Attack Surfaces of Cyber-Physical Systems
3 Challenges of Detecting Attacks in CPSs
3.1 Visibility of System Activities
3.2 Diagnosis
3.3 Real-Time Constraints
4 Attacks Detection in CPSs
4.1 Anomaly-Based Detection in CPSs
4.2 Misuse-Based Detection
4.3 Specification-Based Detection
5 Attack Recovery in CPSs
5.1 Attack Recovery in Cyber Domains
5.2 Attack Recovery in Physical Domains
6 Preemptive Protection
6.1 Preemptive Protection Against External Adversaries
6.2 Preemptive Protection Against Internal Adversaries
7 Security Issues Associated with Advanced Computing Technologies
8 Summary
References
Uniting Computational Science with Biomedicine: The NSF Center for Computational Biotechnology and Genomic Medicine (CCBGM)
Data-Driven Approaches to Selecting Samples for Training Neural Networks
1 Introduction
2 Case Study: Near-Miss Sampling for Optimum Model Training
2.1 The Near-Miss Principle
2.2 The Biomedical NLP Task: Adverse Drug Event and Indication Relation Extraction
2.3 Datasets
2.4 Near-Miss Sampling Applied to the Task
2.5 Model Trained (BERT—Bidirectional Encoder Representations from Transformers)
2.6 Experiments and Metrics
2.7 Results
3 Discussion
4 Learnings from the Case Study and Future Work
References
Classifying COVID-19 Variants Based on Genetic Sequences Using Deep Learning Models
1 COVID-19 and Genetic Data
2 COVID-19 Variants
2.1 B.1.1.214
2.2 B.1.1.519
2.3 B.1.160
2.4 B.1.177.21
2.5 B.1.177
2.6 B.1.1.7
2.7 B.1.1
2.8 B.1.221
2.9 B.1.243
2.10 B.1.258
2.11 B.1.2
2.12 B.1.351
2.13 B.1.427
2.14 B.1.429
2.15 B.1.526
2.16 B.1.596
2.17 B.1.617.2
2.18 B.1
2.19 D.2
2.20 P.1
3 Data and Methods
3.1 Model Architecture
4 Results and Discussion
5 Conclusion
6 Code
References
Twenty-First Century Cybernetics and Disorders of Brain and Mind
1 The Spatial Localization of Focal Epilepsy: Predicting Where Seizures Are Generated
2 The Temporal Forecasting of Seizures: Predicting When Seizures Will Occur
3 The Cognitive, Memory and Sleep Comorbidities of Epilepsy
4 Next Generation of Implantable Devices that Enable Sensing, Electrical Stimulation, Embedded Analytics and Their Integration with Local and Distributed Computing
References
Dependability Assessment
Introduction: Dependability Assessment
Effect of Epistemic Uncertainty in Markovian Reliability Models
1 Introduction
2 Epistemic Uncertainty Propagation
2.1 Uncertainty Representation
2.2 Performance Index with Uncertainty
3 Moment-Based Approach for Epistemic Uncertainty Propagation
3.1 Formulation
3.2 Reliability of a Single Component System
4 Epistemic Uncertainty in Markov Reliability Models
4.1 Markov Reward Model
4.2 Partial Derivatives of Performance Index
4.3 Example: Virtual Machine Model
5 Summary
References
System Dependability Assessment—Interplay Between Research and Practice
1 Introduction
2 Model-Based Dependability Assessment
2.1 Electricity Production and Distribution
2.2 Air Traffic Control
2.3 Aeronautics
3 Software Reliability
4 Dependability Assessment Based on Simulation and Fault Injection
5 Online Error Detection and Diagnosis
6 Lessons Learned and Concluding Remarks
References
Assessing Dependability of Autonomous Vehicles
1 Introduction
2 Broader Research Impact
3 Empirical Assessment Using Production Systems and Field-Failure Datasets
3.1 Concept and Approach
3.2 Results
3.3 Novelty
4 Validation Using Fault Injection and Fuzzing
4.1 Concept and Approach
4.2 Results
4.3 Novelty
5 Conclusion and Future Work
References
Personal Reflections
Foreword: Computing and Genomics at Illinois
An Academic Life Begins and Continues at University of Illinois at Urbana-Champaign
Learning from Prof. Iyer