Soft Error Reliability Using Virtual Platforms: Early Evaluation of Multicore Systems

This book describes the benefits and drawbacks inherent in the use of virtual platforms (VPs) to perform fast and early soft error assessment of multicore systems. The authors show that VPs give engineers appropriate means to investigate new and more efficient fault injection and mitigation techniques. Coverage also includes the use of machine learning techniques (e.g., linear regression) to speed up the soft error evaluation process by pinpointing the parameters (e.g., architectural ones) with the most substantial impact on software stack dependability. The book provides valuable information and insight drawn from more than 3 million individual scenarios and 2 million simulation hours. Further, it explores how machine learning techniques can be used to navigate large fault injection datasets.
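The idea of using linear regression to pinpoint the most influential parameters can be illustrated with a small sketch. It assumes one common approach, ranking z-scored features by the absolute value of their ordinary least squares coefficients, which is not necessarily the authors' exact procedure. The feature names (branch_rate, mem_access_rate, reg_file_size), the toy values and the SDC-rate target are all hypothetical and are not data from the book.

```python
"""Rank hypothetical features by their linear-regression impact on a
hypothetical soft-error metric (toy data, not results from the book)."""
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores of a column (population standard deviation; column must not be constant)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def rank_features(features, target):
    """Fit least squares on z-scored features and a centered target, then
    rank feature names by absolute standardized coefficient (largest first)."""
    names = list(features)
    cols = [standardize(features[name]) for name in names]
    y_mean = mean(target)
    y = [v - y_mean for v in target]  # centering replaces an explicit intercept
    k = len(cols)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    coefs = dict(zip(names, solve(xtx, xty)))
    order = sorted(names, key=lambda name: abs(coefs[name]), reverse=True)
    return order, coefs


# Hypothetical per-benchmark features; the toy target is built as
# 3*branch_rate + 0.5*mem_access_rate + 1, so reg_file_size has no effect.
features = {
    "branch_rate": [1, 2, 3, 4, 5, 6],
    "mem_access_rate": [2, 1, 4, 3, 6, 5],
    "reg_file_size": [5, 5, 4, 4, 6, 6],
}
sdc_rate = [3 * b + 0.5 * m + 1
            for b, m in zip(features["branch_rate"], features["mem_access_rate"])]
order, coefs = rank_features(features, sdc_rate)
print(order)  # ['branch_rate', 'mem_access_rate', 'reg_file_size']
```

Standardizing the features first makes the coefficients comparable across parameters measured in different units; with raw values, a parameter's ranking would depend on its scale rather than its influence.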

Author(s): Felipe Rocha da Rosa, Luciano Ost, Ricardo Reis
Publisher: Springer
Year: 2020

Language: English
Pages: 136
City: Cham

Preface
Contents
Acronyms
1 Introduction
1.1 Hypothesis to Be Demonstrated in This Book
1.2 Book Goal
1.3 Original Contributions of This Work
1.3.1 Early Soft Error Evaluation
1.3.2 Novel Non-intrusive Fault Injection Techniques
1.3.3 Instruction-Accurate Fault Injection Consistency
1.3.4 Extensive Investigation of the Software Stack Impact on the System Reliability
1.3.5 Correlating Soft Errors and Microarchitectural Data
1.4 Book Outline
2 Background on Soft Errors
2.1 Main Reliability Challenges in Electronic-Based Systems
2.2 Radiation-Induced Soft Errors
2.2.1 Radiation Source and Soft Errors Mechanisms
2.2.2 Fault Propagation and Masking
2.2.3 Soft Error Metrics
2.2.4 Soft Error Trends in Electronic Systems
2.3 Soft Error Assessment
3 Fault Injection Framework Using Virtual Platforms
3.1 Virtual Platforms
3.2 Related Work on Fault Injection Approaches using Virtual Platforms
3.3 Fault Modeling
3.4 Fault Injection Flow
3.5 OVPsim-FIM
3.6 gem5-FIM
3.7 Detailed Fault Injection Using Instruction-Accurate Virtual Platforms
3.7.1 SOFIA
3.7.1.1 Application Virtual Memory
3.7.1.2 Application Variables and Data Structures
3.7.1.3 Function Code
3.7.1.4 Function Lifespan
3.7.1.5 Fault Inspection
3.8 Improving Fault Injection Campaigns Performance
3.8.1 Shared Memory Multicore Parallelization
3.8.2 Checkpoint and Restore Technique
3.8.3 Distribute Fault Injection Campaigns Using HPCs
3.9 Other Extensions
3.9.1 Targeting Complex SW Stacks
3.9.2 Injecting Faults on Multicore Systems
3.9.3 ARMv8 Architecture Extension
3.10 Closing Remarks
4 Performance and Accuracy Assessment of Fault Injection Frameworks Based on VPs
4.1 Experimental Setup
4.2 Performance and Accuracy Evaluation of Instruction-Accurate Virtual Platforms
4.2.1 Accuracy
4.2.2 Instruction-Accurate Simulation Engine Parameters Impact on Soft Error Assessment
4.2.2.1 Quantum-Leap Impact on Soft Error Assessment
4.2.2.2 Mismatch Considering the Quantum Size
4.2.3 Performance and Speedup
4.3 Closing Remarks
5 Extensive Soft Error Evaluation
5.1 Soft Error Evaluation Considering Multicore Design Metrics/Decisions
5.1.1 ISA Reliability Assessment
5.1.1.1 Execution Time and Workload
5.1.1.2 Register File Size
5.1.1.3 Branches and Function Calls
5.1.1.4 Memory Transactions
5.1.2 Parallelization API
5.1.2.1 Serial vs. APIs
5.1.2.2 Vulnerability Window
5.2 Focused Fault Injection Results
5.2.1 Case Study
5.2.1.1 Sequential and Parallel MM
5.2.1.2 Triple Modular Redundancy
5.2.1.3 Improving the Triple Modular Redundancy
5.3 Closing Remarks
6 Machine Learning Applied to Soft Error Assessment in Multicore Systems
6.1 Machine Learning
6.2 Machine Learning Applied to System Reliability
6.3 Problem Description
6.4 Proposed Solution
6.5 The Promoted ML Investigation Tool
6.5.1 Feature Acquisition
6.5.2 Feature Transformation
6.5.2.1 Rescaling
6.5.2.2 Normalization
6.5.2.3 Merging Similar Features
6.5.2.4 Feature Combination
6.5.3 Feature Selection
6.5.3.1 Variance Threshold
6.5.3.2 Principal Component Analysis
6.5.3.3 Linear Regression
6.5.3.4 Correlation Coefficient
6.5.3.5 Recursive Feature Elimination
6.5.3.6 Euclidean Distance
6.5.3.7 Soft Error Score
6.6 Exploration Flow
6.6.1 Phase 1: Feature Acquisition and Data Homogenization
6.6.2 Phase 2: Unidimensional Feature Transformation and Selection
6.6.3 Phase 3: Multidimensional Feature Transformation and Selection
6.7 Results
6.7.1 Training Set Selection and Bias
6.7.2 Characterization
6.7.3 Branches and Function Calls
6.7.4 Memory Transactions
6.8 Case Study
6.9 Closing Remarks
References
Index