Reliable Computer Systems: Collected Papers of the Newcastle Reliability Project

A research project to investigate the design and construction of reliable computing systems was initiated by B. Randell at the University of Newcastle upon Tyne in 1972. In over ten years of research on system reliability, the members of this project have produced a substantial number of papers. These papers have appeared in a variety of journals and conference proceedings, and it is hoped that this book will prove to be a convenient reference volume for researchers active in this important area. In selecting papers published by past and present members of the project, I have used the following criteria: a paper is selected if it is concerned with fault tolerance, is not a review paper, and was published before 1983. I have applied these criteria (with only one or two exceptions!) in order to present a collection of papers with a common theme while keeping the book to a reasonable length. The papers have been grouped into seven chapters. The first chapter introduces fundamental concepts of fault tolerance and ends with the earliest Newcastle paper on reliability. The project perhaps became widely known following the invention of recovery blocks, a simple yet effective means of incorporating fault tolerance into software. The second chapter contains papers on recovery blocks, starting with the paper that first introduced the concept.
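The recovery block idea can be illustrated with a short sketch. In Randell's notation a recovery block reads `ensure <acceptance test> by <primary> else by <alternate> ... else error`: the primary runs first, its result is checked by the acceptance test, and on failure the state is restored to the recovery point established on entry before the next alternate is tried. The Python below is a minimal sketch under simplifying assumptions, not the Newcastle implementation: `recovery_block`, `RecoveryBlockFailure`, and the sorting routines are hypothetical names, state is a plain dictionary, and checkpointing uses a full `copy.deepcopy` rather than the more economical state-saving mechanisms (such as a recovery cache) discussed in the collected papers.

```python
import copy
from typing import Callable, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when no alternate passes the acceptance test (the 'else error' arm)."""


def recovery_block(state: dict,
                   alternates: Sequence[Callable[[dict], None]],
                   acceptance_test: Callable[[dict], bool]) -> dict:
    """Sketch of: ensure acceptance_test by alternates[0] else by alternates[1] ... else error."""
    # Establish a recovery point on entry; a full deep copy stands in for
    # the cheaper state-saving a real implementation would use.
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        # Backward recovery: every alternate starts from the checkpointed state,
        # so the effects of a failed attempt are discarded.
        working = copy.deepcopy(checkpoint)
        try:
            alternate(working)
        except Exception:
            continue  # an exception inside an alternate counts as a detected error
        if acceptance_test(working):
            return working  # the first alternate to pass the test is accepted
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


# Hypothetical example: a faulty primary sort and a simple, trusted alternate.
def primary_sort(s: dict) -> None:
    s["data"] = sorted(s["data"])[:-1]  # injected fault: drops the largest element


def alternate_sort(s: dict) -> None:
    s["data"] = sorted(s["data"])


def sort_acceptance_test(expected_len: int) -> Callable[[dict], bool]:
    # Checks ordering and length only; it cannot prove the output is a permutation.
    def test(s: dict) -> bool:
        d = s["data"]
        return len(d) == expected_len and all(a <= b for a, b in zip(d, d[1:]))
    return test


state = {"data": [3, 1, 2]}
result = recovery_block(state, [primary_sort, alternate_sort], sort_acceptance_test(3))
print(result["data"])  # [1, 2, 3]
print(state["data"])   # [3, 1, 2] (the caller's state is untouched)
```

Here the primary's output fails the length check, so its effects are discarded and the alternate runs from the saved state. Treating an exception raised by an alternate like a failed acceptance test is a design choice of this sketch; the deliberately weak acceptance test also shows that a recovery block can only be as effective as the test it ensures.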

Editor: Santosh Kumar Shrivastava
Series: Texts and Monographs in Computer Science
Publisher: Springer
Year: 1985

Language: English
Pages: 586
Tags: System Performance and Evaluation; Computer Communication Networks; Computer Hardware

Front Matter....Pages I-XII
Introduction....Pages 1-4
Introduction....Pages 5-5
Fault Tolerance Terminology Proposals....Pages 6-13
System Structure for Software Fault Tolerance....Pages 14-36
Operating Systems: The Problems of Performance and Reliability....Pages 37-50
Introduction....Pages 51-52
A Program Structure for Error Detection and Recovery....Pages 53-68
A Reconsideration of the Recovery Block Scheme....Pages 69-79
Recovery Blocks in Action: A System Supporting High Reliability....Pages 80-101
Sequential Pascal with Recovery Blocks....Pages 102-111
Fault-Tolerant Sequential Programming Using Recovery Blocks....Pages 112-114
A Recovery Cache for the PDP-11....Pages 115-125
Recovery and Crash Resistance in a Filing System....Pages 126-139
Introduction....Pages 141-142
Software Reliability: The Role of Programmed Exception Handling....Pages 143-153
Exception Handling and Software Fault Tolerance....Pages 154-172
Robust Data Types....Pages 173-207
Systematic Detection of Exception Occurrences....Pages 208-237
Safe Programming....Pages 238-245
Introduction....Pages 247-248
Process Structuring, Synchronization, and Recovery Using Atomic Actions....Pages 249-265
A Formal Model of Atomicity in Asynchronous Systems....Pages 266-297
Reliable Resource Allocation Between Unreliable Processes....Pages 298-321
Concurrent Pascal with Backward Error Recovery: Language Features and Examples....Pages 322-343
Concurrent Pascal with Backward Error Recovery: Implementation....Pages 344-357
A Framework for Software Fault Tolerance in Real-Time Systems....Pages 358-377
Introduction....Pages 379-380
A Model of Recoverability in Multilevel Systems....Pages 381-395
The Provision of Recoverable Interfaces....Pages 396-409
Structuring Distributed Systems for Recoverability and Crash Resistance....Pages 410-432
Introduction....Pages 433-434
State Restoration in Distributed Systems....Pages 435-447
Recovery Control of Communicating Processes in a Distributed System....Pages 448-484
A Dependency, Commitment and Recovery Model for Atomic Actions....Pages 485-497
Fail-Safe Extrema-Finding in a Circular Distributed System....Pages 498-503
The Design of a Reliable Remote Procedure Call Mechanism....Pages 504-517
Reliable Remote Calls for Distributed UNIX: An Implementation Study....Pages 518-531
The Newcastle Connection or UNIXes of the World Unite!....Pages 532-549
Recoverability Aspects of a Distributed File System....Pages 550-562
Fault Tolerance and System Structuring....Pages 563-580