RESILIENT COMPUTATIONS IN THE
PRESENCE OF SLOW-DOWNS
2:00 p.m., Wednesday, August 28, 1991
rm. 613, Warren Weaver Hall
With the advent of low cost work stations, distributed systems are becoming increasingly attractive. However, as the number of components in the system increases so does the probability of some component failing. When system designers discuss fault-tolerance, they typically restrict themselves to the problem of handling fail-stop failures. This work proposes an enhanced failure model that allows processes to fail by either slowing down or stopping; slow processes may later speed up, continue to proceed slowly, or, eventually, stop. We call such failures slow-downs. The model does not assume the ability to distinguish among these possibilities, say, by using a timeout mechanism, nor does it assume that it is possible to kill a slow process.
This thesis presents several results in this context. We discuss how to execute transactions under the slow-down model when the correctness criteria is serializability. We then discuss how to transform a class of lock-based concurrent data structures into nonblocking data structures. Both results are developed in the context of a shared memory machine having an atomic compare&swap.
We conclude this thesis by giving algorithms that can be used to emulate a reliable shared memory with compare&swap on a message passing system prone to slow-downs.