DEPARTMENT OF COMPUTER SCIENCE
DOCTORAL DISSERTATION DEFENSE


Candidate: DAVID BACON
Advisor: JACK SCHWARTZ


SETL for Internet Data Processing


3:00 p.m., Thursday, December 9, 1999
12th floor conference room, 719 Broadway




Abstract

Although networks and coordinated processes figure prominently in the kinds of data manipulation found in everything from scientific modeling to large-scale data mining, programmers charged with setting up the requisite software systems frequently find themselves hampered by the inadequacy of available languages. The ``real'' languages such as C++ and Java tend to be low-level, requiring the specification of a great deal of often repetitive detail, whereas the higher-level ``scripting'' languages tend to lack the kinds of structuring facilities that lend themselves to the reliable construction of even modestly large systems.

The high-level language SETL meets both of these needs. Originally conceived as a language which aimed to bring programming a little closer to the idealized world of mathematics, making it extremely useful in the human-to-human communication of algorithms, SETL has proven itself over the years to be an excellent language for software prototyping, primarily because its conciseness and immediacy lend it well to rapid experimentation. These characteristics, together with its general freedom from machine-oriented restrictions, its value semantics, its comprehension-style constructors for aggregates, its skill with strings, and especially its syntactic support for mappings, also make it well suited to high-level data processing.

In order to play the role of a full-fledged modern data processing language, however, SETL had to acquire the ability to manipulate processes and communicate with them easily, and furthermore to be able to work with networks, particularly the client-server model that rules the Internet. Accordingly, I have integrated a full set of process and network management features into SETL. In my dissertation, I show how the liberal use of fullweight processes, with the high, protective walls that surround them, sustains a modular design approach which in turn provides a strong defense against the main hazards of distributed computing, namely race conditions and deadlock, while preserving the luxury and convenience of programming in a truly high-level language. To this end, I have evolved protocols and design patterns for developing multiplexing servers and clients in SETL, and in my talk, will present examples of fairly complex systems where hierarchies of processes communicate over the network. Such systems tend to be notorious for their unreliability, but in these instances, robustness seems to follow naturally from the readability of simple programs written in an ancient and friendly language.