next up previous

Calypso Version 1.0 For Linux
Manual

Last Revised: April 30, 1997 Copyright © 1997 New York University.

Calypso is being developed as part of the MILAN project,
a joint research effort between New York University and Arizona State University.


Contents


Introduction

Calypso Overview

Calypso is a software system that allows the user to tap into unused computer cycles by providing a parallel processing platform for a network of standard workstations or personal computers. Calypso has the following features.

Ease of programming. Since existing C or C++ programs can be adapted for Calypso, the programmer does not have to worry about learning a new programming language. In fact, the Calypso Source Language (CSL) is just C++ with four additional key words.

Separation of programming parallelism from execution parallelism. The parallelism of the program is determined by the programmer and is a separate entity from the parallelism of the execution environment, which is determined by the availability of workstations at the time of execution. With Calypso, the mapping between program parallelism and execution parallelism is transparent to the user.

Dynamic load balancing and fault tolerance. Calypso\ automatically distributes the work load depending upon the dynamics of the participating machines. Faster machines do not have to wait for slower machines to finish their ``work assignments''-- faster machines overtake the slower ones. In addition, Calypso executions are resilient to failures. All processes, other than a specific designated ``manager'' process, can fail at any point without affecting the correctness of the computation.

Manager and Worker Processes

A ``normal'' execution of a Calypso program consists of a ``central'' process, called the manager, and one or more worker processes, called workers. These processes can reside on a single machine or they can be distributed on a network.

The manager is responsible for the management of the computation. This manager is assigned the task of executing sequential sections of the code. The current Calypso version only allows one manager, and thus it does not tolerate the failure of this process.

The computation of parallel jobs is left to the workers. There can be zero or more workers present at any one time, but at least one active worker is needed for a parallel computation to proceed; otherwise the computation is suspended, until a worker appears.

In general, the number of workers and the resources they can devote to the parallel computations can dynamically change in a completely arbitrary manner. This dynamic behavior is completely transparent to the Calypso program. In fact, arbitrarily asynchronous behavior of workers, their failures due to process crash or preemption for other jobs, and inaccessibility due to network partitions are tolerated. Furthermore, workers can be added at any time to speed up an already executing system and increase the fault tolerance.

Arbitrary asynchronous behavior of the manager is tolerated too. Of course, if it is slow, the overall execution will suffer.

For the design and the implementation of the Calypso software system, and a complete description of manager and worker processes, see [2].

Hardware & Software Requirements

Hardware.

Any commercial off-the-shelf PC with an 80386, 80486, Pentium, Pentium Pro or compatible processor and a minimum of 8MB RAM (16MB is recommended) is suitable to run Calypso. To use several workstations, they must be connected as a network.

Software.

Calypso programs require the Linux operating system, a C++ compiler, the Extended Tcl/tk (TclX) scripting language, and an rstatd RPC server. Calypso has been tested with Linux kernel version 2.0.29, GNU's g++ version 2.7.2, TclX version 7.5.2, and rpc.rstatd version 3.02. However, it should also run on other versions of the software packages. These software packages are included in the RedHat Linux version 4.0 with the exception of the RPC daemon, which has to be installed on every host machine.

Other Platforms.

Calypso runs on platforms other than Linux, including Windows NT, Solaris and SunOS. Feel free to contact us if you are interested in Calypso for another platform. See Section 7 for further information.

Getting Started With Calypso

It is assumed that the reader has an understanding of C or C++. To get started running test programs, read Section 2.3 to setup your environment variables, and Section 3 for a quick introduction to the Calypso Graphical User Interface (CGUI).

Installation

Configuring Hardware

Calypso may be installed on any PC that meets the hardware requirements stated above. Calypso can run on one or several workstations connected as a network.

It is required that the user installs copies of the Calypso program in a way that makes them accessible to all the participating machines and has remote-login access for those machines.

Installing Software

 

Software installation is dependent upon the type of user. A normal user without network administrator privileges can install Calypso in her private home directory. On the other hand, a system administrator can install Calypso in a global area to be accessed by multiple users. To install the software, first go to the directory where Calypso will be installed. We will refer to this directory as CALYPSO_ROOT. The recommended CALYPSO_ROOT is /usr/local for system installation, or the home directory for an individual user. Next, uncompress and untar the calypso-vxx.tar.gzip file using the command below.

tex2html_wrap868 The above command will create the following directories in CALYPSO_ROOT.

 
CALYPSO_ROOT/calypso/          		 Self contained directory

CALYPSO_ROOT/calypso/README Latest information regarding the software release

CALYPSO_ROOT/calypso/bin/ Contains all the executables related to Calypso

CALYPSO_ROOT/calypso/doc/ Documents

CALYPSO_ROOT/calypso/examples/ Sample source code and calypso executables

CALYPSO_ROOT/calypso/lib/ Calypso (link-time) libraries

A complete file listing can be found in the README file.

Setting up Environment

  Each user interested in using Calypso must add CALYPSO_ROOT/bin and CALYPSO_ROOT/examples to their default search path. In addition to use the Calypso Graphical User Interface (CGUI), users need to have xterm on their search path. For a bash shell, this can be implemented with the following:

tex2html_wrap870 And for csh or tcsh:

tex2html_wrap872 It is required that the user (a) has the Calypso program on their search path, and (b) has enough (network) privileges to spawn remote shells.

Here is a simple test to verify that the environment is correctly configured. Assuming that the user wants to run a Calypso program called gol on the machine named happy. Executing the command

tex2html_wrap874 should succeed in displaying a proper search path.

During the software installation, a calypsorc file is created to store the names of other hosts connected to the network. This file is located at CALYPSO_ROOT/calypsorc. The system administrator must update this file with the list of hosts connected to their network. By default the host will look to this file for a list of potential workers. To override this default list, the user may create a ~/.calypsorc file containing a list of other host names. Calypso will first look for the ~/.calypsorc file. If this file does not exist, it will consult the system wide /usr/local/calypso/calypsorc file.

Getting Started Running Sample Programs

 

Running Sample Programs

  figure139
Figure 1: Execution of Mandelbrot Sample Program From Calypso Graphical User Interface

Type cgui & at the command prompt to display Calypso's Graphical User Interface (CGUI).

To run a sample Calypso program from the CGUI, input the program name, additional arguments if required, and select a host machine to run the program. For most sample programs you only need to input the program name -- the defaults take care of other fields. The example in Figure 3.1 shows the CGUI running mand, the Mandelbrot sample program on a machine named clifford. In this case there were no additional arguments.

Press the tex2html_wrap878 button to begin program execution. Progress of the program can be monitored by pressing the tex2html_wrap880. Assigned jobs are indicated as red lines. The color of these lines change to blue when completed. The utilization of other computers on the network can be viewed with the tex2html_wrap882. These other network computers can be selected individually to join in the computations. A nice gauge allows the user to set job priority.

Refer to Section 6 for a detailed description of the CGUI.

Programming Environment

Abstract Programming Model

It is best to think of the a Calypso program as a sequential program with embedded parallel steps. Sequential parts of a program commonly perform initialization, I/O, user interactions, etc., whereas parallel steps are generally responsible for computationally intensive segments of the program. A parallel step is a new compound statement and it can be inserted anywhere in the program. It is important to note that Calypso programs are written for a virtual shared-memory parallel machine with an unbounded number of processors. This virtual parallel machine is realized by the Calypso runtime system. Therefore, any program, independent of the number of parallel tasks, can run to completion on any number of host machines.

A Calypso program basically consists of the standard C++ programming language, augmented by four additional keywords to express parallelism. These four key words are: parbegin, parend, routine, and shared. A parallel step starts with the keyword parbegin and ends with the keyword parend. Within a parallel step, multiple concurrent jobs may be defined using the keyword routine. Completion of a parallel step consists of completion of all its jobs in an indeterminate order. Shared memory semantics is provided for global variables that are tagged with the keyword shared.

Example Program: Parallel Hello World

  figure157
Figure 2: Parallel implementation of Hello World in Calypso

At first we will describe programming in Calypso through an illustration. Figure 4.2 contains a parallel implementation of a Hello World program. The program consists of three logical execution blocks: the first being the sequential code up to the parbegin; the second execution block is the parallel step enclosed within parbegin ...parend; the rest of the program constitutes the third execution block. Let us consider the program in more detail.

See the appendix for additional sample programs.

The Calypso Source Language

The programming language for Calypso is called the Calypso Source Language, or CSL. CSL is standard C++ with minor enhancements and several features, that have certain structural constraints. In this section we will describe each feature and each constraint in detail.

File Extension.

CSL programs should use .csl as their file extension. The Calypso preprocessor reads in a CSL program, expecting the .csl file extension, and writes a standard C++ program into a file with the same prefix name, but with .C extension. For example, if foo.csl is input, the preprocessor will generate foo.C.

Calypso Header.

Every CSL program must include calypso.H.

Main Function.

As previously mentioned, the execution of a Calypso program begins with a function named calypso_main. This is analogous to C++'s main function.

Shared Variables.

The Calypso runtime system provides the illusion of a shared-memory, parallel machine on a network of workstations. This is referred to as (Virtual) Distributed Shared Memory, or DSM. This service, however, is only provided for the region of the memory which is typically referred to as the shared memory. This implies that all (non-temporary) variables accessed inside parallel steps must be declared as shared.

The keyword shared, is used to declare a set of variables as shared, has the following syntax.

tex2html_wrap894 For example, the following code fragment declares integers i and j, and the character array buffer as shared memory.

tex2html_wrap896 The supported memory coherence model is described  below.

Parallel Steps.

A parallel step is a new compound statement and it can be inserted anywhere in the program, except, inside another parallel step.gif A parallel step starts at the keyword parbegin and ends with the keyword parend. There can be one or more routine\ statements defined within a parallel step. However, a parallel step generally has the following form.

tex2html_wrap900 To better understand parallel steps, consider the events that occur as the result of the above code segment. Logically, for each routine\ statement the following events occur.

When all the jobs, of all the routine statements execute to completion, the flow of control ``falls out'' from the parallel block and the program execution continues with the first statement after the parend.

It should be noted that in the above example, the first two routine\ statements illustrate the most general form. The third declaration is a special case that syntactically and semantically expands to the following.

tex2html_wrap902

Shared Memory Coherence Model.

 

Within the body of a routine statement, the following applies.

Runtime Library Functions

 

Two Calypso library functions can be called. Their purpose is to give the program the ability to request, by name or by number, other host machines to join in the computation. The function prototypes are defined in calypso.H and are listed below.

tex2html_wrap904 The calypso_spawnWorker(char *host) function call simply starts a shell on the (possibly remote) host machine, and spawns a worker process within the shell. Furthermore, if it detects the X11 windowing-system, it runs the worker inside of an xterm started as an icon.

The calypso_spawnWorkers(int num) function requires the existence of a calypsorc file. (See Section 2.3.) This function selects num names from the list machines names in the calypsorc file, and spawns worker processes on each of the machines.

Compiling and Linking a Calypso Program

A Calypso program may be written in one or more files with the .csl extension. The Calypso preprocessor can then be used to translate those files into standard C++ programs stored in files with the .C extension. Each of the files can be compiled separately, and then linked with the Calypso library to produce an executable program.

For example, the following instructions preprocess, compile, and then link the helloWorld.csl program seen earlier:

tex2html_wrap906

  figure270
Figure 3: Calypso Graphical User Interface

Calypso Graphical User Interface (CGUI)

  The CGUI provides an easy to use interface to execute a calypso program, monitor its progress, view the utilization of other computers on the network and use them in the computation.

The program name and manager name must be input to execute a program. The additional argument field is used to enter additional arguments as required by the program.

The CGUI allows the selection of six options: run in xterm, timer, local worker, bunch jobs, smart scheduling, and online update.

The CGUI also features buttons to activate the Remote Execution Tool and the Execution Monitoring Tool.

  figure289
Figure 4: Remote Execution Tool

Remote Execution Tool.

With this tool, other computers on the network can be selected individually to participate in the computations. The utilization of other computers on the network can also be viewed. The nice gauge is used to set job priority. Job priority can be set from 0 to 20, where 0 is the highest priority. Other host names, not included in the calypsorc file can be added in the hostname to add field.

  figure296
Figure 5: Execution Monitoring Tool

Execution Monitoring Tool.

Progress of the program can be monitored by pressing the Execution Monitoring Tool button. In the example below, two machines, clifford and rollins, are executing the gol sample program. Assigned jobs are indicated as red lines. These lines change to blue when the job is completed.

Further Information

  To learn more about the key concepts of Calypso, refer to the bibliography or visit the project's home page at http://www.cs.nyu.edu/calypso. For questions regarding the software or this manual, send email to calypso@cs.nyu.edu.

Acknowledgements

We wish to thank the following individuals for their contributions to this project: Arash Baratloo, Churngwei Chu, Partha Dasgupta, Mehmet Karaul, Zvi Kedem, Dimitri Krakovsky, Tom Landman, Fabian Monrose, Naftali Schwartz, and David Stark.

This effort is being sponsored by the Defense Advanced Research Projects Agency and Rome Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-96-1-0320,by the National Science Foundation under grant number CCR-94-11590, and by the Intel Corporation. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency, Rome Laboratory, or the U.S. Government.

References

1
Y. Aumann, Z. Kedem, K. Palem, and M. Rabin. Highly efficient asynchronous execution of large-grained parallel programs. In Proc. of 34th IEEE Annual Symposium on Foundations of Computer Science, 1993.

2
[hpdc95] A. Baratloo, P. Dasgupta, and Z. Kedem. Calypso: A novel software system for fault-tolerant parallel processing on distributed platforms. In Proc. of IEEE International Symposium on High-Performance Distributed Computing, 1995.

3
A. Baratloo, P. Dasgupta, Z. Kedem, and D. Krakovsky. Calypso goes to Wall Street: A case Study. In Proc. of 3rd International Conference on Artificial Intelligence Applications on Wall Street, 1995.

4
A. Baratloo, M. Karaul, Z. Kedem, and P. Wyckoff. Metacomputing on the Web. In Proc. of the 9th International Conference on Parallel and Distributed Computing Systems, September 1996.

5
P. Dasgupta, Z. Kedem, and M. Rabin. Parallel processing on networks of workstations: A fault-tolerant, high performance approach. In Proc. of the 15th IEEE International Conference on Distributed Computing Systems, 1995.

6
S. Huang and Z. Kedem. Supporting a flexible parallel programming model on a network of workstations. In Proc. 16th IEEE International Conference on Distributed Computing Systems, 1996.

7
Z. Kedem and K. Palem. Transformations for the automatic derivation of resilient parallel programs. In Proc. of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, 1992.

8
Z. Kedem, K. Palem, M. Rabin, and A. Raghunathan. Efficient program transformations for resilient parallel computation via randomization. In Proc. of 24th ACM Symposium on Theory of Computing, 1992.

9
Z. Kedem, K. Palem, A. Raghunathan, and P. Spirakis. Combining tentative and definite algorithms for very fast dependable parallel computing. In Proc. of 23rd ACM Symposium on Theory of Computing, 1991.

10
Z. Kedem, K. Palem, A. Raghunathan, and P. Spirakis. Resilient parallel computing on unreliable parallel machines. In A. Gibbons and P. Spirakis, editors, Lectures on Parallel Computation. Cambridge University Press, 1993.

11
Z. Kedem, K. Palem, and P. Spirakis. Efficient robust parallel computations. In Proc. of 22nd ACM Symposium on Theory of Computing, 1990.

Matrix Multiplication

tex2html_wrap914