Calypso is being developed as part of the
a joint research effort between New York University and Arizona State University.
Calypso is a software system that allows the user to tap into unused computer cycles by providing a parallel processing platform for a network of standard workstations or personal computers. Calypso has the following features.
Ease of programming. Since existing C or C++ programs can be adapted for Calypso, the programmer does not have to worry about learning a new programming language. In fact, the Calypso Source Language (CSL) is just C++ with four additional key words.
Separation of programming parallelism from execution parallelism. The parallelism of the program is determined by the programmer and is a separate entity from the parallelism of the execution environment, which is determined by the availability of workstations at the time of execution. With Calypso, the mapping between program parallelism and execution parallelism is transparent to the user.
Dynamic load balancing and fault tolerance. Calypso\ automatically distributes the work load depending upon the dynamics of the participating machines. Faster machines do not have to wait for slower machines to finish their ``work assignments''-- faster machines overtake the slower ones. In addition, Calypso executions are resilient to failures. All processes, other than a specific designated ``manager'' process, can fail at any point without affecting the correctness of the computation.
A ``normal'' execution of a Calypso program consists of a ``central'' process, called the manager, and one or more worker processes, called workers. These processes can reside on a single machine or they can be distributed on a network.
The manager is responsible for the management of the computation. This manager is assigned the task of executing sequential sections of the code. The current Calypso version only allows one manager, and thus it does not tolerate the failure of this process.
The computation of parallel jobs is left to the workers. There can be zero or more workers present at any one time, but at least one active worker is needed for a parallel computation to proceed; otherwise the computation is suspended, until a worker appears.
In general, the number of workers and the resources they can devote to the parallel computations can dynamically change in a completely arbitrary manner. This dynamic behavior is completely transparent to the Calypso program. In fact, arbitrarily asynchronous behavior of workers, their failures due to process crash or preemption for other jobs, and inaccessibility due to network partitions are tolerated. Furthermore, workers can be added at any time to speed up an already executing system and increase the fault tolerance.
Arbitrary asynchronous behavior of the manager is tolerated too. Of course, if it is slow, the overall execution will suffer.
For the design and the implementation of the Calypso software system, and a complete description of manager and worker processes, see .
Any commercial off-the-shelf PC with an 80386, 80486, Pentium, Pentium Pro or compatible processor and a minimum of 8MB RAM (16MB is recommended) is suitable to run Calypso. To use several workstations, they must be connected as a network.
Calypso programs require the Linux operating system, a C++ compiler, the Extended Tcl/tk (TclX) scripting language, and an rstatd RPC server. Calypso has been tested with Linux kernel version 2.0.29, GNU's g++ version 2.7.2, TclX version 7.5.2, and rpc.rstatd version 3.02. However, it should also run on other versions of the software packages. These software packages are included in the RedHat Linux version 4.0 with the exception of the RPC daemon, which has to be installed on every host machine.
Calypso runs on platforms other than Linux, including Windows NT, Solaris and SunOS. Feel free to contact us if you are interested in Calypso for another platform. See Section 7 for further information.
It is assumed that the reader has an understanding of C or C++. To get started running test programs, read Section 2.3 to setup your environment variables, and Section 3 for a quick introduction to the Calypso Graphical User Interface (CGUI).
Calypso may be installed on any PC that meets the hardware requirements stated above. Calypso can run on one or several workstations connected as a network.
It is required that the user installs copies of the Calypso program in a way that makes them accessible to all the participating machines and has remote-login access for those machines.
Software installation is dependent upon the type of user. A normal user without network administrator privileges can install Calypso in her private home directory. On the other hand, a system administrator can install Calypso in a global area to be accessed by multiple users. To install the software, first go to the directory where Calypso will be installed. We will refer to this directory as CALYPSO_ROOT. The recommended CALYPSO_ROOT is /usr/local for system installation, or the home directory for an individual user. Next, uncompress and untar the calypso-vxx.tar.gzip file using the command below.
The above command will create the following directories in CALYPSO_ROOT.
CALYPSO_ROOT/calypso/ Self contained directory
CALYPSO_ROOT/calypso/README Latest information regarding the software release
CALYPSO_ROOT/calypso/bin/ Contains all the executables related to Calypso
CALYPSO_ROOT/calypso/examples/ Sample source code and calypso executables
CALYPSO_ROOT/calypso/lib/ Calypso (link-time) libraries
A complete file listing can be found in the README file.
Each user interested in using Calypso must add CALYPSO_ROOT/bin and CALYPSO_ROOT/examples to their default search path. In addition to use the Calypso Graphical User Interface (CGUI), users need to have xterm on their search path. For a bash shell, this can be implemented with the following:
And for csh or tcsh:
It is required that the user (a) has the Calypso program on their search path, and (b) has enough (network) privileges to spawn remote shells.
Here is a simple test to verify that the environment is correctly configured. Assuming that the user wants to run a Calypso program called gol on the machine named happy. Executing the command
should succeed in displaying a proper search path.
During the software installation, a calypsorc file is created to store the names of other hosts connected to the network. This file is located at CALYPSO_ROOT/calypsorc. The system administrator must update this file with the list of hosts connected to their network. By default the host will look to this file for a list of potential workers. To override this default list, the user may create a ~/.calypsorc file containing a list of other host names. Calypso will first look for the ~/.calypsorc file. If this file does not exist, it will consult the system wide /usr/local/calypso/calypsorc file.
Figure 1: Execution of Mandelbrot Sample Program From Calypso Graphical User Interface
Type cgui & at the command prompt to display Calypso's Graphical User Interface (CGUI).
To run a sample Calypso program from the CGUI, input the program name, additional arguments if required, and select a host machine to run the program. For most sample programs you only need to input the program name -- the defaults take care of other fields. The example in Figure 3.1 shows the CGUI running mand, the Mandelbrot sample program on a machine named clifford. In this case there were no additional arguments.
Press the button to begin program execution. Progress of the program can be monitored by pressing the . Assigned jobs are indicated as red lines. The color of these lines change to blue when completed. The utilization of other computers on the network can be viewed with the . These other network computers can be selected individually to join in the computations. A nice gauge allows the user to set job priority.
Refer to Section 6 for a detailed description of the CGUI.
It is best to think of the a Calypso program as a sequential program with embedded parallel steps. Sequential parts of a program commonly perform initialization, I/O, user interactions, etc., whereas parallel steps are generally responsible for computationally intensive segments of the program. A parallel step is a new compound statement and it can be inserted anywhere in the program. It is important to note that Calypso programs are written for a virtual shared-memory parallel machine with an unbounded number of processors. This virtual parallel machine is realized by the Calypso runtime system. Therefore, any program, independent of the number of parallel tasks, can run to completion on any number of host machines.
A Calypso program basically consists of the standard C++ programming language, augmented by four additional keywords to express parallelism. These four key words are: parbegin, parend, routine, and shared. A parallel step starts with the keyword parbegin and ends with the keyword parend. Within a parallel step, multiple concurrent jobs may be defined using the keyword routine. Completion of a parallel step consists of completion of all its jobs in an indeterminate order. Shared memory semantics is provided for global variables that are tagged with the keyword shared.
Figure 2: Parallel implementation of Hello World in Calypso
At first we will describe programming in Calypso through an illustration. Figure 4.2 contains a parallel implementation of a Hello World program. The program consists of three logical execution blocks: the first being the sequential code up to the parbegin; the second execution block is the parallel step enclosed within parbegin ...parend; the rest of the program constitutes the third execution block. Let us consider the program in more detail.
Lines 21-25 define a parallel step with only one routine. The statement routine[numberOfJobs] causes the expression inside hard-brackets to be evaluated at runtime, and creates that many jobs. The expression numberOfJobs is evaluated which in this case is the user input. This means that numberOfJobs jobs, all syntactically identical, and numbered 0 through , will be created. The value of the expression along with with the ID of each job is passed as formal parameters. In our example, the formal parameters are totalJobs and myId. In line 22, the jobs concurrently write their ID at an appropriate index of array.
See the appendix for additional sample programs.
The programming language for Calypso is called the Calypso Source Language, or CSL. CSL is standard C++ with minor enhancements and several features, that have certain structural constraints. In this section we will describe each feature and each constraint in detail.
CSL programs should use .csl as their file extension. The Calypso preprocessor reads in a CSL program, expecting the .csl file extension, and writes a standard C++ program into a file with the same prefix name, but with .C extension. For example, if foo.csl is input, the preprocessor will generate foo.C.
Every CSL program must include calypso.H.
As previously mentioned, the execution of a Calypso program begins with a function named calypso_main. This is analogous to C++'s main function.
The Calypso runtime system provides the illusion of a shared-memory, parallel machine on a network of workstations. This is referred to as (Virtual) Distributed Shared Memory, or DSM. This service, however, is only provided for the region of the memory which is typically referred to as the shared memory. This implies that all (non-temporary) variables accessed inside parallel steps must be declared as shared.
The keyword shared, is used to declare a set of variables as shared, has the following syntax.
For example, the following code fragment declares integers i and j, and the character array buffer as shared memory.
The supported memory coherence model is described below.
A parallel step is a new compound statement and it can be inserted anywhere in the program, except, inside another parallel step. A parallel step starts at the keyword parbegin and ends with the keyword parend. There can be one or more routine\ statements defined within a parallel step. However, a parallel step generally has the following form.
To better understand parallel steps, consider the events that occur as the result of the above code segment. Logically, for each routine\ statement the following events occur.
The variable num is initialized, by pass-by-value semantics, to the number of processes created for the specific routine; in fact it is int-exp. The variable id is initialized, again using pass-by-value semantics, to the number of the job. Job numbers (identifications) are numbered: 0, 1, ..., .
Intuitively, the idea behind the num and id is for a job to control its behavior based on the number of jobs the routine expands to, and the current job number.
When all the jobs, of all the routine statements execute to completion, the flow of control ``falls out'' from the parallel block and the program execution continues with the first statement after the parend.
It should be noted that in the above example, the first two routine\ statements illustrate the most general form. The third declaration is a special case that syntactically and semantically expands to the following.
Within the body of a routine statement, the following applies.
Two Calypso library functions can be called. Their purpose is to give the program the ability to request, by name or by number, other host machines to join in the computation. The function prototypes are defined in calypso.H and are listed below.
The calypso_spawnWorker(char *host) function call simply starts a shell on the (possibly remote) host machine, and spawns a worker process within the shell. Furthermore, if it detects the X11 windowing-system, it runs the worker inside of an xterm started as an icon.
The calypso_spawnWorkers(int num) function requires the existence of a calypsorc file. (See Section 2.3.) This function selects num names from the list machines names in the calypsorc file, and spawns worker processes on each of the machines.
A Calypso program may be written in one or more files with the .csl extension. The Calypso preprocessor can then be used to translate those files into standard C++ programs stored in files with the .C extension. Each of the files can be compiled separately, and then linked with the Calypso library to produce an executable program.
For example, the following instructions preprocess, compile, and then link the helloWorld.csl program seen earlier:
Figure 3: Calypso Graphical User Interface
The CGUI provides an easy to use interface to execute a calypso program, monitor its progress, view the utilization of other computers on the network and use them in the computation.
The program name and manager name must be input to execute a program. The additional argument field is used to enter additional arguments as required by the program.
The CGUI allows the selection of six options: run in xterm, timer, local worker, bunch jobs, smart scheduling, and online update.
The CGUI also features buttons to activate the Remote Execution Tool and the Execution Monitoring Tool.
Figure 4: Remote Execution Tool
Figure 5: Execution Monitoring Tool
To learn more about the key concepts of Calypso, refer to the bibliography or visit the project's home page at http://www.cs.nyu.edu/calypso. For questions regarding the software or this manual, send email to firstname.lastname@example.org.
We wish to thank the following individuals for their contributions to this project: Arash Baratloo, Churngwei Chu, Partha Dasgupta, Mehmet Karaul, Zvi Kedem, Dimitri Krakovsky, Tom Landman, Fabian Monrose, Naftali Schwartz, and David Stark.
This effort is being sponsored by the Defense Advanced Research Projects Agency and Rome Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-96-1-0320,by the National Science Foundation under grant number CCR-94-11590, and by the Intel Corporation. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency, Rome Laboratory, or the U.S. Government.