Running the Defibrillator application on a remote cluster with SimX/SCIRun

Simon Yau, 13th Feb, 2007

This document is intended for SCIRun users who have already run a single-processor version of the defibrillator application with the SimX/SCIRun system, as described here, and who wish to move the simulator processes onto a standard TCP/IP cluster.

A set of scripts can be obtained here: Staging Area script. The tarball unpacks into a directory that serves as the staging area in which the user performs a computational study. It contains several shell scripts, as well as configuration files used by the SimX components in the SCIRun net. All nodes of the TCP/IP cluster, as well as the front-end node, are assumed to have access to this directory under the same path name.
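Because every node must see the staging area under the same path, it can be worth checking this before a run. The bash sketch below is hypothetical and not part of the staging-area scripts: check_shared_dir takes the staging directory and a file listing the compute nodes one per line, and assumes password-less ssh to each node and a path without spaces. REMOTE_SHELL is an illustrative override (default ssh).

```shell
# Hypothetical check, not part of the staging-area scripts: confirm that a
# directory exists under the same path on every node in a node list
# (one host name per line). Assumes password-less ssh and a path without
# spaces. REMOTE_SHELL may be overridden; it defaults to ssh.
check_shared_dir() {
  local dir="$1" machinefile="$2" node missing=0
  while IFS= read -r node || [ -n "$node" ]; do
    [ -z "$node" ] && continue            # skip blank lines
    # -n keeps ssh from swallowing the rest of the node list on stdin
    if ! "${REMOTE_SHELL:-ssh}" -n "$node" test -d "$dir"; then
      echo "$node: $dir not found" >&2
      missing=1
    fi
  done < "$machinefile"
  return "$missing"
}
```

A non-zero return lists, on standard error, each node on which the directory is not visible.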

Requirements:

In addition to the requirements for running SimX/SCIRun on a single computer, running the simulator nets on remote nodes requires SCIRun's graphical interface to be disabled. This is done by directing the simulators' graphical output to Xvfb (the X Virtual Frame Buffer) instead of to the X server of the front-end node, where it would normally go. The user should therefore first verify that Xvfb works on the compute nodes.
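One way to verify Xvfb on a compute node is to start it on an unused display, query it with an X client, and shut it down again. The bash sketch below assumes that Xvfb and xdpyinfo are installed on the node and that display :99 is free; the function name check_xvfb and the display number are illustrative and are not part of the staging-area scripts.

```shell
# Hypothetical helper, not part of the staging-area scripts.
# Starts Xvfb on a spare display, checks that an X client can talk to it,
# then shuts it down. Assumes Xvfb and xdpyinfo are on PATH.
check_xvfb() {
  local display="${1:-:99}"              # assumed-free display number
  Xvfb "$display" -screen 0 1024x768x24 >/dev/null 2>&1 &
  local pid=$!
  sleep 2                                # give the server time to start
  if DISPLAY="$display" xdpyinfo >/dev/null 2>&1; then
    echo "Xvfb OK on $(hostname) display $display"
    kill "$pid"
    return 0
  fi
  echo "Xvfb FAILED on $(hostname) display $display" >&2
  kill "$pid" 2>/dev/null
  return 1
}
```

For example, on a compute node, source the file and run `check_xvfb`; a failure means the node needs attention before runserv.pl can start Xvfb there.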

Running the simulators from a script

The server scripts are written for the PBS system on max and can be adapted to other standard TCP/IP clusters as described below.

Three scripts are used to start the simulators:

runserv.pl: A Perl script that runs on the remote compute nodes of the cluster. It sets up the environment variables needed to run a SCIRun simulator net, starts the Xvfb server, and then starts the simulator net. The user needs to edit it so that $SCIRUNROOT points to the directory containing the scirun executable, $SCIRUNSTAGE points to the staging-area directory, and $XSERVERCOMMAND is a command that starts the Xvfb server. Depending on whether this has already been done in the login script, the user may also need to set the environment variables CFLAGS, CXXFLAGS, LDFLAGS, and LD_LIBRARY_PATH (as described here) within runserv.pl. If the login script already sets them, those lines can be commented out.

runserv2.pl: A Perl script that runs on the remote compute nodes of the cluster. It sets up the environment variables needed to run the SISOL servers and then starts the server. The user needs to set $SISOLEXECUTABLE to point to the SISOL data server executable. If LD_LIBRARY_PATH is not set in the login script, it can be set in runserv2.pl instead.

startsim.sh: A shell script that runs on the front end and spawns the simulator and SISOL server processes on the remote machines. The user needs to point $SCIRUNROOT to the directory containing the scirun executable and $SCIRUNSTAGE to the staging-area directory, and, if this is not done in the login script, set LD_LIBRARY_PATH. The variable $MACHINEFILE should be set to the name of the machine file, a file listing the assigned compute nodes one per line.
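Before running startsim.sh it can help to check the settings it depends on. Both functions in the bash sketch below are hypothetical and not part of the staging-area scripts: check_startsim_config tests that $SCIRUNROOT contains an executable scirun, that $SCIRUNSTAGE is a directory, and that $MACHINEFILE is non-empty; make_machinefile_from_pbs builds a machine file from the node list that PBS places in $PBS_NODEFILE. The latter assumes startsim.sh wants each node only once, whereas PBS may list a node once per allocated processor.

```shell
# Hypothetical pre-flight check for the settings startsim.sh relies on.
# Reports every problem found and returns non-zero if there were any.
check_startsim_config() {
  local ok=0
  if [ ! -x "$SCIRUNROOT/scirun" ]; then
    echo "SCIRUNROOT ($SCIRUNROOT) has no executable 'scirun'" >&2
    ok=1
  fi
  if [ ! -d "$SCIRUNSTAGE" ]; then
    echo "SCIRUNSTAGE ($SCIRUNSTAGE) is not a directory" >&2
    ok=1
  fi
  if [ ! -s "$MACHINEFILE" ]; then
    echo "MACHINEFILE ($MACHINEFILE) is missing or empty" >&2
    ok=1
  fi
  return "$ok"
}

# Hypothetical: write each node from $PBS_NODEFILE once, keeping the
# original order (assumes one entry per node is what startsim.sh expects).
make_machinefile_from_pbs() {
  awk '!seen[$0]++' "$PBS_NODEFILE" > "$1"
}
```

Inside a PBS job, for example, `make_machinefile_from_pbs machines.txt` followed by `MACHINEFILE=machines.txt check_startsim_config` reports any setting that still needs editing.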

To run the SISOL servers and simulators, the user runs the startsim.sh script on the front-end computer with the simulator net as its argument:

./startsim.sh defib-sim.srn

startsim.sh then invokes runserv.pl and runserv2.pl remotely on the hosts listed in $MACHINEFILE. By default, it spawns one SISOL directory server on the local machine, four SISOL data servers on remote nodes, and one simulator net on each of the remaining remote nodes.
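The default split can be previewed with a dry run before any process is started. The bash sketch below only prints a plan and does not reproduce startsim.sh's logic; it assumes the first four nodes in the machine file receive the SISOL data servers, which may not match the order startsim.sh actually uses. The name plan_roles is hypothetical.

```shell
# Hypothetical dry run of the default assignment: one SISOL directory
# server on the front end, four SISOL data servers, and a simulator net on
# every remaining node. Assumes the first four nodes get the data servers.
plan_roles() {
  local machinefile="$1" n_data=4 i=0 node
  echo "front end: SISOL directory server"
  while IFS= read -r node || [ -n "$node" ]; do
    [ -z "$node" ] && continue            # skip blank lines
    if [ "$i" -lt "$n_data" ]; then
      echo "$node: SISOL data server"
    else
      echo "$node: simulator net"
    fi
    i=$((i + 1))
  done < "$machinefile"
  if [ "$i" -le "$n_data" ]; then
    echo "warning: no nodes left for simulator nets" >&2
    return 1
  fi
}
```

With a six-line machine file it reports four data-server nodes and two simulator nodes; with four or fewer nodes it returns non-zero, since no simulator net could be started.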

Once started, startsim.sh prints a large amount of output. Wait for messages of the following form:

FUEL: [compute node] binding to socket [simx port].
FUEL: listening to port [simx port].
.
.

These messages indicate that the simulator nets are ready to accept connections from the viewer.
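When startsim.sh runs unattended, waiting for these messages can be automated. The bash sketch below assumes that the output of startsim.sh is redirected to a log file and that each simulator net prints exactly one "FUEL: listening to port" line; wait_for_simulators and its arguments are hypothetical.

```shell
# Hypothetical helper: wait until COUNT "FUEL: listening to port" lines
# appear in LOG, giving up after TIMEOUT seconds (default 600).
# Assumes startsim.sh output was redirected to LOG and that each
# simulator net prints the message once.
wait_for_simulators() {
  local log="$1" count="$2" timeout="${3:-600}" seen=0 waited=0
  while [ "$waited" -lt "$timeout" ]; do
    seen=$(grep -c 'FUEL: listening to port' "$log" 2>/dev/null || true)
    seen=${seen:-0}                       # log file may not exist yet
    if [ "$seen" -ge "$count" ]; then
      echo "$seen simulator net(s) listening"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out: $seen of $count simulator nets listening" >&2
  return 1
}
```

For example, assuming startsim.sh can run in the background: `./startsim.sh defib-sim.srn > startsim.log 2>&1 &`, then `wait_for_simulators startsim.log N && ./startview.sh defib-view.srn`, where N is the number of simulator nets expected.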

Running the viewer from a script

The viewer net is started with the startview.sh script, which is run on the front-end machine.

In the startview.sh script, SIMXROOT should point to the directory where SimX is unpacked, SCIRUNROOT to the directory containing the scirun executable, and SCIRUNSTAGE to the staging-area directory. Depending on whether this has been done in the login script, the user may also need to set the environment variables CFLAGS, CXXFLAGS, LDFLAGS, and LD_LIBRARY_PATH as described here.

After starting the servers and seeing the "FUEL: listening to port" messages, the user can start the viewer net with the startview.sh script, giving it the viewer net as its argument:

./startview.sh defib-view.srn

Cleanup

After an experiment is done, the user can run the cleanup.sh script to kill the processes still running on the remote computers (SISOL servers, simulator SCIRun processes, and Xvfb).

In the cleanup.sh script, SIMXROOT should point to the directory where SimX is unpacked; SCIRUNROOT should point to the directory containing the scirun executable; SCIRUNSTAGE should point to the directory of the staging area.
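The bash sketch below illustrates the kind of remote cleanup described above; it is not the contents of cleanup.sh. It assumes that the processes are named scirun and Xvfb, that the SISOL data server's process name is passed as the first argument (sisol_server, used in the example, is a hypothetical name), and that password-less ssh to the nodes works. By default it only prints the commands; setting DRY_RUN=0 runs them.

```shell
# Hypothetical cleanup sketch, not the contents of cleanup.sh.
# For each node in MACHINEFILE, kill the current user's scirun, Xvfb and
# SISOL server processes. The process names are assumptions.
# DRY_RUN defaults to 1 (print commands); set DRY_RUN=0 to execute them.
cleanup_nodes() {
  local sisol_name="$1" machinefile="$2" node proc
  local user="${USER:-$(id -un)}"
  while IFS= read -r node || [ -n "$node" ]; do
    [ -z "$node" ] && continue            # skip blank lines
    for proc in scirun Xvfb "$sisol_name"; do
      if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "ssh $node pkill -u $user -x $proc"
      else
        # -n stops ssh from consuming the machine file read by the loop
        ssh -n "$node" pkill -u "$user" -x "$proc"
      fi
    done
  done < "$machinefile"
}
```

The first argument would be the basename of the executable that $SISOLEXECUTABLE points to in runserv2.pl. Using `pkill -x` matches the process name exactly, so unrelated processes whose names merely contain "scirun" are left alone.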

Controlling the explored space

The user can control the explored design space by editing the viewerConfig.txt.Tmpl file. This file contains three lines with the same semantics as those in the defibViewerConfig.txt file described here. To switch to a different net that explores a different subspace of the design space, the user must edit this file to reflect that change.