Dear Colleagues,

ZeroMQ is a framework for multiprocess communication that
can be used for both clustering and supercomputing.
http://www.zeromq.org/
It supports 30+ languages.
You should be able to access it from the compute servers.
Here is how it goes for our most common configuration of
multiple clients (players) talking to a single server (the architect).
(The real power of ZeroMQ would come if we wanted to support a
multiple client/multiple server configuration with fault tolerance.
That turns out to be quite easy with ZeroMQ.)
The following documentation was written by Jay Han, whose day job is at Google.

Best,
Dennis

## Multiple Clients Talking to a Single Server with Different Languages for the Clients and the Server

Launch a single server from a terminal:

    $ perl rep.pl # or python rep.py

Launch many `c.py` clients, each in its own terminal. Every second, each client sends a date/time string to the server (`rep.pl` or `rep.py`), and the server replies with the SHA-1 hash of that string:

    $ python c.py
    python (pid=7962): sending data: 'Fri Aug 10 00:03:40 2012'
    python (pid=7962): received sha1 of data: a41e87cbe2b1ee6765383de73f571f9d44087101
    python (pid=7962): sending data: 'Fri Aug 10 00:03:41 2012'
    python (pid=7962): received sha1 of data: 97bf5dfb1c76991dbb15054728826422c175e13e
    # ... in other terminals the different pid values of c.py will appear
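
The sources of `rep.py` and `c.py` are not reproduced here, so the following is only a minimal sketch of the same request-reply exchange, assuming the `pyzmq` binding. The endpoint name and the function names are hypothetical, and the `inproc` transport is used only so that the server and the client can run as two threads of one process; separate programs would bind and connect to a `tcp://` address instead.

```python
import hashlib
import threading
import time

import zmq  # pyzmq

# Hypothetical endpoint. inproc keeps the demo inside one process; separate
# programs would use something like "tcp://127.0.0.1:5555" instead.
ENDPOINT = "inproc://sha1-demo"


def serve(ctx, n_requests, ready):
    """REP side: answer each request with the SHA-1 hex digest of its bytes."""
    sock = ctx.socket(zmq.REP)
    sock.bind(ENDPOINT)
    ready.set()  # signal that clients may now connect
    for _ in range(n_requests):
        data = sock.recv()
        sock.send_string(hashlib.sha1(data).hexdigest())
    sock.close()


def run_client(ctx, n_messages):
    """REQ side: send a date/time string, wait for the digest, repeat."""
    sock = ctx.socket(zmq.REQ)
    sock.connect(ENDPOINT)
    results = []
    for _ in range(n_messages):
        data = time.ctime()  # same format as 'Fri Aug 10 00:03:40 2012'
        sock.send_string(data)
        results.append((data, sock.recv_string()))
    sock.close()
    return results


def demo(n_messages=2):
    """Run one server thread and one client, return (data, digest) pairs."""
    ctx = zmq.Context()
    ready = threading.Event()
    server = threading.Thread(target=serve, args=(ctx, n_messages, ready))
    server.start()
    ready.wait()  # bind before connect, which older libzmq needs for inproc
    results = run_client(ctx, n_messages)
    server.join()
    ctx.term()
    return results


if __name__ == "__main__":
    for data, digest in demo():
        print(f"sending data: {data!r}")
        print(f"received sha1 of data: {digest}")
```

A REQ socket must alternate send and receive, which is why the client waits for each digest before sending the next string. Unlike `c.py`, this sketch does not sleep for a second between messages.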


YOU CAN STOP HERE FOR PURPOSES OF MULTIPLE CLIENTS/SINGLE SERVER
(HEURISTICS CLASS).
READ ON FOR FULL DISTRIBUTION.

## Multiple Clients/Multiple Servers with Load Balancing and Fault Tolerance

A configuration with multiple clients and a single server has at least two problems: the server limits scalability, and it is a single point of failure (SPOF). Running multiple servers that share the load addresses both problems, but it raises new ones. If the clients must keep track of the server addresses, then the clients themselves have to handle load balancing (choosing one server out of many) and fault tolerance (coping with servers that appear and disappear). The clients would also have to coordinate among themselves to reach a good global configuration as conditions change; otherwise poor utilization is added to the problems of SPOF and scalability.

For a simple demonstration of multiple clients and multiple servers, we use

* clients that generate a random string every second and send it to the servers (`mreq.py`)
* servers that echo their input back to the clients (`mrep.pl` and `mrep.py`)
* a message queue, or request-reply broker, that mediates between the clients and the servers as shown in the [ZeroMQ Guide][broker].

Start the message queue:

    $ python msgqueue.py # or perl msgqueue.pl
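
The source of `msgqueue.py` is not shown here. As a rough sketch of what a request-reply broker of this kind can look like, the following assumes the `pyzmq` binding and uses a ROUTER socket facing the clients, a DEALER socket facing the servers, and `zmq.proxy` to shuttle messages between them. The endpoint names and function names are hypothetical, and everything runs as threads over `inproc` so the example is self-contained; the real scripts are separate processes.

```python
import threading

import zmq  # pyzmq

FRONTEND = "inproc://clients"  # hypothetical endpoint the clients connect to
BACKEND = "inproc://servers"   # hypothetical endpoint the servers connect to


def broker(ctx, ready):
    """Request-reply broker: ROUTER faces clients, DEALER faces servers."""
    frontend = ctx.socket(zmq.ROUTER)
    backend = ctx.socket(zmq.DEALER)
    frontend.bind(FRONTEND)
    backend.bind(BACKEND)
    ready.set()
    try:
        zmq.proxy(frontend, backend)  # blocks until the context is terminated
    except zmq.ContextTerminated:
        pass
    finally:
        frontend.close(linger=0)
        backend.close(linger=0)


def echo_server(ctx):
    """REP server in the spirit of mrep.py: echo every request back."""
    sock = ctx.socket(zmq.REP)
    sock.connect(BACKEND)
    try:
        while True:
            sock.send(sock.recv())
    except zmq.ContextTerminated:
        pass
    finally:
        sock.close(linger=0)


def client(ctx, client_id, replies):
    """REQ client: send one tagged string and record what comes back."""
    sock = ctx.socket(zmq.REQ)
    sock.connect(FRONTEND)
    data = f"hello (from client id={client_id})"
    sock.send_string(data)
    replies[client_id] = (data, sock.recv_string())
    sock.close(linger=0)


def demo(n_servers=2, n_clients=3):
    ctx = zmq.Context()
    ready = threading.Event()
    background = [threading.Thread(target=broker, args=(ctx, ready))]
    background[0].start()
    ready.wait()  # bind before connect, which older libzmq needs for inproc
    for _ in range(n_servers):
        background.append(threading.Thread(target=echo_server, args=(ctx,)))
        background[-1].start()
    replies = {}
    clients = [threading.Thread(target=client, args=(ctx, i, replies))
               for i in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    ctx.term()  # wakes the broker and the servers with ContextTerminated
    for t in background:
        t.join()
    return replies


if __name__ == "__main__":
    for cid, (sent, got) in sorted(demo().items()):
        print(f"client {cid}: sent {sent!r}, received {got!r}")
```

The clients and servers only know the broker's two endpoints, never each other's addresses. The ROUTER socket tags each request with the identity of the client that sent it, which is how a reply finds its way back to the originating client rather than to some other one.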
    
Start the servers:

    $ python mrep.py
    $ perl mrep.pl
    $ perl mrep.pl
    ...

Start the clients (currently only in Python):

    $ python mreq.py
    ...

Output from the servers will show multiple client pid values:

    perl: received 'turuyiuirw (from client pid=10922)' from client.
    perl (pid=10862): sending back the received data: turuyiuirw (from client pid=10922)
    perl: received 'ttyryyrqyp (from client pid=10920)' from client.
    perl (pid=10862): sending back the received data: ttyryyrqyp (from client pid=10920)
    perl: received 'toitiwtqer (from client pid=10909)' from client.
    perl (pid=10862): sending back the received data: toitiwtqer (from client pid=10909)
    ...

Output from the client with pid=10909 shows that it received back what it had sent, not the data that other clients sent to the servers:

    12-08-10 01:36:35 q (pid=10909) received the original data: yoqwpyuyyt (from client pid=10909)
    12-08-10 01:36:37 q (pid=10909) received the original data: pwiertptor (from client pid=10909)
    ... 

### Load Balancing and Fault Tolerance

Try the following:

* Kill a server (`Control-C`): the clients do not notice. (fault tolerance)
* Kill all the servers: the clients "pause" and become idle.
* Start a server: the clients come out of their pause and the server handles all of their requests.
* Start more servers: they share the load. (load balancing)
* Kill the broker and then restart it: things start up again.
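
The "pause" and "resume" steps of this experiment can also be reproduced in a single process. The sketch below assumes the `pyzmq` binding and a ROUTER/DEALER broker built on `zmq.proxy`, which may differ from what `msgqueue.py` actually does; the endpoint and function names are hypothetical. A client sends a request while no server is running, waits briefly, and then waits again after a server has been started.

```python
import threading

import zmq  # pyzmq

FRONTEND = "inproc://pause-clients"  # hypothetical endpoints
BACKEND = "inproc://pause-servers"


def broker(ctx, ready):
    """ROUTER/DEALER broker; runs until the context is terminated."""
    frontend = ctx.socket(zmq.ROUTER)
    backend = ctx.socket(zmq.DEALER)
    frontend.bind(FRONTEND)
    backend.bind(BACKEND)
    ready.set()
    try:
        zmq.proxy(frontend, backend)
    except zmq.ContextTerminated:
        pass
    finally:
        frontend.close(linger=0)
        backend.close(linger=0)


def echo_server(ctx):
    """REP server that echoes each request back."""
    sock = ctx.socket(zmq.REP)
    sock.connect(BACKEND)
    try:
        while True:
            sock.send(sock.recv())
    except zmq.ContextTerminated:
        pass
    finally:
        sock.close(linger=0)


def demo():
    ctx = zmq.Context()
    ready = threading.Event()
    broker_thread = threading.Thread(target=broker, args=(ctx, ready))
    broker_thread.start()
    ready.wait()

    req = ctx.socket(zmq.REQ)
    req.connect(FRONTEND)
    req.send_string("ping")  # no server exists yet

    # Wait up to 200 ms: with no server running, no reply can arrive.
    answered_without_server = bool(req.poll(timeout=200))

    # Start a server; the request held by the broker should now be served.
    server_thread = threading.Thread(target=echo_server, args=(ctx,))
    server_thread.start()
    reply = req.recv_string() if req.poll(timeout=5000) else None

    req.close(linger=0)
    ctx.term()  # wakes the broker and the server so they can exit
    broker_thread.join()
    server_thread.join()
    return answered_without_server, reply


if __name__ == "__main__":
    print(demo())
```

The first element of the returned pair reports whether a reply arrived while no server was running, and the second is the reply received once the server had started. The request is not lost in between: it waits in the broker until a server is available to take it.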
