"What can anyone give you greater than now" - William Stafford


NOW Tutorial


This document provides an overview of the use of the NOW-2 production system at UC Berkeley. NOW-2 currently consists of a cluster of 110 Ultra-1 SPARCStations. The production NOW runs the GLUnix operating system and supports Split-C parallel programs running on Active Messages.


Using GLUnix

Taking advantage of NOW functionality is straightforward. Simply ensure that /usr/now/bin is in your shell's PATH and /usr/now/man is in your MANPATH. To start taking advantage of GLUnix functionality, log into now.cs.berkeley.edu and start a glush shell. While the composition of the GLUnix partition may change over time, we make every effort to guarantee that now.cs is always running GLUnix. The glush shell runs most commands remotely on lightly loaded nodes in the cluster. A sample csh session appears after the summary below.

To summarize:

  • Add /usr/now/bin to your path.
  • Add /usr/now/man to your man path.
  • Log into now.cs.berkeley.edu.
  • Run glush.
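
For example, in csh the PATH and MANPATH changes might look like this (a sketch for your .cshrc; it assumes MANPATH is already set, and your actual dotfiles may differ):

set path = ($path /usr/now/bin)
setenv MANPATH ${MANPATH}:/usr/now/man

Then log into now.cs.berkeley.edu with your usual remote-login tool and run glush.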

Load-balancing GLUnix shell scripts are available. The syntax is identical to the csh command language. Simply begin your shell scripts with #!/usr/now/bin/glush. Note that you do not have to be running glush as your interactive shell in order to run load-balanced shell scripts. A small example script follows.
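
For instance, a load-balanced script might look like the following (a hypothetical example; only the #! line is GLUnix-specific, and the commands in it are ordinary csh commands):

#!/usr/now/bin/glush
# Commands in this script are candidates for remote, load-balanced execution.
echo "starting build"
gmake
echo "build done"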


The Split-C on NOW HOW-TO

NEW! Split-C now runs exclusively on AM2, the new version of Active Messages!!

What you must do now to run Split-C programs:

  • The environment has changed. Re-source the cs267.cshrc file
  • Older programs must be recompiled after you set up your environment
  • Don't use the -reserve option while you are still developing or debugging.
  • ONLY use the -reserve option for running production versions of your code. In other words, don't use the -reserve option until you have fully debugged your program.
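
For example, an update session might look like the following (a sketch that assumes you already have a build directory such as the sc_example directory created in the compilation example later in this document; removing the old bin-LAM objects forces a full rebuild):

now:>source /usr/castle/proj/cs267/cs267.cshrc
now:>cd sc_example
now:>rm -rf bin-LAM pi
now:>gmake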

How this affects you

  • You now have more nodes (~100) to run on
  • Fewer reservations (since they aren't needed) to contend with. Thus you should always have ~100 nodes to run on, since AM2 programs don't require exclusive access to a node
  • No more 'xxx holding lock for lanai card' error messages
  • No more 'cannot open lanai copy block' error messages
  • Slightly slower communication performance

This document provides an overview of how to compile and run Split-C programs on the various NOW clusters. This HOWTO assumes you are familiar with Unix commands and with changing your shell environment.

This document is *not* a tutorial on the Split-C language itself. For a good tutorial on the language, see Jim Demmel's Lecture on Split-C.


Setting Your Environment

Type:
source /usr/castle/proj/cs267/cs267.cshrc


Compiling a Program

Now that you have set up your Split-C environment, you are ready to compile a program. The first thing to do is create a sub-directory where you want to compile the program. Then copy a simple Makefile into the directory and let the system makefile take care of the rest. When using the system makefile, you must use the gmake program instead of the standard make. The example session below uses the sample Split-C program pi.sc.

Here is an example session which should get you started. Make sure you're logged into a NOW machine first!

now:>cd
now:>mkdir sc_example
now:>cd sc_example
now:>cp /usr/castle/share/proj/split-c/develop/examples/pi/Makefile .
now:>cp /usr/castle/share/proj/split-c/develop/examples/pi/pi.sc .
now:>gmake
/usr/castle/share/proj/split-c/install/LAM/bin/split-cc -g -O2 -o bin-LAM/pi.o -c pi.sc
Compiling for 2^N processors
/usr/castle/share/proj/split-c/install/LAM/bin/split-cc -o pi bin-LAM/pi.o -L/usr/sww/X11/lib -lX11 -lm


Running a Program

Assuming you have your environment variables correctly set up, running a Split-C program is very easy. Parallel programs are invoked with the glurun program (see the NOW Tutorial for a description of GLUnix commands). The format for glurun is: glurun -[# of nodes] [program to run] [arguments to the program]. To see all the options for glurun, run glurun -help.

NOTE: For most Split-C programs, you'll have to run on a power of 2 number of processors (e.g. 1,2,4,8,16)

Below is an example session of running a Split-C program.

now:>glurun -4 pi
PI estimated at 3.139500 from 1000000 trials on 4 processors.

If you have problems with Split-C, send email to split-c@boing.cs.berkeley.edu.


Running HPF Programs on the NOW

The following is a description of how to run HPF programs on the Berkeley NOW.

Setting up your environment

Type:
source /usr/castle/proj/cs267/cs267.cshrc


Log into the NOW


Compiling a program

You'll now be able to compile a sample program. Copy the following file to a directory: karp.hpf
This is an HPF file that computes PI through integration.

Now, execute the following command to compile the program. By default, the compiler will add the libraries necessary to communicate via sockets.

pghpf -Mstats karp.hpf -o karp

This will compile the karp program and insert profiling code that will assist in reporting the memory used and messages sent by the program. The compiler will produce an executable called 'karp'.

If you experience problems trying to compile, make sure that your path is set up correctly as noted in the first section above.


Running a program

Setting things up

  • There are two ways to execute programs - securely and unsecurely. We strongly recommend that you use the secure method.
  • Securely
    • Set up kerberos
    • rkinit to the remote machine that you are running on: (You don't have to rkinit to all remote machines, since the rsh command will transfer the ticket to the other remote machines.)

      This will set up a ticket on the current machine you are running on. You should only run 'kinit' to create tickets on the machine that you are typing at; otherwise your Kerberos password will be transferred over the network in the clear. At the machine you are typing at, you should run 'rkinit' to install tickets on machines remotely - rkinit will not send your password in cleartext.

      For example, if you are typing at the machine whenever.cs.berkeley.edu and you wanted to run programs from the machine u0.cs.berkeley.edu, you would type:

      rkinit u0.cs.berkeley.edu
  • Unsecurely (not recommended)
    • Add the current hostname to your .rhosts file

      For example, if you are logged into u0.cs.berkeley.edu, then add

      u0.cs.berkeley.edu

      to the end of your ~/.rhosts file.

Running the program

On one processor:

  • Type ./karp

On 4 processors:

  • SECURELY: ./karp -pghpf -np 4 -host remote_host1,remote_host2,remote_host3 -stat alls
  • UNSECURELY: ./karp -pghpf -np 4 -host remote_host1,remote_host2,remote_host3 -stat alls -rsh /bin/rsh

    where remote_host1,remote_host2,remote_host3 are the names of 3 machines that you are going to run on, e.g. u1,u2,u3

    Note that HPF will automatically run a copy on the current host, so you specify only the N-1 remote hosts on the command line. **TODO: create a 'rsh'-like shell to spawn jobs using GLUnix. HPF expects a rsh tool to spawn off jobs.

Runtime options

Consult the HPF documentation for the entire list of options, but here are some common ones:

  • -v for verbose runtime
  • -dyn to dynamically pick nodes based on load average
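
For example, a verbose run of the karp program on 4 processors might be invoked as follows (a sketch; u1, u2, and u3 are placeholder remote hosts, and the HPF documentation describes exactly how -dyn interacts with an explicit -host list):

./karp -pghpf -np 4 -host u1,u2,u3 -v -dyn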

Advanced

The HPF compiler also supports some of CM FORTRAN. By default, CM FORTRAN is accepted in some cases. See the HPF documentation for more details.


HPF on MPI

Compiling a HPF program with MPI

  • First, set up your environment as if you were compiling a socket version of an HPF program.
  • To compile with the MPI option, you have to set up two environment variables before compiling:
    setenv HPF_MPI "/disks/barad-dur/now/MPI/mpich/lib/solaris/hpf/"
    setenv HPF_SOCKET "-lglunix -lLAM -lsocket -lnsl -lposix4 -lLanaiDevice -lbfd -liberty -lthread -lm"
  • Then, compile your program with the -Mmpi option:
    pghpf -Mmpi your_program.hpf -o your_executable

Running the executable

  • Set up the environment for MPI. For more information, consult the MPI on NOW HowTo
  • Then, run your executable as follows:
    your_executable -pghpf -np #_of_nodes

Note: MPI command line arguments that are defined in the MPI on NOW HowTo can still be used.
If you have problems, send email to chad@berkeley.edu.


Message Passing Interface Implementation On Active Messages

Our implementation of MPI is based on the MPICH reference implementation, but realizes the abstract device interface (ADI) through Active Messages operations. This approach achieves good performance and yet is portable across Active Messages platforms.

We have implemented MPICH-v1.0.12 on top of Generic Active Messages (GAM) and Active Messages 2.0. Our implementations are also available for general use on the NOW cluster.


People

Advisors

Student

  • Frederick Wong

Utility Programs

We have built a number of utility programs for GLUnix. All of these programs are located in /usr/now/bin. Man pages are available for all of these programs by running man from a shell. A brief description of each utility program follows; a short example session appears after the list:

  • glush:
    The GLUnix shell is a modified version of tcsh. Most jobs submitted to the shell are load balanced among GLUnix machines. However, some jobs must be run locally, since GLUnix does not provide completely transparent TTY support and since IO bandwidth to stdin, stdout, and stderr is limited by TCP bandwidth. The shell automatically runs a number of these jobs locally; however, users may customize this list by adding programs to the glunix_runlocal shell variable. The variable indicates to glush those programs which should be run locally.
  • glumake:
    A modified version of GNU's make program. A -j argument specifies the degree of parallelism for the make. The degree of parallelism defaults to the number of nodes available in the cluster.
  • glurun:
    This program runs the specified program on the GLUnix cluster. For example, glurun bigsim will run bigsim on the least loaded machine in the GLUnix cluster. You can run a parallel program on the NOW by specifying the parameter -N, where N is a number representing the degree of parallelism you wish. Thus glurun -5 bigsim will run bigsim on the 5 least-loaded nodes.
  • glustat
    Prints the status of all machines in the GLUnix cluster.
  • glups
    Similar to Unix ps but only prints information about GLUnix processes.
  • glukill
    Sends an arbitrary signal (defaults to SIGTERM) to a specified GLUnix process.
  • gluptime
    Similar to Unix uptime, reporting on how long the system has been up and the current system load.
  • glubatch
    A batch job submission system for submitting and querying non-interactive, low-priority jobs.
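
As an illustration, a short session using a few of these utilities might look like the following (a sketch; bigsim is the hypothetical program from the glurun description, and the -j 8 value is arbitrary):

now:>glustat
now:>glumake -j 8
now:>glurun -5 bigsim
now:>glups
now:>gluptime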

A Sample GLUnix Program

Each program running under GLUnix has a startup process, which runs in your shell, and a number of child processes, which run on remote nodes. There must be at least one child process, and there may be up to one for each node currently running GLUnix. The startup process is responsible for routing signal information (for example, if you type ^Z or ^C) and input/output to the child processes. The child processes make up the program itself. If there is more than one child, this is a parallel program; otherwise it is a sequential program.

Here is the code and Makefile for a sample program which runs under GLUnix (use gmake with this Makefile). This routine provides the code for both the startup and child processes. The distinction between the two kinds of processes is made using the Glib_AmIStartup() library call.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "glib/types.h"
#include "glib.h"

extern char **environ;   /* environ is not declared by the standard headers */

int
main(int argc, char **argv)
{
    int numNodes;
    VNN vnn;

    if (!Glib_Initialize()) {
        fprintf(stderr, "Glib_Initialize failed\n");
        exit(-1);
    }

    if (argc > 1) {
        numNodes = atoi(argv[1]);
    } else {
        numNodes = 2;
    }

    if (Glib_AmIStartup()) {
        /* Startup process runs here */
        printf("Startup is spawning %d children\n", numNodes);
        Glib_Spawnef(numNodes, GLIB_SPAWN_OUTPUT_VNN, argv[0], argv, environ);
    } else {
        /* Child process(es) run here */
        vnn = Glib_GetMyVnn();
        printf("***** I am a child process\n");
        printf("***** VNN: %d\n", (int)vnn);
        printf("***** Degree of program parallelism: %d\n", Glib_GetParallelDegree());
        printf("***** Total Nodes in system: %d\n", Glib_GetTotalNodes());

        /* Make one of the children delay to prove that nobody exits the
           barrier before everybody has entered it */
        if (vnn == 0) {
            printf("***** Child %d is sleeping\n", vnn);
            fflush(stdout);
            sleep(5);
        }

        printf("***** Doing Barrier\n");
        fflush(stdout);
        Glib_Barrier();
        printf("***** Done with Barrier\n");
    }
    return 0;
}

The Makefile for this program (if you call it test.c) is:

CC = gcc
CFLAGS = -Wall -g
TARGET = test
SRCS = test.c
LIBS = -lglunix -lam2 -lsocket -lnsl
MANS = test.1
MANHOME = ../../man/man1
BINHOME = ../../bin/sun4-solaris2.4-gamtcp
LIBPATH = /usr/now/lib
INCLUDEPATH = /usr/now/include/

###############################################################

LLIBPATH = $(addprefix -L,$(LIBPATH))
RLIBPATH = $(addprefix -R,$(LIBPATH))
INCPATH = $(addprefix -I,$(INCLUDEPATH))

all: $(TARGET)

$(TARGET): $(SRCS)
	gcc $(CFLAGS) -o $(TARGET) $(SRCS) $(RLIBPATH) \
	$(LLIBPATH) $(INCPATH) $(LIBS)

clean:
	rm -f $(TARGET) core *~ *.o

install: $(TARGET) installman
	cp $(TARGET) $(BINHOME)

installman:
	cp $(MANS) $(MANHOME)

Output from this program should look something like this (though the order of the output lines may vary):

% ./test
Startup is spawning 2 children
1:***** I am a child process
1:***** VNN: 1
1:***** Degree of program parallelism: 2
1:***** Total Nodes in system: 14
1:***** Doing Barrier
0:***** I am a child process
0:***** VNN: 0
0:***** Degree of program parallelism: 2
0:***** Total Nodes in system: 14
0:***** Child 0 is sleeping
0:***** Doing Barrier
1:***** Done with Barrier
0:***** Done with Barrier
%


GLUnix Implementation Status

The following functionality is implemented in NOW-1:

  • Remote Execution:
    Jobs can be started on any node in the GLUnix cluster. A single job may spawn multiple worker processes on different nodes in the system.
  • Load Balancing:
    GLUnix maintains imprecise information on the load of each machine in the cluster. The system farms out jobs to the node which it considers least loaded at request time.
  • Signal Propagation:
    A signal sent to a process is multiplexed to all worker processes comprising the GLUnix process.
  • Coscheduling:
    Jobs spawned to multiple nodes can be gang scheduled to achieve better performance. The current coscheduling time quantum is 1 second.
  • IO Redirection:
    Output to stdout or stderr is piped back to the startup node. Characters sent to stdin are multiplexed to all worker processes. Output redirection is limited by network bandwidth.

Nameserver Documentation

Documentation for the GLUnix nameserver is available separately.



Note: The project was terminated on June 15, 1998

This material is based upon work supported by the National Science Foundation under grants CCR-9257974 and PFF-CCR-9253705 as well as ARPA under grant F30602-95-C-0014. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.