Welcome to the NOW Tutorial


This document provides an overview of the use of the NOW-2 production system at UC Berkeley. NOW-2 currently consists of a cluster of 110 Ultra-1 SPARCstations. The production system runs the GLUnix operating system and supports Split-C parallel programs running on Active Messages.

This document is organized as follows:

    Using GLUnix
    Utility Programs
    A Sample GLUnix Program
    GLUnix Implementation Status
    Receiving Information
    Reporting Problems


Using GLUnix

Taking advantage of NOW functionality is straightforward. Simply ensure that /usr/now/bin is in your shell's PATH and /usr/now/man is in your MANPATH. To start taking advantage of GLUnix functionality, log into now.cs.berkeley.edu and start a glush shell. While the composition of the GLUnix partition may change over time, we make every effort to guarantee that now.cs is always running GLUnix. The glush shell runs most commands remotely on lightly loaded nodes in the cluster.
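For sh-derived shells, the path setup above can be sketched as follows (csh/tcsh users would use setenv instead; only the two /usr/now paths come from this document):

```shell
# Put the NOW utilities and man pages on the search paths
# (sh/bash syntax; under csh/tcsh use "setenv PATH ..." instead).
export PATH="/usr/now/bin:$PATH"
export MANPATH="/usr/now/man:${MANPATH:-}"
# Confirm the entry is present:
echo "$PATH" | grep -q "/usr/now/bin" && echo "PATH ok"
```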

To summarize:

    1. Put /usr/now/bin in your PATH and /usr/now/man in your MANPATH.
    2. Log into now.cs.berkeley.edu.
    3. Start a glush shell; it will run your commands on lightly loaded nodes.

Load-balanced GLUnix shell scripts are also available. The syntax is identical to the csh command language. Simply begin your shell scripts with #!/usr/now/bin/glush. Note that you do not have to be running glush as your interactive shell in order to run load-balanced shell scripts.
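As a sketch, such a script might look like the following (the command lines here are placeholders, not real NOW programs):

```shell
#!/usr/now/bin/glush
# Because of the #! line above, glush interprets this script and
# farms each command out to a lightly loaded node in the cluster.
# The syntax is ordinary csh; these commands are placeholders.
echo "compiling module A"
echo "compiling module B"
echo "linking"
```

Make the script executable with chmod +x and invoke it from any shell; the #! line ensures glush runs it even if your interactive shell is not glush.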


Utility Programs

We have built a number of utility programs for GLUnix. All of these programs are located in /usr/now/bin. Man pages are available for each program by running man from a shell. A brief description of each utility program follows:
glush: The GLUnix shell is a modified version of tcsh. Most jobs submitted to the shell are load balanced among GLUnix machines. However, some jobs must be run locally, since GLUnix does not provide completely transparent TTY support and since IO bandwidth to stdin, stdout, and stderr is limited by TCP bandwidth. The shell automatically runs a number of such jobs locally; users may customize this list by adding programs to the glunix_runlocal shell variable, which indicates to glush those programs that should be run locally.
glumake: A modified version of GNU's make program. A -j argument specifies the degree of parallelism for the make. The degree of parallelism defaults to the number of nodes available in the cluster.
glurun: Runs the specified program on the GLUnix cluster. For example, glurun bigsim will run bigsim on the least loaded machine in the cluster. You can run parallel programs on the NOW by specifying the parameter -N, where N is the desired degree of parallelism. Thus glurun -5 bigsim will run bigsim on the 5 least loaded nodes.
glustat: Prints the status of all machines in the GLUnix cluster.
glups: Similar to Unix ps but only prints information about GLUnix processes.
glukill: Sends an arbitrary signal (defaults to SIGTERM) to a specified GLUnix process.
gluptime: Similar to Unix uptime, reporting on how long the system has been up and the current system load.
glubatch: A batch job submission system for submitting and querying non-interactive, low-priority jobs.

A Sample GLUnix Program

Each program running under GLUnix has a startup process, which runs in your shell, and a number of child processes, which run on remote nodes. There must be at least one child process, and there may be as many as one per node currently running GLUnix. The startup process is responsible for routing signal information (for example, if you type ^Z or ^C) and input/output to the child processes. The child processes make up the program itself. If there is more than one child, the program is parallel; otherwise it is sequential.

Here are the code and Makefile for a sample program which runs under GLUnix (use gmake with this Makefile). The same source provides the code for both the startup and child processes; the distinction between the two kinds of processes is made using the Glib_AmIStartup() library call.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include "glib/types.h"
#include "glib.h"

extern char **environ;  /* used by Glib_Spawnef; not declared by all systems' headers */

int
main(int argc, char **argv)
{
    int   numNodes;
    VNN   vnn;

    if(!Glib_Initialize()) {
        fprintf(stderr,"Glib_Initialize failed\n");
        exit(-1);
    }

    if (argc > 1) {
	numNodes = atoi(argv[1]);
    }
    else {
	numNodes = 2;
    }

    if (Glib_AmIStartup()) {

	/* Startup process runs here */
	printf("Startup is spawning %d children\n", numNodes);
	Glib_Spawnef(numNodes, GLIB_SPAWN_OUTPUT_VNN,
	             argv[0], argv, environ);
    }
    else {

	/* Child process(es) run here */
	vnn = Glib_GetMyVnn();

	printf("***** I am a child process\n");
	printf("***** VNN: %d\n", (int)vnn);
	printf("***** Degree of program parallelism: %d\n",
	       Glib_GetParallelDegree());
	printf("***** Total Nodes in system: %d\n",
	       Glib_GetTotalNodes());

	/* Make one of the children delay to prove that nobody
	   exits the barrier before everybody has entered it */
	if (vnn == 0) {
	    printf("***** Child %d is sleeping\n", vnn);
	    fflush(stdout);
	    sleep(5);
	}

	printf("***** Doing Barrier\n");
	fflush(stdout);

	Glib_Barrier();
	printf("***** Done with Barrier\n");
    }
    return 0;
}

The Makefile for this program (if you call it test.c) is:

CC      = gcc
CFLAGS  = -Wall -g 

TARGET  = test
SRCS    = test.c
LIBS    = -lglunix -lam2 -lsocket -lnsl
MANS    = test.1

MANHOME = ../../man/man1
BINHOME = ../../bin/sun4-solaris2.4-gamtcp

LIBPATH = /usr/now/lib
INCLUDEPATH = /usr/now/include/

###############################################################

LLIBPATH = $(addprefix -L,$(LIBPATH))
RLIBPATH = $(addprefix -R,$(LIBPATH))
INCPATH  = $(addprefix -I,$(INCLUDEPATH))

all: $(TARGET)

$(TARGET): $(SRCS)
	$(CC) $(CFLAGS) -o $(TARGET) $(SRCS) $(RLIBPATH) \
            $(LLIBPATH) $(INCPATH) $(LIBS)

clean: 
	rm -f $(TARGET) core *~ *.o

install: $(TARGET) installman
	cp $(TARGET) $(BINHOME)

installman: 
	cp $(MANS) $(MANHOME)

Output from this program should look something like this (though the order of the output lines may vary):

% ./test
Startup is spawning 2 children
1:***** I am a child process
1:***** VNN: 1
1:***** Degree of program parallelism: 2
1:***** Total Nodes in system: 14
1:***** Doing Barrier
0:***** I am a child process
0:***** VNN: 0
0:***** Degree of program parallelism: 2
0:***** Total Nodes in system: 14
0:***** Child 0 is sleeping
0:***** Doing Barrier
1:***** Done with Barrier
0:***** Done with Barrier
% 


GLUnix Implementation Status

The following functionality is currently implemented:

Remote Execution: Jobs can be started on any node in the GLUnix cluster. A single job may spawn multiple worker processes on different nodes in the system.
Load Balancing: GLUnix maintains imprecise information on the load of each machine in the cluster. The system farms out jobs to the node which it considers least loaded at request time.
Signal Propagation: A signal sent to a process is multiplexed to all worker processes comprising the GLUnix process.
Coscheduling: Jobs spawned to multiple nodes can be gang scheduled to achieve better performance. The current coscheduling time quantum is 1 second.
IO Redirection: Output to stdout or stderr is piped back to the startup node. Characters sent to stdin are multiplexed to all worker processes. Output redirection is limited by network bandwidth.

Receiving Information

The mailing list glunix-announce@now.cs.berkeley.edu will carry reports of system down time, upgrades, etc. Another mailing list, glunix-users@now.cs.berkeley.edu, is a users discussion forum for new utility programs, helpful hints, etc.

To subscribe to either mailing list, send email to "list-name"-request@now.cs.berkeley.edu with the message body "subscribe".


Reporting Problems

If you have any problems with GLUnix, please send email to glunix-bugs@now.cs.berkeley.edu. For problems of a more immediate nature, please contact either Douglas Ghormley or Amin Vahdat.