Research Plan

My system administration research originally considered four different areas:

Managing Stable Storage

Managing Stable Storage: Consistency, Fault Tolerance, Scalability, Recoverability and Customization. Goal: High Availability, nearly identical filesystems while minimizing resource usage. Managing stable storage is one of the big problems faced by administrators. They have to make sure that the different filesystems seen by machines are consistent. Since machines won't work when important filesystems are unavailable, the filesystems need to be fault-tolerant. To handle large numbers of clients, the filesystems need to be scalable. To handle large software collections, different architectures, and different requirements for machines, the filesystem appearance has to be customized. Finally, the filesystem needs to be backed up to recover from both catastrophic failures and user errors.

Consistency

Physical distance or frequent disconnection prevents strong consistency. Administrators have developed tools like rdist, track, and opendist to handle these problems. Unfortunately, those systems have fairly poor consistency, which depends on some command being executed while a machine is connected to a central repository. Similarly, the tools get progressively slower as the size of the central repository grows. However, these tools are independent of the underlying filesystem.

Local area filesystems provide a separate view which has stronger consistency, but requires fairly fast connections, and removes access when the machines are disconnected. NFS, AFS, Coda, and xFS show various approaches to the local area filesystem. AFS and Coda provide some support for wide area filesystems, and Coda provides support for disconnected operation.

Question: Can these two different approaches be merged so that the filesystem maintains varying levels of known consistency between groups of machines?

Fault Tolerance

Various classes of machines need to be able to handle different types of faults. Portable machines need to be able to handle a failure of all remote nodes; similarly, wide-area or weakly connected machines may also need to handle failure of all remote nodes. Different administrative domains inside a single organization may be unwilling to create cross dependencies between their sub-groups. Even in local area networks, failure of a central fileserver can leave everyone unable to get work done.

Optimally, this fault tolerance would be embedded into the filesystem so that it is transparent to users. Conventionally, techniques like rdist and track are used to handle wide-area or portable machines and a subset of important files, and a central server is then used to store shared, frequently updated data. A good filesystem would support fault tolerance in the local area (so that some of the file servers can be lost without losing service, à la RAID disk systems), and would also support weak consistency and complete fault tolerance for portable machines, wide-area machines, and machines in different administrative domains.

The simple question is: is this feasible?

Scalability

Sites are getting much larger as computer use continues to expand in the workplace. As a result, file service already needs to be faster and larger. Moreover, the difference between large central servers and clients is diminishing, so the clients are putting a larger relative load on the servers. For both of these reasons, centralized fileservers are becoming a less effective solution. A potentially better approach is to take advantage of the power of the clients to assist the filesystem. For read requests, systems like CacheFS are starting to help in this direction. We believe that the file service also needs more scalability in both read and write performance.

If the shared filesystem capacity and capability scale with the number of clients, then we gain additional benefits. We can keep every file (including the root filesystem for each client) on the shared filesystem. This allows very easy upgrades of clients because the filesystem is globally available. Further, upgrades can be applied even if the client is down. Finally, in the case of permanent client failure, it is easy to bring up a replacement machine since all of the client's files are on the shared filesystem.

Recoverability

There are two general classes of recoverability. The first is recoverability from user errors. Users can lose files because of accidental file deletion, or may want to return to a previous version. The second is recoverability from catastrophic failures. For redundant filesystems (RAIDs), this can occur due to multiple failures. Similarly, natural disasters can require recovering from off-site backups.

Previous systems (AFS, Plan 9) have demonstrated the advantage of getting to previous versions. AFS kept a special volume (.backup) which contained a snapshot of each volume at the time of a previous backup. Plan 9, through the use of special hardware (a WORM jukebox), provided time travel to an arbitrary point in the past. We believe that it would be useful to have arbitrary, user-specified snapshots (as well as periodic system snapshots), but that this support needs to be provided without the use of specialized hardware.

The second large class of recoverability is from catastrophic failures. For this to be feasible, the filesystem needs to be able to write out at least one (and preferably two) copies of the filesystem to tape for off-site storage. If the cost of disk continues to fall faster than the cost of tape, it may become more reasonable to mirror files to off-site disks. Regardless, we believe that there are still significant challenges in supporting a filesystem which can back up large files (many GB) as well as large filesystems (many TB) without falling back to substantial administrative work for partitioning up the filesystem space.

Customization

Often the view of the filesystem needs to vary between systems. First, binaries compiled for one architecture do not work on others, but for consistency, the files should all appear in the same place. Second, some systems may have special hardware or roles that require them to have additional or different files. Third, large software collections require the ability to install and uninstall programs as well as manage conflicts between programs. Fourth, some users may need different versions of the same program at the same time to work around bugs.

Clearly customization at the level of machines is required. We believe that customization per user or per process may also be beneficial. Users could access different versions of programs on the same machine, and processes could configure variant versions of the filesystem (à la chroot for security). However, this also introduces a potential nightmare for the administrator, who can no longer easily tell how the filesystem appears to a particular program, and the number of potential configurations may become very large.

Flexible customization poses a number of problems for the implementation. One approach sits above the filesystem and has been tried through depot and its variants. This approach uses symlinks to build the customized appearance from some repository. Some variants of depot support running programs after the customized filesystem is built to create indices of separate parts (man page indices, configuration indices, font lists, etc.). This support is necessary for a fully general solution. However, symlinks can be detected, causing some programs to behave incorrectly. Moreover, getting programs to install into non-conflicting places in the first place can be challenging. For these reasons, a solution embedded into the filesystem looks appealing. However, supporting index creation inside the filesystem may be challenging. A hybrid approach may turn out to be best.
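The symlink-based approach can be illustrated with a minimal sketch. This is not depot's actual implementation; the function name build_view, the one-directory-per-package repository layout, and the first-package-wins conflict policy are all assumptions made for illustration.

```python
import os


def build_view(repository, view, packages):
    """Build a customized tree under `view` by symlinking package files.

    Hypothetical layout: each package lives in its own directory under
    `repository` (e.g. repository/emacs-19.34/bin/emacs). Files are linked
    into `view` at the same relative path. When two packages supply the
    same path, the first package listed wins and the clash is reported.

    Returns a dict: relative path -> list of packages that supply it.
    """
    owners = {}     # relative path -> package whose file was linked
    conflicts = {}  # relative path -> all packages supplying that path
    for pkg in packages:
        pkg_root = os.path.join(repository, pkg)
        for dirpath, _dirnames, filenames in os.walk(pkg_root):
            rel_dir = os.path.relpath(dirpath, pkg_root)
            target_dir = os.path.normpath(os.path.join(view, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                rel = os.path.normpath(os.path.join(rel_dir, name))
                if rel in owners:
                    # Assumed policy: keep the earlier link, record the clash.
                    conflicts.setdefault(rel, [owners[rel]]).append(pkg)
                    continue
                source = os.path.abspath(os.path.join(dirpath, name))
                os.symlink(source, os.path.join(view, rel))
                owners[rel] = pkg
    return conflicts
```

A post-build step that regenerates indices (man page indices, font lists) would run after build_view returns. The returned conflict map shows why installing into non-conflicting places matters: every entry is a file that some package expected to own but does not.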

Monitoring and Diagnosing a Running System

Monitoring and Diagnosing a Running System: Discovering and Fixing Performance and Correctness Problems, Planning for Long Term Trends. One of the traditional tasks for system administrators is maintaining a running system. This means they need to monitor, diagnose, and fix problems that occur in the system. We believe this includes both instantaneous faults (crashes, overloads, etc.) and long term performance problems (capacity planning). To make the administrator's job easier, problems that can be automatically and safely fixed should be, and the administrator should just be notified that the problem was handled. This area of research involves the following sub-areas:

Gathering & Storing Data

There are many approaches to gathering data. SNMP provides an interface to gathering some types of data, tcpdump allows gathering of network information, and various programs (vmstat, iostat, w) can be run to get system statistics. All of these approaches fail to provide easy extensibility, or easy access to historical information. Furthermore, they fail to provide easy, powerful methods for accessing the data. For these reasons, we turn to database research to try and help solve the data management problem. However, for gathering the initial information to put into a database, we expect to utilize all of those approaches.

We believe that for fault tolerance, each node should run its own copy of the database, loaded from and storing to local disk. This database will only store the information for that node, but this approach helps guarantee that information will not be lost. The information then needs to be collected toward a single repository in order to make analysis and storage easier. The aggregation may happen in a tree for efficiency or to deal with geographic concerns. The aggregation will also have to happen in a fault-tolerant manner.
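One way to picture the per-node store and tree aggregation is the sketch below. It assumes SQLite as the local database, a per-host sequence number on each sample, and a parent that pulls from its children; the schema and the names open_store, record, and pull are hypothetical and not part of any existing system.

```python
import sqlite3

INSERT = "INSERT OR IGNORE INTO samples VALUES (?, ?, ?, ?)"


def open_store(path):
    """Open (or create) a sample store; each node keeps one on local disk."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "host TEXT, seq INTEGER, metric TEXT, value REAL, "
        "PRIMARY KEY (host, seq, metric))"
    )
    return db


def record(db, host, seq, metric, value):
    """Store one measurement locally; seq is a per-host sequence number."""
    with db:  # commit on success, roll back on error
        db.execute(INSERT, (host, seq, metric, value))


def pull(child, parent):
    """Copy samples the parent has not seen yet from child to parent.

    The parent's highest stored seq per host acts as a high-water mark.
    The copy is one transaction and duplicate keys are ignored, so a pull
    that is interrupted and retried leaves the parent unchanged or complete.
    Returns the number of rows copied.
    """
    marks = dict(parent.execute(
        "SELECT host, MAX(seq) FROM samples GROUP BY host"))
    rows = [
        row for row in child.execute(
            "SELECT host, seq, metric, value FROM samples")
        if row[1] > marks.get(row[0], -1)
    ]
    with parent:
        parent.executemany(INSERT, rows)
    return len(rows)
```

Because pull works on any pair of stores, the same function can move data from leaf nodes to intermediate collectors and from those to the central repository. Scanning the whole child table on every pull is a simplification; a real collector would query only rows above the mark.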

We would like the database itself to be fault tolerant; however, given the type of updates and the extreme fault-tolerance requirements, it may be simpler to build the fault tolerance on top of the database. Similarly, we would like the database to automatically update data via the various gathering mechanisms, but again we believe we can build that on top. The primary area of concern is scalability. With hundreds to thousands of hosts updating the database continuously, the database could easily see thousands of updates per second. We would like to avoid having to purchase a few expensive machines solely to monitor the system.

Data Visualization

If there is only a single machine, then visualization is a straightforward problem. The various metrics for the machine are displayed on the screen for the administrator to see. However, as the number of machines grows, this approach is no longer viable. Traditional designs either display very little information about many sources (e.g. up/down status for all the routers in a system), or ignore most machines and display a lot of information about a few. The problem is that the screen real estate is fixed. Therefore, displaying more information about more machines requires increasing the information-per-pixel ratio. We propose to achieve this through aggregation and by taking advantage of the high resolution color displays that are available.

Aggregation compresses metrics of multiple systems into a single metric. Typical examples include average, max, and median. Unfortunately, aggregation loses information about the spread of the data. Since each of the aggregation methods has an associated measure of spread, we plan to display the aggregate value as well as the spread of the data, using two of the axes available for displaying information on the screen. The related spread metrics are standard deviation, range, and SIQR (semi-interquartile range), respectively.
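The pairing of each aggregate with its spread measure can be sketched as follows. The function name summarize is hypothetical, and the choices of population standard deviation and of the default quartile method in Python's statistics module are assumptions; other conventions give slightly different quartiles.

```python
import statistics


def summarize(values):
    """Return each aggregate paired with its measure of spread.

    mean   -> standard deviation (population form, an assumption)
    max    -> range (max - min)
    median -> SIQR, half the distance between the first and third quartiles

    Requires at least two values, since quartiles are undefined otherwise.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": (statistics.fmean(values), statistics.pstdev(values)),
        "max": (max(values), max(values) - min(values)),
        "median": (statistics.median(values), (q3 - q1) / 2),
    }
```

For example, summarize applied to the load averages of every machine in a cluster yields three (aggregate, spread) pairs, any one of which can then be mapped onto two display axes.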

High resolution color screens provide a number of axes for display. The first is pixels turned on or off in some area. We call this fill. The next three axes correspond to the values in the HSV color model. They are hue (the actual color), saturation (how much white is mixed in, i.e. the tint), and value (how dark the color is, i.e. the shade). We do not believe that all of the axes have to be used at the same time, but we believe that all of them should be available for use, as humans can see over 100,000 different colors.
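As a rough illustration of using the color axes, the sketch below maps an aggregate value onto hue and its spread onto saturation. The specific mapping (green for low, red for high, paler colors for wider spread) and the name cell_color are assumptions chosen for the example, not a design from this plan.

```python
import colorsys


def cell_color(level, spread):
    """Map a normalized (aggregate, spread) pair to an 8-bit RGB color.

    level:  aggregate value scaled to [0, 1]; selects the hue,
            from green (0.0) through yellow to red (1.0).
    spread: spread scaled to [0, 1]; a wide spread lowers the
            saturation, so uncertain cells look washed out.
    Value (brightness) is held at 1.0 and left free for another metric.
    """
    level = min(max(level, 0.0), 1.0)
    spread = min(max(spread, 0.0), 1.0)
    hue = (1.0 - level) / 3.0          # 1/3 is green, 0 is red
    saturation = 1.0 - 0.75 * spread   # never fully gray, hue stays visible
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))
```

Fill could be handled separately, for instance by lighting only a fraction of the pixels in each cell, which would leave a fourth metric displayable in the same screen area.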

Simplifying Security

Simplifying Security: Raising the Level of Abstraction to Simplify Security Programming. Using Existing Infrastructure to Aid Acceptance.

Supporting Users

Supporting Users: Automatic Help Desks, Remote Device Access and Training
Last Update: 28 Nov 96; Contact Eric Anderson with questions or comments