Goal of xFS

Central file servers are often performance, reliability, and availability bottlenecks in today's networks of workstations. The xFS file system is designed to distribute the work of the central file server across the network of workstations, avoiding central bottlenecks and allowing service to scale with the NOW (Network of Workstations).

Eliminating the Central Server

How does xFS accomplish its goal of eliminating the central file server?
A typical file server has four main duties: managing the file name space, caching data blocks, keeping cached copies coherent, and storing data on disk. xFS distributes each of these duties across the workstations:

xFS uses a hash function to divide the file system's name space, assigning each part to a different manager. xFS achieves distributed caching and cache coherence of data blocks through cooperative caching and distributed management: clients may obtain copies of data blocks from other clients, while each block's manager keeps track of which clients currently store the block and enforces cache coherence. This feature is shown in the demo. For example, if Client 5 caches a data block and Client 3 wants to read it, Client 3 asks the block's manager for the block's location; the manager forwards the request to Client 5, which holds the block in memory and sends it to Client 3. If Client 3 instead wants to write to the block, the manager must also revoke Client 5's copy of the data so that cache consistency is maintained.
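The read and write paths above can be sketched as follows. This is an illustrative model, not xFS's actual interface: the `Manager` and `Client` classes, the use of `crc32` as the name-space hash, and `read_from_storage` are all assumptions made for the example.

```python
# Hypothetical sketch of xFS-style distributed management and cooperative
# caching. Names, structure, and the crc32 hash are illustrative assumptions.
from zlib import crc32

NUM_MANAGERS = 4

def manager_for(block_id: str) -> int:
    # A hash function partitions the name space among the managers.
    return crc32(block_id.encode()) % NUM_MANAGERS

class Client:
    def __init__(self):
        self.cache = {}  # block_id -> data held in this client's memory

def read_from_storage(block_id: str) -> bytes:
    # Stand-in for fetching the block from the storage servers.
    return b"<data for %s>" % block_id.encode()

class Manager:
    """Tracks which clients cache each block and enforces coherence."""
    def __init__(self):
        self.cachers = {}  # block_id -> set of client ids holding a copy

    def read(self, block_id, requester, clients):
        # Cooperative caching: forward the read to a client already
        # holding the block; fall back to storage only if nobody has it.
        holders = self.cachers.setdefault(block_id, set())
        if holders:
            source = next(iter(holders))
            data = clients[source].cache[block_id]
        else:
            data = read_from_storage(block_id)
        holders.add(requester)
        clients[requester].cache[block_id] = data
        return data

    def write(self, block_id, writer, data, clients):
        # Revoke every other cached copy before the write proceeds.
        for holder in self.cachers.get(block_id, set()) - {writer}:
            clients[holder].cache.pop(block_id, None)
        self.cachers[block_id] = {writer}
        clients[writer].cache[block_id] = data
```

Replaying the demo scenario: after Client 5 writes a block, a read by Client 3 is satisfied from Client 5's memory, and a subsequent write by Client 3 causes the manager to revoke Client 5's copy.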

Disk storage of data is distributed across several machines called storage servers. Each client keeps an in-memory log of its dirty data. When the log fills, or when the data is synced to disk for reliability, the log is divided into fragments and sent over the network to the storage servers. A parity fragment is also constructed and sent to another storage server to increase reliability and availability. You can see this feature in the demo by having one client create many blocks and then pressing the flush button.
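The fragment-and-parity step can be sketched with simple XOR parity. This is a minimal model of the idea, not xFS's storage-server code; the fragment sizing and function names are assumptions.

```python
# Illustrative sketch: split a full client log segment into data fragments
# plus one XOR parity fragment, so any single lost fragment is recoverable.
def make_fragments(segment: bytes, num_storage_servers: int):
    n = num_storage_servers - 1                 # one server holds parity
    size = -(-len(segment) // n)                # ceiling division
    frags = [segment[i * size:(i + 1) * size].ljust(size, b"\0")
             for i in range(n)]                 # pad the last fragment
    parity = bytearray(size)
    for frag in frags:
        for i, b in enumerate(frag):
            parity[i] ^= b
    return frags, bytes(parity)

def reconstruct(surviving, parity):
    """Rebuild one lost data fragment by XORing parity with the survivors."""
    lost = bytearray(parity)
    for frag in surviving:
        for i, b in enumerate(frag):
            lost[i] ^= b
    return bytes(lost)
```

If one storage server fails, its fragment is the XOR of the parity fragment with the surviving data fragments, which is what makes the loss of any single machine tolerable.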

Performance

Small write performance

This graph shows how the file system scales with the number of clients actively writing small (1KB) files. xFS's performance scales with many active clients because small writes are batched into large writes, which are then distributed to disks across the network. In contrast, central file server systems, such as NFS (Network File System) and AFS (Andrew File System), can keep up with only a few active clients before becoming saturated with requests.
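The batching that makes small writes cheap can be sketched as a log buffer that absorbs small writes and emits one large transfer when a segment fills. The `WriteLog` class and the segment size are assumptions for illustration, not xFS's actual parameters.

```python
# Hypothetical sketch of log-structured write batching: small writes
# accumulate in an in-memory log and are flushed as one large segment.
class WriteLog:
    def __init__(self, flush_fn, segment_size=512 * 1024):
        self.buf = bytearray()
        self.flush_fn = flush_fn        # sends one large write to storage
        self.segment_size = segment_size  # assumed segment size

    def write(self, data: bytes):
        # A small write is just an append to the in-memory log.
        self.buf += data
        if len(self.buf) >= self.segment_size:
            self.flush()

    def flush(self):
        # One large, sequential transfer instead of many small ones.
        if self.buf:
            self.flush_fn(bytes(self.buf))
            self.buf.clear()
```

Many 1KB writes therefore cost the disks only a handful of large sequential transfers, which is why adding active writers keeps scaling until the aggregate disk bandwidth is reached.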

Figure 1: Small write performance

Large write performance

Figure 2 shows that xFS scales well with large numbers of clients actively making large writes. Because xFS distributes its writes across several storage servers, and different clients write to different storage server groups, it can provide the aggregate bandwidth of all the disks in the network of workstations rather than only that of the disks attached to a single file server.

The graph for large read performance is identical.

Figure 2: Large write performance

Modified Andrew Benchmark

Figure 3 shows the results of running the Modified Andrew Benchmark on xFS, NFS, and AFS. The benchmark has several phases, three of which are shown in the graph. The write phase shows that xFS scales well with many clients writing, consistent with the results of the previous two benchmarks. All of the file systems scale well on the read phase; however, since that phase reads only 4MB and each client has a much larger cache, it measures only how fast each file system can read from its local cache. The compile phase is CPU-bound rather than I/O-bound, but enough reading and writing of binaries and temporary files occurs to affect the performance of AFS and NFS. xFS also shows a decrease in performance in this phase as more clients are added, but for a different reason: because xFS clients also serve as storage servers and managers, part of their CPU time goes to those duties rather than to compiling.

Figure 3: Modified Andrew Benchmark