The Architectural Costs of Streaming I/O: A Comparison of Workstations, Clusters, and SMPs

We investigate resource usage while performing streaming I/O by contrasting three architectures, a single workstation, a cluster, and an SMP, under various I/O benchmarks. We derive analytical and empirically-based models of resource usage during data transfer, examining the I/O bus, memory bus, network, and processor of each system. By investigating each resource in detail, we assess what comprises a well-balanced system for these workloads.

We find that the architectures we study are not well balanced for streaming I/O applications. Across the platforms, the main limitation to attaining peak performance is the CPU, due to lack of data locality. Increasing processor performance (especially with improved block operation performance) will be of great aid for these workloads in the future. For a cluster workstation, the I/O bus is a major system bottleneck, because of the increased load placed on it from network communication. A well-balanced cluster workstation should have copious I/O bus bandwidth, perhaps via multiple I/O busses. The SMP suffers from poor memory-system performance; even when there is true parallelism in the benchmark, contention in the shared-memory system leads to reduced performance. As a result, the clustered workstations provide higher absolute performance for streaming I/O workloads.