"What can anyone give you greater than now" - William Stafford
In the past 5 years, the cost/performance gap between secondary and tertiary storage has been widening. The cost per megabyte of disk drives has been falling by a factor of 2 per year, compared with a factor of 1.5 per year for tape drives and libraries. Disk areal densities have been increasing at 60% per year, and 8 GB 3.5-inch disk units are currently available. Disk data rates have also been increasing at 40% per year and are expected to pass 40 MB/s by the end of the decade. These trends change what is possible in large-scale storage systems. If they continue, large storage systems composed of disks will have significant cost/performance advantages over tape libraries of similar capacity.
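The compounding effect of these two rates can be sketched with a short calculation. The factors of 2 and 1.5 per year come from the text; the function names and the starting price ratio of 10 used in the usage note are hypothetical and chosen only for illustration, not measured values.

```python
import math

# Annual improvement factors for cost per megabyte, as stated in the text.
DISK_FACTOR = 2.0   # disk cost/MB falls by a factor of 2 per year
TAPE_FACTOR = 1.5   # tape cost/MB falls by a factor of 1.5 per year


def relative_gain(years, disk_factor=DISK_FACTOR, tape_factor=TAPE_FACTOR):
    """Factor by which disk cost/MB improves relative to tape after `years`."""
    return (disk_factor / tape_factor) ** years


def years_to_parity(initial_ratio, disk_factor=DISK_FACTOR, tape_factor=TAPE_FACTOR):
    """Years until disk and tape cost/MB are equal.

    `initial_ratio` is how many times more expensive disk is than tape
    per megabyte today (an assumed input, not a figure from the text).
    """
    if initial_ratio <= 1:
        return 0.0  # disk is already at or below tape cost
    return math.log(initial_ratio) / math.log(disk_factor / tape_factor)
```

Each year disk gains a factor of 2/1.5 = 4/3 on tape, so `relative_gain(3)` is (4/3)^3, or about 2.4. Under the purely illustrative assumption that disk starts out 10 times more expensive per megabyte, `years_to_parity(10)` comes to roughly 8 years, provided both trends hold.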
Applications such as databases, video on demand, medical data, and web archival need storage systems that offer both high performance and high capacity. The solution used in most cases is a hierarchy consisting of a disk array and a tape library. However, disk arrays have drawbacks in terms of cost/performance, availability, and scalability. Because of their custom hardware, the cost per megabyte of RAID disk arrays increases with system capacity, unlike that of raw disks and tape systems. A disk array also needs to be connected to a host computer, which becomes a bottleneck for both performance and availability. The array's scalability is limited by the number of disks that its infrastructure can support. Some storage-consuming applications, such as web archival, have a fixed growth rate of data. When such an application reaches the capacity limit of its disk array, another array must be added. Adding independent disk arrays also lowers the reliability of the total system and complicates storage management.
Tertiary Disk is a storage system architecture that exploits the trends mentioned above to create large disk storage systems that avoid the disadvantages of custom-built disk arrays. The name comes from its twin goals: to have the cost per megabyte and capacity of tape libraries and the performance of magnetic disks. We use commodity, off-the-shelf components to develop a scalable, low-cost, terabyte-capacity disk system. Our target is to build a complete storage system for about 30-50% more than the cost of the raw disks. Tertiary Disk uses PCs connected by a switched network to host a large number of disks. Our prototype consists of 20 200 MHz PCs, which host 370 8 GB disks. The PCs, running FreeBSD, are connected through a 100 Mbps Ethernet switch.
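As a rough check on the prototype figures, the arithmetic can be written out as follows. The PC count, disk count, disk size, and the 30-50% overhead target are taken from the text; the function names and the raw disk cost passed in the usage note are hypothetical placeholders, since no prices are given here.

```python
# Prototype parameters stated in the text.
NUM_PCS = 20
NUM_DISKS = 370
DISK_GB = 8


def raw_capacity_gb(num_disks=NUM_DISKS, disk_gb=DISK_GB):
    """Total raw capacity of the prototype in gigabytes."""
    return num_disks * disk_gb


def disks_per_pc(num_disks=NUM_DISKS, num_pcs=NUM_PCS):
    """Average number of disks hosted by each PC."""
    return num_disks / num_pcs


def system_cost_range(raw_disk_cost, low=0.30, high=0.50):
    """Target range for total system cost, given the cost of the raw disks.

    `raw_disk_cost` is an assumed input in any currency unit; the
    30-50% overhead bounds are the target stated in the text.
    """
    return raw_disk_cost * (1 + low), raw_disk_cost * (1 + high)
```

With these numbers, `raw_capacity_gb()` gives 2960 GB, or close to 3 TB, and `disks_per_pc()` gives an average of 18.5 disks per PC. For a hypothetical raw disk cost of 100 units, `system_cost_range(100)` yields a target of about 130 to 150 units for the complete system.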