ISTORE: A Platform for Available, Maintainable, and Evolving Storage-Intensive Applications

The ISTORE project is researching techniques for eliminating the hassles of administering and maintaining large and evolving systems running data- and storage-intensive applications. To accomplish this goal, the project is targeting improvements in system availability, in the ability of systems to maintain themselves, and in the ability of systems to gracefully handle growth and evolution of their hardware and software components.

Research in the ISTORE group revolves around three efforts. The first is to develop a deeper understanding of the key metrics of availability, maintainability, and capacity for evolutionary growth. We are doing this by identifying measurable quantities that capture these metrics and by developing benchmarking techniques that can extract those quantities from real systems. Building on the fundamental technique of system perturbation via fault injection, we have had good success in the area of availability and are generating promising initial results in the area of maintainability; we are currently working to refine and expand that initial work.
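To make the idea of extracting a measurable quantity from a fault-injection run concrete, here is a minimal sketch. It is not the project's benchmarking code: the function `summarize_run`, the `AvailabilityReport` fields, the 10% tolerance band, and the sample trace are all assumptions chosen for illustration. It takes throughput samples recorded at fixed intervals, treats the samples before the injected fault as the baseline, and reports how far and for how long service dropped afterwards.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityReport:
    baseline: float        # mean throughput before the fault was injected
    worst: float           # lowest throughput observed after the fault
    degraded_samples: int  # post-fault samples below the tolerance band


def summarize_run(samples, fault_index, tolerance=0.1):
    """Summarize one fault-injection run (hypothetical helper).

    samples     -- throughput measurements taken at fixed intervals
    fault_index -- index of the first sample taken after the fault
    tolerance   -- fraction below baseline still counted as normal service
    """
    if not 0 < fault_index < len(samples):
        raise ValueError("fault_index must split samples into two non-empty parts")

    before = samples[:fault_index]
    after = samples[fault_index:]

    baseline = sum(before) / len(before)
    floor = baseline * (1 - tolerance)

    # Count how many post-fault samples fell below the acceptable band.
    degraded = sum(1 for s in after if s < floor)
    return AvailabilityReport(baseline, min(after), degraded)


# Made-up trace: steady service, a fault before the fifth sample, then recovery.
trace = [100, 102, 98, 100, 40, 55, 80, 97, 101]
report = summarize_run(trace, fault_index=4)
print(report)
```

A real benchmark would also need to record which fault was injected and repeat runs to separate the fault's effect from ordinary variation; this sketch covers only the summarization step.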

The second major effort of the ISTORE project is to identify techniques that can be used to improve availability, maintainability, and capacity for evolution, and to incorporate those techniques into design patterns and APIs for the construction of new systems. To this end, we are investigating a set of techniques that allow a system to both proactively and reactively adapt to failures and changes in its environment. On the proactive side, we are extending the usual techniques of state scrubbing with more invasive techniques of on-line fault injection and system perturbation. Our hypothesis is that these techniques can expose information about weaknesses and vulnerabilities in hardware and software structure that might otherwise not be visible. On the reactive side, we are investigating introspective techniques and software construction models that will enable services to monitor themselves and adjust their own behavior to run optimally despite changes and disruptions in their environment.

Finally, the third effort of the ISTORE project is to build a large-scale prototype to evaluate and demonstrate our research ideas. We are currently constructing the ISTORE-1 prototype, an 80-node server system composed of plug-and-play intelligent "disk bricks", each in a half-height disk form factor. Each brick contains a Pentium II processor, 256 MB of memory, four 100 Mb/s network interfaces, and a special diagnostic processor that enables fine-grained self-monitoring and fault injection. The bricks are connected to an "intelligent chassis" that provides scalable redundant switching, power, and environmental monitoring. The ISTORE-1 prototype hardware is on schedule to be completed by the end of 2000. Initial software applications for the system will include a distributed e-mail service, a multi-tier e-commerce environment, and a distributed computer vision application. These applications are being built as an experimental platform for several of the system design techniques described above.
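For a sense of scale, the per-brick figures above can be multiplied out across the 80 nodes. The short calculation below is only nominal arithmetic on the stated numbers; the constant names are ours, and the network total is a raw sum of interface line rates, not a measured or achievable throughput.

```python
# Figures taken from the ISTORE-1 description above.
NODES = 80
MEMORY_MB_PER_BRICK = 256
NICS_PER_BRICK = 4
NIC_MBPS = 100

# Aggregate memory across all bricks, in GB (1 GB = 1024 MB).
total_memory_gb = NODES * MEMORY_MB_PER_BRICK / 1024

# Nominal network line rate per brick and summed over the system.
per_brick_mbps = NICS_PER_BRICK * NIC_MBPS
total_network_gbps = NODES * per_brick_mbps / 1000

print(f"aggregate memory:       {total_memory_gb:.0f} GB")
print(f"per-brick network rate: {per_brick_mbps} Mb/s")
print(f"summed network rate:    {total_network_gbps:.0f} Gb/s")
```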