Transient error handling
Transient errors are common in large arrays
- example: Berkeley 368-disk Tertiary Disk array, observed over 11 months
- 368 disks reported transient SCSI errors (100%)
- 13 disks reported transient hardware errors (3.5%)
- 2 disk failures (0.5%)
- isolated transients do not imply disk failures
- but streams of transients indicate failing disks
- both Tertiary Disk failures showed this behavior
Transient error handling policy is critical to the long-term availability of an array
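
One way to picture such a policy is a sliding-window counter per disk: an isolated transient is ignored, while a stream of transients inside a short window marks the disk as suspect. The sketch below is only an illustration under stated assumptions. The class name `TransientMonitor`, the threshold of 5 errors, and the one-hour window are hypothetical choices, not values taken from the Tertiary Disk data.

```python
from collections import deque


class TransientMonitor:
    """Flags a disk as suspect when transients arrive as a stream, not in isolation."""

    def __init__(self, threshold=5, window_s=3600.0):
        # Both defaults are illustrative assumptions, not figures from the study.
        self.threshold = threshold  # errors within the window that count as a "stream"
        self.window_s = window_s    # sliding window length in seconds
        self.events = {}            # disk id -> deque of recent error timestamps

    def record(self, disk_id, timestamp):
        """Record one transient error; return True if the disk now looks suspect."""
        recent = self.events.setdefault(disk_id, deque())
        recent.append(timestamp)
        # Drop errors that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_s:
            recent.popleft()
        return len(recent) >= self.threshold


monitor = TransientMonitor(threshold=3, window_s=10.0)
isolated = monitor.record("disk0", 0.0)     # a lone transient: not suspect
monitor.record("disk0", 100.0)              # the earlier error has aged out
monitor.record("disk0", 102.0)
stream = monitor.record("disk0", 103.0)     # three errors within 10 s: suspect
```

What an array does with a suspect disk (retry, log, or proactively replace it) is a separate policy decision that this sketch leaves open.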