Continuous Online Self-Testing
Self-maintaining systems should automatically carry out preventative maintenance
- this requires aggressive in situ component testing via:
  - fault injection: triggering hardware and software error-handling paths to verify that they exist and work correctly
  - stress testing: pushing HW/SW components past their normal operating parameters
  - scrubbing: periodically restoring hardware or software state that may be “decaying”
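Two of the techniques above, fault injection and scrubbing, can be sketched on a toy example. The `MirroredStore` class, its `inject_read_fault` switch, and the CRC32 checksums below are all assumptions made for illustration; they are not ISTORE's actual mechanisms. Stress testing is not shown.

```python
import zlib


class MirroredStore:
    """Hypothetical two-copy block store, for illustration only."""

    def __init__(self):
        self.copies = [{}, {}]          # two replicas: block id -> bytes
        self.checksums = {}             # block id -> CRC32 of the written data
        self.inject_read_fault = False  # fault-injection switch (assumed name)

    def write(self, block_id, data):
        for copy in self.copies:
            copy[block_id] = data
        self.checksums[block_id] = zlib.crc32(data)

    def _read_copy(self, index, block_id):
        # Fault injection: simulate a failed read on the first replica.
        if self.inject_read_fault and index == 0:
            raise OSError("injected read fault")
        data = self.copies[index][block_id]
        if zlib.crc32(data) != self.checksums[block_id]:
            raise OSError("checksum mismatch")
        return data

    def read(self, block_id):
        # Error-handling path under test: fall back to the next replica.
        for index in range(len(self.copies)):
            try:
                return self._read_copy(index, block_id)
            except OSError:
                continue
        raise OSError("all replicas failed")

    def scrub(self):
        """Rewrite replicas that no longer match their checksum.

        Returns the number of replica blocks repaired.
        """
        repaired = 0
        for block_id, expected in self.checksums.items():
            good = next((c[block_id] for c in self.copies
                         if zlib.crc32(c[block_id]) == expected), None)
            if good is None:
                continue  # no intact copy left to restore from
            for copy in self.copies:
                if zlib.crc32(copy[block_id]) != expected:
                    copy[block_id] = good
                    repaired += 1
        return repaired
```

Setting `inject_read_fault = True` and then calling `read` exercises the fallback path on demand, so a broken or missing handler shows up during testing rather than during a real failure. Calling `scrub` periodically repairs silently corrupted replicas while an intact copy still exists.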
ISTORE periodically isolates nodes from the system and performs extensive self-tests
- ISTORE’s built-in redundancy makes nodes easy to isolate, even in a deployed, running system
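The periodic isolate-and-test cycle can be sketched as a loop that takes one node offline at a time, and only when enough other nodes stay in service. The `Node` class, the `min_online` threshold, and the result labels are hypothetical; the slide does not say how ISTORE schedules or gates isolation.

```python
class Node:
    """Hypothetical node record; self_test is a callable returning True on pass."""

    def __init__(self, name, self_test):
        self.name = name
        self.self_test = self_test
        self.online = True


def run_self_test_round(nodes, min_online):
    """Isolate each online node in turn, test it, and restore it if it passes.

    A node is skipped when isolating it would leave fewer than
    min_online nodes in service (assumed redundancy rule).
    """
    results = {}
    for node in nodes:
        if not node.online:
            continue
        online = sum(1 for n in nodes if n.online)
        if online - 1 < min_online:
            results[node.name] = "skipped"
            continue
        node.online = False          # isolate from the running system
        if node.self_test():
            node.online = True       # healthy: return to service
            results[node.name] = "passed"
        else:
            results[node.name] = "failed"  # stays isolated for repair
    return results
```

Because a failed node stays offline, later nodes in the same round may be skipped so that the system never drops below the redundancy threshold.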