Software techniques (4)
Proactive introspection
- continuous online self-testing of HW and SW
  - in deployed systems!
- goal is to shake out “Heisenbugs” before they’re encountered in normal operation
- needs data redundancy, node isolation, fault injection
- techniques:
  - fault injection: triggering hardware and software error handling paths to verify their integrity/existence
  - stress testing: push HW/SW to their limits
  - scrubbing: periodic restoration of potentially “decaying” hardware or software state
    - self-scrubbing data structures (like MVS)
    - ECC scrubbing for disks and memory
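
The interplay of data redundancy, fault injection, and scrubbing can be illustrated with a small sketch. This is a toy model under stated assumptions, not how any particular system implements it: `ReplicatedStore`, `inject_bit_flip`, `scrub`, and `run_demo` are hypothetical names, SHA-256 checksums stand in for ECC, and in-memory dictionaries stand in for disks or nodes.

```python
import hashlib
import random


def checksum(data: bytes) -> str:
    """Return a SHA-256 hex digest used to detect silent corruption."""
    return hashlib.sha256(data).hexdigest()


class ReplicatedStore:
    """Toy block store (hypothetical): several replicas per block plus a checksum."""

    def __init__(self, replicas: int = 3) -> None:
        # One dict per replica, each mapping key -> stored bytes.
        self.replicas = [dict() for _ in range(replicas)]
        # key -> digest of the data as originally written.
        self.checksums = {}

    def put(self, key: str, data: bytes) -> None:
        self.checksums[key] = checksum(data)
        for replica in self.replicas:
            replica[key] = data

    def get(self, key: str) -> bytes:
        """Return the first replica copy that still matches its checksum."""
        for replica in self.replicas:
            data = replica[key]
            if checksum(data) == self.checksums[key]:
                return data
        raise IOError(f"all replicas of {key!r} are corrupt")

    def inject_bit_flip(self, replica_index: int, key: str, rng: random.Random) -> None:
        """Fault injection: flip one random bit in one replica's copy."""
        data = bytearray(self.replicas[replica_index][key])
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)
        self.replicas[replica_index][key] = bytes(data)

    def scrub(self) -> int:
        """Scrubbing pass: verify every copy, rewrite bad ones from a good replica.

        Returns the number of copies repaired.
        """
        repaired = 0
        for key, expected in self.checksums.items():
            good = next(
                (r[key] for r in self.replicas if checksum(r[key]) == expected),
                None,
            )
            if good is None:
                continue  # unrecoverable: no intact copy is left
            for replica in self.replicas:
                if checksum(replica[key]) != expected:
                    replica[key] = good
                    repaired += 1
        return repaired


def run_demo(seed: int = 0) -> int:
    """Inject one fault, check that reads still succeed, then scrub."""
    rng = random.Random(seed)
    store = ReplicatedStore(replicas=3)
    store.put("block-0", b"hello, world")
    store.inject_bit_flip(1, "block-0", rng)  # exercise the error-handling path
    assert store.get("block-0") == b"hello, world"  # redundancy masks the fault
    return store.scrub()  # number of copies the scrubber repaired
```

In this sketch the injected fault is masked on read by the redundant copies, and the scrub pass repairs the damaged replica before a second fault could make the block unrecoverable. In a deployed system the scrub would run periodically in the background rather than on demand.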