Motivation
Human behavior can be the determining factor in system availability and reliability
- high percentage of outages caused by human error
- availability often affected by lack of maintenance, botched maintenance, or poor configuration/tuning
- we’d like to build “touch-free” self-maintaining systems
Again, no tools exist to provide insight into what makes a system more maintainable
- our availability benchmarks purposely excluded the human factor
- maintainability benchmarks are a challenge because human behavior varies from person to person
- metrics are even less well defined here than for availability