Questions to answer
- what factors affect the quality of service delivered by the system, by how much, and for how long?
- how well can systems survive typical failure scenarios?
- traditional metric: percentage of time the system is up
- time-averaged, binary view of system state (up/down)
- traditional metric is too inflexible
- doesn’t capture spectrum of degraded states
- time-averaging discards important temporal behavior
- Solution: measure variation in system quality of service metrics over time
- example metrics: performance, fault tolerance, completeness, accuracy
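
The contrast between the two views can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the traces are made-up requests-per-second samples taken once per minute, and the function names, the `normal` level, and the 10% `tolerance` are hypothetical choices, not part of any established benchmark.

```python
from typing import List, Tuple


def uptime_percentage(samples: List[float]) -> float:
    """Binary, time-averaged view: a sample counts as 'up' if any service is delivered."""
    up = sum(1 for s in samples if s > 0)
    return 100.0 * up / len(samples)


def degraded_intervals(
    samples: List[float], normal: float, tolerance: float = 0.1
) -> List[Tuple[int, int]]:
    """Temporal view: half-open index ranges [start, end) where the metric
    falls below normal * (1 - tolerance). The tolerance is an assumed value."""
    floor = normal * (1 - tolerance)
    intervals = []
    start = None
    for i, s in enumerate(samples):
        if s < floor and start is None:
            start = i                      # degradation begins
        elif s >= floor and start is not None:
            intervals.append((start, i))   # degradation ends
            start = None
    if start is not None:                  # still degraded at end of trace
        intervals.append((start, len(samples)))
    return intervals


# Hypothetical traces (requests/sec, one sample per minute).
trace_a = [100, 100, 100, 0, 100, 100, 100, 100, 100, 100]  # one short outage
trace_b = [100, 100, 60, 60, 60, 60, 60, 60, 100, 100]      # never down, long degradation

uptime_a = uptime_percentage(trace_a)
uptime_b = uptime_percentage(trace_b)
degraded_a = degraded_intervals(trace_a, normal=100)
degraded_b = degraded_intervals(trace_b, normal=100)
```

Under these assumptions the binary metric rates `trace_b` as perfectly available even though it spends six of its ten minutes at 60% of normal throughput, while the interval view reports both how deep and how long each degradation was.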