Motivation: System Scaling
Infrastructure services are growing rapidly
- more users, more online data, higher access rates, more historical data
- ever-larger back-end systems are needed
- O(300)-node clusters deployed now; thousands of nodes not far off
- maintenance and administration techniques must scale with the system to thousands of nodes
Today’s administrative approaches don’t scale
- systems will be too big to reason about, monitor, or fix
- failures and load variance will be too frequent for static solutions to work
Introspective, reactive techniques are required