System downtime is a common pain point for many organizations. Recently, a 12-hour store outage cost one Silicon Valley giant $25 million, and a 14-hour outage cost another corporation an estimated $90 million. News like this reminds companies that they need an incident management program in place before the unexpected happens. And there's no better time to get started than now.
But where do you begin? What elements does your program require? Who should be in charge?
In this practical book, seasoned professionals Michael Kehoe and David Cintz, who build, maintain, and run incident management programs for Confluent and LinkedIn, show you how to create a program that's effective, efficient, scalable, and automated, regardless of the size of your organization. You'll also learn how to tailor the program to meet the specific needs of your company as it continues to grow.
This book will help you:
- Understand the importance and benefits of an incident management program
- Create an effective incident categorization system and after-incident review program
- Create an effective automation strategy for incidents
- Build a comprehensive on-call program and train engineers
About the authors:
Michael Kehoe is an author, speaker, and senior staff security engineer at Confluent, leading security initiatives for multiple organizations.
David Cintz is an expert in leading large-scale incident response programs and serves as the staff technical program manager for security incidents at Confluent.