Support & Ops

Runbook Builder

Build printable incident runbooks covering trigger, checks, mitigation, escalation, rollback, and acceptance.

DRAFT · svc:service
runbook.playbook/untitled.run
Incident Runbook · Operator Playbook
1c · 1m · 1 contact
Steps complete: 0 / 2 (0%)
What woke you up

Trigger / alert

Verify before acting

Checks

verify · 1 step
1
$ step.1 · verify
Decision · branch

Did checks isolate the cause?

Yes → mitigate
Proceed to numbered mitigation steps below. Track in operator mode.
No → escalate
Page next on the escalation chain. Don't guess at fixes you can't verify.
Bring the service back

Mitigation

act · 1 step
2
$ step.2 · execute
How to undo — and how to confirm the undo worked

Rollback & verification

Verification checklist (Google SRE)
  • Error rate returns to baseline and stays there for ≥ 15 min
  • p95 / p99 latency at or below 110% of the pre-incident baseline
  • No new alerts firing during rollback window
  • Downstream dependencies report healthy
  • Customer-facing flows (critical user journeys) pass smoke tests
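
The first two checklist items are numeric, so they can be checked mechanically. The sketch below is one possible way to do that, not part of the Runbook Builder itself. The names `Sample`, `Baseline`, and `rollback_verified` are hypothetical, and it assumes one metrics sample per minute has already been pulled from a monitoring system. The remaining three items (alerts, downstream health, smoke tests) depend on external systems and are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Pre-incident reference values (hypothetical structure)."""
    error_rate: float  # fraction of failed requests, e.g. 0.01 for 1%
    p95_ms: float
    p99_ms: float


@dataclass
class Sample:
    """One per-minute metrics reading taken after the rollback."""
    minute: int        # minutes since the rollback finished
    error_rate: float
    p95_ms: float
    p99_ms: float


def rollback_verified(samples, baseline, hold_minutes=15, latency_ratio=1.10):
    """Return True if the most recent `hold_minutes` of samples all pass.

    A sample passes when its error rate is at or below the baseline and
    its p95 and p99 latencies are at or below `latency_ratio` times the
    baseline values.
    """
    if not samples:
        return False

    latest = max(s.minute for s in samples)
    window = [s for s in samples if s.minute > latest - hold_minutes]

    # Require a reading for every minute in the window; a gap in the data
    # is treated as "not verified" rather than assumed healthy.
    if len({s.minute for s in window}) < hold_minutes:
        return False

    return all(
        s.error_rate <= baseline.error_rate
        and s.p95_ms <= baseline.p95_ms * latency_ratio
        and s.p99_ms <= baseline.p99_ms * latency_ratio
        for s in window
    )


if __name__ == "__main__":
    baseline = Baseline(error_rate=0.01, p95_ms=200.0, p99_ms=500.0)
    healthy = [Sample(m, 0.01, 210.0, 540.0) for m in range(15)]
    print(rollback_verified(healthy, baseline))
```

Treating missing minutes as a failure is a deliberate choice in this sketch: a gap in metrics during the rollback window is more likely a monitoring problem than evidence of recovery.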
How we know it's fixed

Acceptance criteria

Escalation contacts · who to page, and when · 1 contact