Incidents

When something breaks, you get the whole story

An incident opens only when enough of your monitoring regions agree that a monitor is down. From there you get a full event timeline, one-click acknowledge and resolve, and a post-mortem you can publish.

Event-sourced timeline

Every state change is recorded as an event. The timeline replays each open, acknowledge, and resolve event with its timestamp and the actor behind it.
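To illustrate what "event-sourced" means here, the sketch below derives an incident's current status purely by replaying its events in time order. It is a minimal model under stated assumptions: the `Event` type, the `TRANSITIONS` table and the `replay` function are hypothetical names for illustration, not the product's API or storage format.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Event:
    kind: str      # "opened", "acknowledged" or "resolved"
    at: datetime   # when the state change happened
    actor: str     # who or what caused it, e.g. "consensus" or a user name

# Next state for each (current state, event kind) pair. Hypothetical model:
# an incident may be resolved with or without being acknowledged first.
TRANSITIONS = {
    None: {"opened": "open"},
    "open": {"acknowledged": "acknowledged", "resolved": "resolved"},
    "acknowledged": {"resolved": "resolved"},
    "resolved": {},
}

def replay(events):
    """Fold events in timestamp order into (current state, timeline lines)."""
    state = None
    timeline = []
    for ev in sorted(events, key=lambda e: e.at):
        nxt = TRANSITIONS[state].get(ev.kind)
        if nxt is None:
            # Reject sequences that make no sense, e.g. acknowledging before opening.
            raise ValueError(f"{ev.kind!r} is not valid from state {state!r}")
        state = nxt
        timeline.append(f"{ev.at:%H:%M:%S} {ev.kind} by {ev.actor}")
    return state, timeline
```

Because the status is computed only from the stored events, in this model the current state and the timeline shown to users cannot drift apart.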

Acknowledge and resolve

Ack to stop escalation. Resolve by hand, or let the incident auto-resolve once every region has reported healthy again for five minutes.

No false alarms

Consensus, maintenance windows, and dependency suppression keep blips and known work from paging you.

Publishable post-mortems

Write a markdown post-mortem on a resolved incident and publish it to your status page in one click.

Lifecycle

Opened on agreement.
Resolved on recovery.

Consensus decides when an incident is real. You decide when it is handled. Everything in between is on the timeline.

  1. Opens on consensus

    An incident opens only when enough regions agree the monitor is down, so a single network blip never pages you.

  2. Acknowledge it

    Ack from the dashboard or straight from Slack. Acknowledging stops the escalation clock immediately.

  3. Resolve it

    Resolve by hand, or let it auto-resolve once every region has reported healthy again for five minutes.

  4. Write the post-mortem

    Document what happened in markdown on the resolved incident, then publish it to your status page when you are ready.
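The acknowledge and auto-resolve steps above can be sketched as a small state holder. This is a minimal illustration under assumptions: the `Incident` class, its method names and the per-region "healthy since" bookkeeping are hypothetical; only the five-minute window and the rule that every region must be healthy come from the lifecycle described above.

```python
from datetime import datetime, timedelta

AUTO_RESOLVE_AFTER = timedelta(minutes=5)  # every region must stay healthy this long

class Incident:
    def __init__(self, regions, opened_at: datetime):
        self.status = "open"
        self.opened_at = opened_at
        # region -> time it last turned healthy, or None while it is failing
        self.healthy_since = {r: None for r in regions}

    def escalating(self) -> bool:
        return self.status == "open"  # only unacknowledged incidents escalate

    def acknowledge(self):
        if self.status == "open":
            self.status = "acknowledged"  # stops the escalation clock

    def resolve(self):
        self.status = "resolved"  # manual resolve

    def report(self, region, healthy: bool, at: datetime):
        if not healthy:
            self.healthy_since[region] = None  # any failure restarts that region's clock
        elif self.healthy_since[region] is None:
            self.healthy_since[region] = at

    def maybe_auto_resolve(self, now: datetime) -> bool:
        if self.status == "resolved":
            return False
        if all(s is not None and now - s >= AUTO_RESOLVE_AFTER
               for s in self.healthy_since.values()):
            self.status = "resolved"
            return True
        return False
```

In this sketch a single failed check in any region resets only that region's clock, so the incident resolves itself only after the slowest region has been continuously healthy for the full five minutes.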