Incidents
Tracevium automatically detects outages and manages their full lifecycle — from detection to resolution — with a complete audit trail.
Automatic detection
An incident is opened automatically when an endpoint records enough consecutive failing checks to reach the failure threshold. A single failure does not trigger an incident, which prevents noise from transient blips. Once the threshold is met, Tracevium:
- Opens a new incident linked to the affected endpoint.
- Records the exact start time of the outage.
- Fires notifications to all configured channels for the project.
- Reflects the degraded status on the public status page.
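The detection rule above can be sketched as a small counter per endpoint. This is only an illustration, not Tracevium's implementation: the threshold value of 3, the `EndpointState` class, and the `record_check` function are all assumptions made for the example, since this page does not state the actual threshold.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed value for illustration; the real threshold is not stated on this page.
FAILURE_THRESHOLD = 3


@dataclass
class EndpointState:
    """Hypothetical per-endpoint bookkeeping for consecutive failures."""
    consecutive_failures: int = 0
    first_failure_at: Optional[datetime] = None
    incident_open: bool = False


def record_check(state: EndpointState, passed: bool, at: datetime) -> Optional[str]:
    """Apply one check result; return "opened", "resolved", or None."""
    if passed:
        was_open = state.incident_open
        # A passing check resets the streak and closes any open incident.
        state.consecutive_failures = 0
        state.first_failure_at = None
        state.incident_open = False
        return "resolved" if was_open else None

    state.consecutive_failures += 1
    if state.first_failure_at is None:
        # The outage start is the first failing check of the streak.
        state.first_failure_at = at
    if not state.incident_open and state.consecutive_failures >= FAILURE_THRESHOLD:
        state.incident_open = True
        return "opened"
    return None
```

In this sketch a single failure followed by a pass returns `None` both times, so a transient blip never opens an incident.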
Incident lifecycle
- Open: the endpoint is currently failing. Notifications have been sent. The incident is visible on the status page.
- Acknowledged: a team member has confirmed they are aware and investigating. Escalation reminders are suppressed, but the status page still shows the degraded status.
- Resolved: the endpoint has returned to passing. Tracevium resolves the incident automatically, sends a recovery notification, and records total downtime.
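One way to picture the lifecycle is as a small state machine. The sketch below is hypothetical: the `IncidentStatus` enum, the `transition` helper, and the transition table are assumptions for illustration. In particular, it assumes an incident can resolve directly from Open without being acknowledged, and that Resolved is a terminal state; neither is stated explicitly on this page.

```python
from enum import Enum


class IncidentStatus(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Assumed transition table: acknowledgement is optional, RESOLVED is terminal.
ALLOWED_TRANSITIONS = {
    IncidentStatus.OPEN: {IncidentStatus.ACKNOWLEDGED, IncidentStatus.RESOLVED},
    IncidentStatus.ACKNOWLEDGED: {IncidentStatus.RESOLVED},
    IncidentStatus.RESOLVED: set(),
}


def transition(current: IncidentStatus, target: IncidentStatus) -> IncidentStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def shows_degraded(status: IncidentStatus) -> bool:
    """The status page shows degraded while the incident is open or acknowledged."""
    return status in (IncidentStatus.OPEN, IncidentStatus.ACKNOWLEDGED)
```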
Acknowledging an incident
Any member with at least developer access can acknowledge an incident. Acknowledging signals to the team that someone is actively investigating, which prevents duplicate response efforts.
Incident notes
Team members can add timestamped notes during the incident lifecycle — useful for documenting what is being investigated, sharing findings, and building an automatic post-mortem timeline. Notes are preserved after the incident resolves.
Any member with developer access or above can add notes.
Downtime tracking
Each incident records precise start and end times. Downtime is measured from the first failing check to the first passing check that follows, so the failing checks that preceded the incident opening are included. This downtime feeds into the uptime percentage shown on the dashboard and status page.
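The arithmetic can be illustrated with a short sketch. The `downtime` function follows the rule stated above. The `uptime_percentage` formula, which divides total downtime by a reporting window, is an assumption, because this page does not say exactly how Tracevium derives the percentage. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def downtime(first_failing_check: datetime, first_passing_check: datetime) -> timedelta:
    """Downtime runs from the first failing check to the first passing check."""
    if first_passing_check < first_failing_check:
        raise ValueError("passing check must not precede the failing check")
    return first_passing_check - first_failing_check


def uptime_percentage(window: timedelta, downtimes: list[timedelta]) -> float:
    """Assumed formula: share of the window not covered by incident downtime."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    # Cap at the window length so the result never drops below 0%.
    total_down = min(sum(downtimes, timedelta()), window)
    return 100.0 * (1 - total_down / window)


# Example: one outage of 43 minutes 12 seconds in a 30-day window.
outage = downtime(datetime(2024, 1, 10, 9, 0, 0), datetime(2024, 1, 10, 9, 43, 12))
print(round(uptime_percentage(timedelta(days=30), [outage]), 3))  # 99.9
```

Under this assumed formula, 43.2 minutes of downtime in a 30-day window (43,200 minutes) is 0.1% of the window, which gives 99.9% uptime.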
Open incident counts
The dashboard shows a global open incident count across all projects. You can also query open counts via the API — useful for building internal dashboards or alerting integrations.
Recovery notifications
When an incident resolves, Tracevium sends a recovery notification through all configured channels so the team knows the outage is over without having to check the dashboard.