Incidents

Track and manage incidents in your system.

What is an Incident?

An incident is any event that disrupts normal service operation. In Uptinio, incidents can be triggered by:

  • Monitors: When a monitored endpoint fails health checks
  • Servers: When server metrics exceed thresholds or the server becomes unreachable

Types of Incidents

Monitor Incidents

  • HTTP/HTTPS Failures: When a website or API endpoint is unreachable or returns error responses
  • SSL Certificate Issues: When SSL certificates are invalid or about to expire
  • Domain Expiry: When monitored domains are approaching expiration dates
  • Keyword Monitoring: When expected content is missing from responses
  • TCP Port Issues: When monitored ports become unreachable
  • Ping Failures: When basic connectivity checks fail
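
As an illustration of how an HTTP/HTTPS or keyword check might decide that a failure has occurred, here is a minimal sketch. It is not Uptinio's implementation: `HttpCheckResult` and `classify_http_check` are hypothetical names, and treating any status code of 400 or above as an "error response" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HttpCheckResult:
    """Outcome of one HTTP probe (hypothetical structure, not Uptinio's schema)."""
    status_code: Optional[int]  # None means the endpoint could not be reached
    body: str = ""


def classify_http_check(result: HttpCheckResult,
                        keyword: Optional[str] = None) -> Optional[str]:
    """Return a failure reason, or None if the check passed."""
    if result.status_code is None:
        return "unreachable"
    # Assumption: 4xx and 5xx responses count as error responses.
    if result.status_code >= 400:
        return f"error response ({result.status_code})"
    # Keyword check: the expected content must appear in the body.
    if keyword is not None and keyword not in result.body:
        return f"keyword missing: {keyword!r}"
    return None
```

A check that returns a reason would be a candidate for opening a monitor incident; a return value of `None` means the check passed.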

Server Incidents

  • Server Downtime: When the server becomes unreachable or stops sending metrics
  • Resource Thresholds:
    • CPU usage exceeds configured threshold
    • Memory (RAM) usage exceeds configured threshold
    • Disk space usage exceeds configured threshold
    • Network traffic (in/out) exceeds configured thresholds
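
The resource threshold checks above can be sketched as a comparison between the latest metrics and the configured limits. This is an illustrative sketch only: the function names, the metric names (`cpu`, `memory`, `disk`), and the idea of a fixed silence timeout for detecting downtime are assumptions, not Uptinio's actual configuration keys or logic.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def exceeded_thresholds(metrics: Dict[str, float],
                        thresholds: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose value is above its configured limit."""
    # A metric that was not reported is treated as not exceeding its limit.
    return sorted(
        name for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    )


def is_server_down(last_metric_at: datetime, now: datetime,
                   timeout: timedelta) -> bool:
    """Treat the server as down if no metrics arrived within the timeout."""
    return now - last_metric_at > timeout
```

For example, with limits of 90 for `cpu` and 80 for `disk`, a report of `{"cpu": 95.0, "disk": 40.0}` would flag only `cpu`.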

Managing Incidents

Incident Lifecycle

  1. Detection: The system automatically detects issues based on configured health checks and thresholds
  2. Creation: Incident is created with initial status and details
  3. Notification: Configured integrations are notified (email, Slack, webhooks)
  4. Investigation: Team investigates and updates incident status
  5. Resolution: Issue is resolved and incident is marked as resolved
  6. Post-mortem: Review incident details to identify areas for improvement and prevent similar issues in the future

Incident Statuses

  • Ongoing: Active incident being investigated
  • Resolved: Issue has been fixed
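
To make the lifecycle and the two statuses concrete, here is a minimal sketch of an incident record that starts as Ongoing, notifies the configured channels on creation, and can be marked Resolved once. All names (`Status`, `Incident`, `open_incident`) and fields are hypothetical, and the notifiers are plain callables standing in for email, Slack, or webhook integrations.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    ONGOING = "ongoing"
    RESOLVED = "resolved"


@dataclass
class Incident:
    source: str  # e.g. "monitor" or "server"
    reason: str
    status: Status = Status.ONGOING
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    resolved_at: Optional[datetime] = None

    def resolve(self) -> None:
        """Mark the incident as resolved; resolving twice is an error."""
        if self.status is Status.RESOLVED:
            raise ValueError("incident is already resolved")
        self.status = Status.RESOLVED
        self.resolved_at = datetime.now(timezone.utc)


def open_incident(source: str, reason: str,
                  notifiers: List[Callable[[Incident], None]]) -> Incident:
    """Create an ongoing incident and pass it to every notifier."""
    incident = Incident(source, reason)
    for notify in notifiers:
        notify(incident)
    return incident
```

Keeping `started_at` and `resolved_at` on the record gives a post-mortem the incident duration without any extra bookkeeping.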

Best Practices

  • Set alert thresholds high enough to filter out routine fluctuations but low enough to catch real problems, so the team avoids alert fatigue
  • Set up multiple notification channels for critical incidents
  • Document incident response procedures
  • Review incident patterns to identify recurring issues
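
Reviewing incident patterns can start with something as simple as counting how often the same source fails for the same reason. The sketch below assumes an exported incident history as a list of `(source, reason)` pairs; that format and the function name `recurring_issues` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

IncidentKey = Tuple[str, str]  # (source, reason)


def recurring_issues(history: Iterable[IncidentKey],
                     min_count: int = 2) -> List[Tuple[IncidentKey, int]]:
    """Return (source, reason) pairs seen at least min_count times, most frequent first."""
    counts = Counter(history)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

Pairs that show up repeatedly are natural candidates for a threshold adjustment or a documented response procedure.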