Incidents
Track and manage incidents in your system.
What is an Incident?
An incident is any event that disrupts normal service operation. In Uptinio, incidents can be triggered by:
- Monitors: When a monitored endpoint fails health checks
- Servers: When server metrics exceed thresholds or the server becomes unreachable
Types of Incidents
Monitor Incidents
- HTTP/HTTPS Failures: When a website or API endpoint is unreachable or returns error responses
- SSL Certificate Issues: When SSL certificates are invalid or about to expire
- Domain Expiry: When monitored domains are approaching their expiration dates
- Keyword Monitoring: When expected content is missing from responses
- TCP Port Issues: When monitored ports become unreachable
- Ping Failures: When basic connectivity checks fail
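The following is a minimal sketch of how a single HTTP check result could be turned into incident reasons for the failure types above (HTTP errors, missing keywords, SSL expiry). It is not Uptinio's implementation. The `CheckResult` shape, the `monitor_failures` function, the reason strings, and the 14-day SSL warning window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


# Hypothetical shape of one HTTP check result; not Uptinio's actual data model.
@dataclass
class CheckResult:
    reachable: bool
    status_code: Optional[int] = None
    body: str = ""
    cert_expires_at: Optional[datetime] = None


def monitor_failures(result: CheckResult, keyword: Optional[str] = None,
                     ssl_warning_days: int = 14,
                     now: Optional[datetime] = None) -> list[str]:
    """Return the reasons this check should open an incident (empty list = healthy)."""
    now = now or datetime.now(timezone.utc)
    if not result.reachable:
        # Without a response there is no status code, body, or certificate to inspect.
        return ["unreachable"]
    reasons = []
    # HTTP/HTTPS failure: treat 4xx and 5xx responses as errors (assumed rule).
    if result.status_code is not None and result.status_code >= 400:
        reasons.append(f"http_error:{result.status_code}")
    # Keyword monitoring: expected content must appear in the response body.
    if keyword is not None and keyword not in result.body:
        reasons.append("keyword_missing")
    # SSL certificate: already expired, or expiring within the warning window.
    if result.cert_expires_at is not None:
        if result.cert_expires_at <= now:
            reasons.append("ssl_expired")
        elif result.cert_expires_at - now <= timedelta(days=ssl_warning_days):
            reasons.append("ssl_expiring_soon")
    return reasons
```

For example, a 200 response that contains the expected keyword but whose certificate expires in five days would yield `["ssl_expiring_soon"]`. Returning a list of reasons rather than a single boolean lets one check report several problems at once.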
Server Incidents
- Server Downtime: When the server becomes unreachable or stops sending metrics
- Resource Thresholds:
  - CPU usage exceeds configured threshold
  - Memory (RAM) usage exceeds configured threshold
  - Disk space usage exceeds configured threshold
  - Network traffic (in/out) exceeds configured thresholds
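As a rough illustration, the server checks above amount to two rules: report downtime if metrics have stopped arriving, and otherwise compare each metric against its configured limit. The sketch below assumes that shape. The metric names, the default limits, the 5-minute offline timeout, and `server_breaches` itself are hypothetical and do not reflect Uptinio's actual defaults.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical per-server thresholds (percent for CPU/RAM/disk, Mbit/s for network).
DEFAULT_THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 90.0,
    "disk_percent": 85.0,
    "network_in_mbps": 500.0,
    "network_out_mbps": 500.0,
}


def server_breaches(metrics: dict[str, float], last_seen: datetime,
                    thresholds: dict[str, float] = DEFAULT_THRESHOLDS,
                    offline_after: timedelta = timedelta(minutes=5),
                    now: Optional[datetime] = None) -> list[str]:
    """Return the names of breached checks, or ["server_down"] if metrics are stale."""
    now = now or datetime.now(timezone.utc)
    # Server downtime: no metrics received within the offline window.
    if now - last_seen > offline_after:
        return ["server_down"]
    # Resource thresholds: a metric must strictly exceed its limit to count.
    return [name for name, limit in thresholds.items()
            if metrics.get(name) is not None and metrics[name] > limit]
```

Checking downtime first avoids raising threshold incidents from stale readings. A metric that is exactly at its limit is not treated as a breach, which matches the "exceeds" wording above.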
Managing Incidents
Incident Lifecycle
- Detection: The system automatically detects issues based on configured thresholds
- Creation: An incident is created with an initial status and details
- Notification: Configured integrations are notified (email, Slack, webhooks)
- Investigation: The team investigates and updates the incident status
- Resolution: The issue is fixed and the incident is marked as resolved
- Post-mortem: Review incident details to identify areas for improvement and prevent similar issues in the future
Incident Statuses
- Ongoing: The incident is active and being investigated
- Resolved: The issue has been fixed
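The lifecycle and the two statuses can be modeled as a small state object: an incident is created as Ongoing, notifications fire on creation, notes accumulate during investigation, and resolution records an end time that a post-mortem can use. This is a minimal sketch, not Uptinio's data model. `Incident`, `open_incident`, the `notes` field, and the notifier callables (stand-ins for email, Slack, or webhook integrations) are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    ONGOING = "ongoing"
    RESOLVED = "resolved"


@dataclass
class Incident:
    source: str   # hypothetical: the monitor or server that triggered it
    reason: str   # hypothetical: what the detection step found
    status: Status = Status.ONGOING
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    resolved_at: Optional[datetime] = None
    notes: list[str] = field(default_factory=list)

    def add_note(self, text: str) -> None:
        """Investigation: record updates while the incident is still ongoing."""
        if self.status is Status.RESOLVED:
            raise ValueError("incident is already resolved")
        self.notes.append(text)

    def resolve(self, when: Optional[datetime] = None) -> None:
        """Resolution: mark as resolved; resolving twice is a no-op."""
        if self.status is Status.RESOLVED:
            return
        self.status = Status.RESOLVED
        self.resolved_at = when or datetime.now(timezone.utc)

    def duration(self) -> Optional[timedelta]:
        """Post-mortem helper: how long the incident lasted, if resolved."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.started_at


def open_incident(source: str, reason: str,
                  notifiers: list[Callable[[Incident], None]]) -> Incident:
    """Creation + Notification: create an Ongoing incident and notify each channel."""
    incident = Incident(source=source, reason=reason)
    for notify in notifiers:
        notify(incident)
    return incident
```

In use, a detection step would call `open_incident(...)` with one callable per configured channel. The team would then call `add_note` while investigating and `resolve` once the issue is fixed.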
Best Practices
- Configure appropriate alert thresholds to prevent alert fatigue
- Set up multiple notification channels for critical incidents
- Document incident response procedures
- Review incident patterns to identify recurring issues