Analytics
Comprehensive incident analytics, performance metrics, and operational insights
Analytics
Analytics provide deep visibility into incident patterns, team performance, and operational health. Use these insights to identify trends, optimize response processes, and demonstrate SLA compliance.
Why Analytics Matter
| Without Analytics | With Analytics |
|---|---|
| Blind to patterns | Data-driven decisions |
| Can't prove SLA compliance | Objective SLA tracking |
| Unknown team workload | Fair on-call distribution |
| Reactive improvements | Proactive optimization |
Accessing Analytics
Navigate to Analytics in the sidebar to access:
- Dashboard: Overview of key metrics
- Incidents: Detailed incident analysis
- On-Call: Responder performance metrics
- SLA: Compliance tracking
- Reports: Exportable summaries
Key Metrics
OpsKnight tracks 19+ metrics across incidents, response times, and SLA compliance.
Response Time Metrics
MTTA (Mean Time to Acknowledge)
Average time from incident creation to first acknowledgment.
| Rating | MTTA | Interpretation |
|---|---|---|
| 🟢 Excellent | < 5 minutes | Rapid response |
| 🟡 Acceptable | 5–15 minutes | Room for improvement |
| 🔴 Needs Work | > 15 minutes | Review escalation policies |
Calculation: Sum of (acknowledgment time - creation time) / acknowledged incidents
MTTR (Mean Time to Resolve)
Average time from incident creation to resolution.
| Rating | MTTR | Interpretation |
|---|---|---|
| 🟢 Excellent | < 30 minutes | Efficient resolution |
| 🟡 Acceptable | 30–60 minutes | Typical for complex issues |
| 🔴 Needs Work | > 60 minutes | Review processes |
Calculation: Sum of (resolution time - creation time) / resolved incidents
MTTE (Mean Time to Escalate)
Average time before incidents escalate beyond the first responder.
| Rating | MTTE | Interpretation |
|---|---|---|
| 🟢 Healthy | > 15 minutes | Primary responders handling most issues |
| 🟡 Monitor | 5–15 minutes | Moderate escalation rate |
| 🔴 Review | < 5 minutes | Check if first-level needs more training/tools |
Volume Metrics
| Metric | Description |
|---|---|
| Total Incidents | Count of incidents in the period |
| Open Incidents | Currently unresolved incidents |
| Acknowledged | Incidents acknowledged but not resolved |
| Resolved | Incidents marked resolved |
| Snoozed | Incidents temporarily muted |
| Suppressed | Incidents auto-suppressed by rules |
Distribution Metrics
| Metric | Description |
|---|---|
| By Urgency | Breakdown by HIGH/MEDIUM/LOW |
| By Service | Incidents per service |
| By Team | Incidents per owning team |
| By Source | Incidents per integration |
| By Hour | Hourly distribution pattern |
| By Day | Daily distribution pattern |
SLA Metrics
SLA Compliance Rate
Percentage of incidents resolved within SLA targets.
SLA Compliance = (Incidents within SLA / Total Resolved Incidents) × 100
Per-Urgency SLA Tracking
| Urgency | Default Target | Metric |
|---|---|---|
| HIGH | 15 minutes to acknowledge, 1 hour to resolve | Compliance % |
| MEDIUM | 30 minutes to acknowledge, 4 hours to resolve | Compliance % |
| LOW | 2 hours to acknowledge, 24 hours to resolve | Compliance % |
SLA Breach Analysis
Track which services and teams breach SLA most frequently:
| Analysis | Purpose |
|---|---|
| Breaches by Service | Identify problematic services |
| Breaches by Team | Spot training/staffing needs |
| Breach Trends | Monitor improvement over time |
| Near-Miss Tracking | Incidents close to breaching |
Dashboard Visualizations
OpsKnight provides 15+ visualization types for incident data.
Time-Series Charts
| Chart | Shows |
|---|---|
| Incident Volume | Incidents over time (line/bar) |
| MTTA Trend | Acknowledgment time over time |
| MTTR Trend | Resolution time over time |
| SLA Compliance Trend | Compliance percentage over time |
Distribution Charts
| Chart | Shows |
|---|---|
| By Urgency | Pie/donut chart of urgency distribution |
| By Status | Current status breakdown |
| By Service | Top services by incident count |
| By Team | Team workload comparison |
| By Hour of Day | When incidents occur most |
| By Day of Week | Daily patterns |
Heatmaps
| Heatmap | Shows |
|---|---|
| Hour × Day | When incidents peak |
| Service × Urgency | Which services have critical issues |
| Team × Service | Coverage overlap |
Tables
| Table | Shows |
|---|---|
| Top Incidents | Longest-running or most impactful |
| Top Services | Services with most incidents |
| Top Responders | Most active incident handlers |
| Recent Breaches | SLA violations |
Filtering & Drill-Down
Available Filters
| Filter | Options |
|---|---|
| Time Range | Last 24h, 7d, 30d, 90d, custom range |
| Service | Single or multiple services |
| Team | Single or multiple teams |
| Assignee | Specific responders |
| Status | OPEN, ACKNOWLEDGED, RESOLVED, SNOOZED, SUPPRESSED |
| Urgency | HIGH, MEDIUM, LOW |
| Source | Integration that created the incident |
Quick Filters
| Quick Filter | Description |
|---|---|
| My Incidents | Incidents assigned to you |
| My Teams | Incidents for teams you belong to |
| SLA Breached | Only breached incidents |
| Unassigned | Incidents without an assignee |
Comparison Mode
Compare metrics across different periods:
This Week vs Last Week
This Month vs Last Month
This Quarter vs Last Quarter
Comparison shows:
- Percentage change
- Absolute difference
- Trend direction (↑ improving, ↓ declining)
Reports
Executive Summary Report
High-level overview for stakeholders.
Includes:
- Total incidents by urgency
- Overall SLA compliance
- MTTA and MTTR averages
- Top 5 affected services
- Period-over-period comparison
- Trend indicators
Schedule: Daily, weekly, or monthly delivery via email
On-Call Performance Report
Responder-focused metrics.
Includes:
| Metric | Description |
|---|---|
| Incidents Handled | Count per responder |
| Average Response Time | Per-responder MTTA |
| Escalation Rate | How often they escalate |
| Resolution Rate | Percentage resolved without escalation |
| Night/Weekend Load | Out-of-hours incident burden |
Service Health Report
Service-focused analysis.
Includes:
| Metric | Description |
|---|---|
| Incident Count | Total incidents per service |
| MTTR by Service | Resolution time per service |
| SLA Compliance | Per-service SLA tracking |
| Top Incident Types | Common issues per service |
| Uptime Percentage | Calculated availability |
Team Workload Report
Team capacity planning.
Includes:
| Metric | Description |
|---|---|
| Incidents per Member | Workload distribution |
| Cross-Team Escalations | Collaboration patterns |
| Peak Hours | When teams are busiest |
| Coverage Gaps | Periods without coverage |
Exporting Data
Export Formats
| Format | Use Case |
|---|---|
| CSV | Spreadsheet analysis, custom reporting |
| JSON | Integration with other tools |
| Stakeholder presentations |
Export Options
- Go to Analytics
- Apply desired filters
- Click Export
- Select format
- Choose data scope:
- Current View: Only visible data
- Full Dataset: All data matching filters
- Download file
Scheduled Exports
Set up recurring exports:
- Go to Analytics → Reports
- Click Schedule Report
- Configure:
- Report type
- Filters
- Format
- Recipients (email)
- Schedule (daily/weekly/monthly)
API Access
Analytics API Endpoints
| Endpoint | Description |
|---|---|
GET /api/analytics/summary |
Overview metrics |
GET /api/analytics/incidents |
Incident data with filters |
GET /api/analytics/mtta |
MTTA metrics |
GET /api/analytics/mttr |
MTTR metrics |
GET /api/analytics/sla |
SLA compliance data |
Query Parameters
| Parameter | Description | Example |
|---|---|---|
start_date |
Period start | 2024-01-01 |
end_date |
Period end | 2024-01-31 |
service_id |
Filter by service | svc_abc123 |
team_id |
Filter by team | team_xyz |
urgency |
Filter by urgency | HIGH |
group_by |
Aggregation key | service, team, day |
Example Request
curl -X GET "https://your-opsknight.com/api/analytics/summary?start_date=2024-01-01&end_date=2024-01-31" \
-H "Authorization: Bearer YOUR_API_TOKEN"
Example Response
{
"period": {
"start": "2024-01-01T00:00:00Z",
"end": "2024-01-31T23:59:59Z"
},
"metrics": {
"total_incidents": 156,
"mtta_minutes": 4.2,
"mttr_minutes": 28.5,
"sla_compliance": 94.2,
"by_urgency": {
"HIGH": 23,
"MEDIUM": 87,
"LOW": 46
},
"by_status": {
"RESOLVED": 148,
"OPEN": 5,
"ACKNOWLEDGED": 3
}
}
}
Dashboard Customization
Creating Custom Dashboards
- Go to Analytics → Dashboards
- Click Create Dashboard
- Add widgets:
- Drag from widget library
- Configure data source and filters
- Set visualization type
- Arrange layout
- Save dashboard
Widget Types
| Widget | Configuration |
|---|---|
| Metric Card | Single number with trend |
| Time Series | Line/bar chart over time |
| Distribution | Pie/donut/bar chart |
| Table | Sortable data grid |
| Heatmap | Two-dimensional pattern view |
| Text | Markdown annotations |
Sharing Dashboards
| Share Option | Description |
|---|---|
| Team Access | Visible to team members |
| Organization | Visible to all users |
| Public Link | Shareable URL (read-only) |
| Embed | iFrame embed code |
Alerting on Metrics
Set up alerts when metrics cross thresholds.
Alert Conditions
| Condition | Example |
|---|---|
| MTTA exceeds | Alert if MTTA > 10 minutes |
| MTTR exceeds | Alert if MTTR > 1 hour |
| SLA drops below | Alert if compliance < 95% |
| Volume spikes | Alert if incidents > 50/hour |
Alert Configuration
- Go to Analytics → Alerts
- Click Create Alert
- Configure:
- Metric to monitor
- Threshold value
- Evaluation window
- Notification channels
Best Practices
Regular Review Cadence
| Cadence | Focus |
|---|---|
| Daily | Open incidents, SLA breaches |
| Weekly | MTTA/MTTR trends, team workload |
| Monthly | Service health, pattern analysis |
| Quarterly | Capacity planning, process improvements |
Metric Interpretation
- Look at trends, not just snapshots
- Compare similar periods (weekday to weekday)
- Consider context (holidays, deployments)
- Investigate anomalies immediately
Action Items from Analytics
| Finding | Action |
|---|---|
| High MTTA | Review escalation policies, notification channels |
| High MTTR | Improve runbooks, add automation |
| Frequent breaches on service | Review service reliability |
| Uneven team workload | Adjust on-call rotations |
| Night/weekend spikes | Consider follow-the-sun coverage |
Reporting to Stakeholders
- Lead with business impact (not raw numbers)
- Show trends (improvement or decline)
- Include action items (what you're doing about it)
- Keep it brief (executive summary first)
Troubleshooting
Missing Data
- Check time range filter
- Verify service/team filters
- Confirm incidents exist in the period
- Check user permissions
Incorrect Metrics
- Verify incident timestamps are correct
- Check for manually backdated incidents
- Review suppressed/snoozed incident handling
- Confirm timezone settings
Slow Dashboard
- Reduce time range
- Add more specific filters
- Limit number of widgets
- Use cached data option
Related Topics
- Incidents — Incident lifecycle and management
- Services — Service configuration
- Teams — Team management
- SLA — SLA configuration
- Postmortems — Post-incident analysis
Last updated for v1
Edit this page on GitHub