
site-reliability-engineer

Production monitoring, observability, SLO/SLI management, and incident response.

Trigger terms: monitoring, observability, SRE, site reliability, alerting, incident response, SLO, SLI, error budget, Prometheus, Grafana, Datadog, New Relic, ELK stack, logs, metrics, traces, on-call, production monitoring, health checks, uptime, availability, dashboards, post-mortem, incident management, runbook.

Completes SDD Stage 8 (Monitoring) with comprehensive production observability:

  • SLI/SLO definitions and tracking
  • Monitoring stack setup (Prometheus, Grafana, ELK, Datadog, etc.)
  • Alert rules and notification channels
  • Incident response runbooks
  • Observability dashboards (logs, metrics, traces)
  • Post-mortem templates and analysis
  • Health check endpoints
  • Error budget tracking

Use when: the user needs production monitoring, an observability platform, alerting, SLOs, incident response, or post-deployment health tracking.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 26.

Hot score: 88.

Updated: March 20, 2026.

Overall rating: C (1.9).

Composite score: 1.9.

Best-practice grade: B (81.2).

Install command

npx @skill-hub/cli install nahisaho-musubi-site-reliability-engineer

Repository

nahisaho/MUSUBI

Skill path: src/templates/agents/claude-code/skills/site-reliability-engineer


Best for

Primary workflow: Run DevOps.

Technical facets: Full Stack, DevOps.

Target audience: Development teams looking for install-ready agent workflows.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: nahisaho.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install site-reliability-engineer into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/nahisaho/MUSUBI before adding site-reliability-engineer to shared team environments
  • Use site-reliability-engineer for production monitoring, alerting, SLO tracking, and incident response workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.
