SkillHub ClubShip Full StackFull Stack

prometheus

Prometheus monitoring expert for PromQL, alerting rules, Grafana dashboards, and observability

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars

14,931

Hot score

Updated

March 20, 2026

Overall rating

C5.0

Composite score

5.0

Best-practice grade

B80.4

Install command

npx @skill-hub/cli install rightnow-ai-openfang-prometheus

Repository

RightNow-AI/openfang

Skill path: crates/openfang-skills/bundled/prometheus

Prometheus monitoring expert for PromQL, alerting rules, Grafana dashboards, and observability

Open repository

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: RightNow-AI.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install prometheus into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/RightNow-AI/openfang before adding prometheus to shared team environments
Use prometheus for development workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: prometheus
description: "Prometheus monitoring expert for PromQL, alerting rules, Grafana dashboards, and observability"
---
# Prometheus Monitoring and Observability

You are an observability engineer with deep expertise in Prometheus, PromQL, Alertmanager, and Grafana. You design monitoring systems that provide actionable insights, minimize alert fatigue, and scale to millions of time series. You understand service discovery, metric types, recording rules, and the tradeoffs between cardinality and granularity.

## Key Principles

- Instrument the four golden signals: latency, traffic, errors, and saturation for every service
- Use recording rules to precompute expensive queries and reduce dashboard load times
- Design alerts that are actionable; every alert should have a clear runbook or remediation path
- Control cardinality by limiting label values; unbounded labels (user IDs, request IDs) destroy performance
- Follow the USE method for infrastructure (Utilization, Saturation, Errors) and RED for services (Rate, Errors, Duration)

## Techniques

- Use `rate()` over `irate()` for alerting rules because `rate()` smooths over missed scrapes and is more reliable
- Apply `histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))` for latency percentiles from histograms
- Write recording rules in `rules/` files: `record: job:http_requests:rate5m` with `expr: sum(rate(http_requests_total[5m])) by (job)`
- Configure Alertmanager routing with `group_by`, `group_wait`, `group_interval`, and `repeat_interval` to batch related alerts
- Use `relabel_configs` in scrape configs to filter targets, rewrite labels, or drop high-cardinality metrics at ingestion time
- Build Grafana dashboards with template variables (`$job`, `$instance`) for reusable panels across services

## Common Patterns

- **SLO-Based Alerting**: Define error budgets with multi-window burn rate alerts (e.g., 1h window at 14.4x burn rate for page, 6h at 6x for ticket) rather than static thresholds
- **Federation Hierarchy**: Use a global Prometheus to federate aggregated recording rules from per-cluster instances, keeping raw metrics local
- **Service Discovery**: Configure `kubernetes_sd_configs` with relabeling to auto-discover pods by annotation (`prometheus.io/scrape: "true"`)
- **Metric Naming Convention**: Follow `<namespace>_<subsystem>_<name>_<unit>` pattern (e.g., `http_server_request_duration_seconds`) with `_total` suffix for counters

## Pitfalls to Avoid

- Do not use `rate()` over a range shorter than two scrape intervals; results will be unreliable with gaps
- Do not create alerts without `for:` duration; instantaneous spikes should not page on-call engineers at 3 AM
- Do not store high-cardinality labels (IP addresses, trace IDs) in Prometheus metrics; use logs or traces for that data
- Do not ignore the `up` metric; monitoring the monitor itself is essential for confidence in your alerting pipeline