SkillHub ClubShip Full StackFull Stack

performance-profiler

Performance Profiler

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars

5,843

Hot score

Updated

March 20, 2026

Overall rating

C4.0

Composite score

4.0

Best-practice grade

C60.3

Install command

npx @skill-hub/cli install alirezarezvani-claude-skills-performance-profiler

Repository

alirezarezvani/claude-skills

Skill path: engineering/performance-profiler

Performance Profiler

Open repository

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: alirezarezvani.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install performance-profiler into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/alirezarezvani/claude-skills before adding performance-profiler to shared team environments
Use performance-profiler for development workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: "performance-profiler"
description: "Performance Profiler"
---

# Performance Profiler

**Tier:** POWERFUL  
**Category:** Engineering  
**Domain:** Performance Engineering  

---

## Overview

Systematic performance profiling for Node.js, Python, and Go applications. Identifies CPU, memory, and I/O bottlenecks; generates flamegraphs; analyzes bundle sizes; optimizes database queries; detects memory leaks; and runs load tests with k6 and Artillery. Always measures before and after.

## Core Capabilities

- **CPU profiling** — flamegraphs for Node.js, py-spy for Python, pprof for Go
- **Memory profiling** — heap snapshots, leak detection, GC pressure
- **Bundle analysis** — webpack-bundle-analyzer, Next.js bundle analyzer
- **Database optimization** — EXPLAIN ANALYZE, slow query log, N+1 detection
- **Load testing** — k6 scripts, Artillery scenarios, ramp-up patterns
- **Before/after measurement** — establish baseline, profile, optimize, verify

---

## When to Use

- App is slow and you don't know where the bottleneck is
- P99 latency exceeds SLA before a release
- Memory usage grows over time (suspected leak)
- Bundle size increased after adding dependencies
- Preparing for a traffic spike (load test before launch)
- Database queries taking >100ms

---

## Quick Start

```bash
# Analyze a project for performance risk indicators
python3 scripts/performance_profiler.py /path/to/project

# JSON output for CI integration
python3 scripts/performance_profiler.py /path/to/project --json

# Custom large-file threshold
python3 scripts/performance_profiler.py /path/to/project --large-file-threshold-kb 256
```

---

## Golden Rule: Measure First

```bash
# Establish baseline BEFORE any optimization
# Record: P50, P95, P99 latency | RPS | error rate | memory usage

# Wrong: "I think the N+1 query is slow, let me fix it"
# Right: Profile → confirm bottleneck → fix → measure again → verify improvement
```

---

## Node.js Profiling
→ See references/profiling-recipes.md for details

## Before/After Measurement Template

```markdown
## Performance Optimization: [What You Fixed]

**Date:** 2026-03-01  
**Engineer:** @username  
**Ticket:** PROJ-123  

### Problem
[1-2 sentences: what was slow, how was it observed]

### Root Cause
[What the profiler revealed]

### Baseline (Before)
| Metric | Value |
|--------|-------|
| P50 latency | 480ms |
| P95 latency | 1,240ms |
| P99 latency | 3,100ms |
| RPS @ 50 VUs | 42 |
| Error rate | 0.8% |
| DB queries/req | 23 (N+1) |

Profiler evidence: [link to flamegraph or screenshot]

### Fix Applied
[What changed — code diff or description]

### After
| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| P50 latency | 480ms | 48ms | -90% |
| P95 latency | 1,240ms | 120ms | -90% |
| P99 latency | 3,100ms | 280ms | -91% |
| RPS @ 50 VUs | 42 | 380 | +804% |
| Error rate | 0.8% | 0% | -100% |
| DB queries/req | 23 | 1 | -96% |

### Verification
Load test run: [link to k6 output]
```

---

## Optimization Checklist

### Quick wins (check these first)

```
Database
□ Missing indexes on WHERE/ORDER BY columns
□ N+1 queries (check query count per request)
□ Loading all columns when only 2-3 needed (SELECT *)
□ No LIMIT on unbounded queries
□ Missing connection pool (creating new connection per request)

Node.js
□ Sync I/O (fs.readFileSync) in hot path
□ JSON.parse/stringify of large objects in hot loop
□ Missing caching for expensive computations
□ No compression (gzip/brotli) on responses
□ Dependencies loaded in request handler (move to module level)

Bundle
□ Moment.js → dayjs/date-fns
□ Lodash (full) → lodash/function imports
□ Static imports of heavy components → dynamic imports
□ Images not optimized / not using next/image
□ No code splitting on routes

API
□ No pagination on list endpoints
□ No response caching (Cache-Control headers)
□ Serial awaits that could be parallel (Promise.all)
□ Fetching related data in a loop instead of JOIN
```

---

## Common Pitfalls

- **Optimizing without measuring** — you'll optimize the wrong thing
- **Testing in development** — profile against production-like data volumes
- **Ignoring P99** — P50 can look fine while P99 is catastrophic
- **Premature optimization** — fix correctness first, then performance
- **Not re-measuring** — always verify the fix actually improved things
- **Load testing production** — use staging with production-size data

---

## Best Practices

1. **Baseline first, always** — record metrics before touching anything
2. **One change at a time** — isolate the variable to confirm causation
3. **Profile with realistic data** — 10 rows in dev, millions in prod — different bottlenecks
4. **Set performance budgets** — `p(95) < 200ms` in CI thresholds with k6
5. **Monitor continuously** — add Datadog/Prometheus metrics for key paths
6. **Cache invalidation strategy** — cache aggressively, invalidate precisely
7. **Document the win** — before/after in the PR description motivates the team


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/performance_profiler.py

```python
#!/usr/bin/env python3
"""Lightweight repo performance profiling helper (stdlib only)."""

from __future__ import annotations

import argparse
import json
import os
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

EXT_WEIGHTS = {
    ".js": 1.0,
    ".jsx": 1.0,
    ".ts": 1.0,
    ".tsx": 1.0,
    ".css": 0.7,
    ".map": 2.0,
}


def iter_files(root: Path) -> Iterable[Path]:
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules", ".next", "dist", "build", "coverage", "__pycache__"}]
        for filename in filenames:
            path = Path(dirpath) / filename
            if path.is_file():
                yield path


def get_large_files(root: Path, threshold_bytes: int) -> List[Tuple[str, int]]:
    large: List[Tuple[str, int]] = []
    for file_path in iter_files(root):
        size = file_path.stat().st_size
        if size >= threshold_bytes:
            large.append((str(file_path.relative_to(root)), size))
    return sorted(large, key=lambda item: item[1], reverse=True)


def count_dependencies(root: Path) -> Dict[str, int]:
    counts = {"node_dependencies": 0, "python_dependencies": 0, "go_dependencies": 0}

    package_json = root / "package.json"
    if package_json.exists():
        try:
            data = json.loads(package_json.read_text(encoding="utf-8"))
            deps = data.get("dependencies", {})
            dev_deps = data.get("devDependencies", {})
            counts["node_dependencies"] = len(deps) + len(dev_deps)
        except Exception:
            pass

    requirements = root / "requirements.txt"
    if requirements.exists():
        lines = [ln.strip() for ln in requirements.read_text(encoding="utf-8", errors="ignore").splitlines()]
        counts["python_dependencies"] = sum(1 for ln in lines if ln and not ln.startswith("#"))

    go_mod = root / "go.mod"
    if go_mod.exists():
        lines = go_mod.read_text(encoding="utf-8", errors="ignore").splitlines()
        in_require_block = False
        go_count = 0
        for ln in lines:
            s = ln.strip()
            if s.startswith("require ("):
                in_require_block = True
                continue
            if in_require_block and s == ")":
                in_require_block = False
                continue
            if in_require_block and s and not s.startswith("//"):
                go_count += 1
            elif s.startswith("require ") and not s.endswith("("):
                go_count += 1
        counts["go_dependencies"] = go_count

    return counts


def bundle_indicators(root: Path) -> Dict[str, object]:
    indicators = {
        "build_dirs_present": [],
        "bundle_like_files": 0,
        "estimated_bundle_weight": 0.0,
    }
    for d in ["dist", "build", ".next", "out"]:
        if (root / d).exists():
            indicators["build_dirs_present"].append(d)

    bundle_files = 0
    weight = 0.0
    for path in iter_files(root):
        ext = path.suffix.lower()
        if ext in EXT_WEIGHTS:
            bundle_files += 1
            size_kb = path.stat().st_size / 1024.0
            weight += size_kb * EXT_WEIGHTS[ext]

    indicators["bundle_like_files"] = bundle_files
    indicators["estimated_bundle_weight"] = round(weight, 2)
    return indicators


def format_size(num_bytes: int) -> str:
    units = ["B", "KB", "MB", "GB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024.0 or unit == units[-1]:
            return f"{value:.1f}{unit}"
        value /= 1024.0
    return f"{num_bytes}B"


def build_report(root: Path, threshold_bytes: int) -> Dict[str, object]:
    large = get_large_files(root, threshold_bytes)
    deps = count_dependencies(root)
    bundles = bundle_indicators(root)
    return {
        "root": str(root),
        "large_file_threshold_bytes": threshold_bytes,
        "large_files": large,
        "dependency_counts": deps,
        "bundle_indicators": bundles,
    }


def print_text(report: Dict[str, object]) -> None:
    print("Performance Profile Report")
    print(f"Root: {report['root']}")
    print(f"Large-file threshold: {format_size(int(report['large_file_threshold_bytes']))}")
    print("")

    dep_counts = report["dependency_counts"]
    print("Dependency Counts")
    print(f"- Node: {dep_counts['node_dependencies']}")
    print(f"- Python: {dep_counts['python_dependencies']}")
    print(f"- Go: {dep_counts['go_dependencies']}")
    print("")

    bundle = report["bundle_indicators"]
    print("Bundle Indicators")
    print(f"- Build directories present: {', '.join(bundle['build_dirs_present']) or 'none'}")
    print(f"- Bundle-like files: {bundle['bundle_like_files']}")
    print(f"- Estimated weighted bundle size: {bundle['estimated_bundle_weight']} KB")
    print("")

    print("Large Files")
    large_files = report["large_files"]
    if not large_files:
        print("- None above threshold")
    else:
        for rel_path, size in large_files[:20]:
            print(f"- {rel_path}: {format_size(size)}")


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Analyze a project directory for common performance risk indicators."
    )
    parser.add_argument("path", help="Directory to analyze")
    parser.add_argument(
        "--large-file-threshold-kb",
        type=int,
        default=512,
        help="Threshold in KB for reporting large files (default: 512)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        help="Print JSON output instead of text",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    root = Path(args.path).expanduser().resolve()
    if not root.exists() or not root.is_dir():
        raise SystemExit(f"Path is not a directory: {root}")

    threshold = max(1, args.large_file_threshold_kb) * 1024
    report = build_report(root, threshold)

    if args.json:
        print(json.dumps(report, indent=2))
    else:
        print_text(report)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### references/profiling-recipes.md

```markdown
# performance-profiler reference

## Node.js Profiling

### CPU Flamegraph

```bash
# Method 1: clinic.js (best for development)
npm install -g clinic

# CPU flamegraph
clinic flame -- node dist/server.js

# Heap profiler
clinic heapprofiler -- node dist/server.js

# Bubble chart (event loop blocking)
clinic bubbles -- node dist/server.js

# Load with autocannon while profiling
autocannon -c 50 -d 30 http://localhost:3000/api/tasks &
clinic flame -- node dist/server.js
```

```bash
# Method 2: Node.js built-in profiler
node --prof dist/server.js
# After running some load:
node --prof-process isolate-*.log | head -100
```

```bash
# Method 3: V8 CPU profiler via inspector
node --inspect dist/server.js
# Open Chrome DevTools → Performance → Record
```

### Heap Snapshot / Memory Leak Detection

```javascript
// Add to your server for on-demand heap snapshots
import v8 from 'v8'
import fs from 'fs'

// Endpoint: POST /debug/heap-snapshot (protect with auth!)
app.post('/debug/heap-snapshot', (req, res) => {
  const filename = `heap-${Date.now()}.heapsnapshot`
  const snapshot = v8.writeHeapSnapshot(filename)
  res.json({ snapshot })
})
```

```bash
# Take snapshots over time and compare in Chrome DevTools
curl -X POST http://localhost:3000/debug/heap-snapshot
# Wait 5 minutes of load
curl -X POST http://localhost:3000/debug/heap-snapshot
# Open both snapshots in Chrome → Memory → Compare
```

### Detect Event Loop Blocking

```javascript
// Add blocked-at to detect synchronous blocking
import blocked from 'blocked-at'

blocked((time, stack) => {
  console.warn(`Event loop blocked for ${time}ms`)
  console.warn(stack.join('\n'))
}, { threshold: 100 }) // Alert if blocked > 100ms
```

### Node.js Memory Profiling Script

```javascript
// scripts/memory-profile.mjs
// Run: node --experimental-vm-modules scripts/memory-profile.mjs

import { createRequire } from 'module'
const require = createRequire(import.meta.url)

function formatBytes(bytes) {
  return (bytes / 1024 / 1024).toFixed(2) + ' MB'
}

function measureMemory(label) {
  const mem = process.memoryUsage()
  console.log(`\n[${label}]`)
  console.log(`  RSS:       ${formatBytes(mem.rss)}`)
  console.log(`  Heap Used: ${formatBytes(mem.heapUsed)}`)
  console.log(`  Heap Total:${formatBytes(mem.heapTotal)}`)
  console.log(`  External:  ${formatBytes(mem.external)}`)
  return mem
}

const baseline = measureMemory('Baseline')

// Simulate your operation
for (let i = 0; i < 1000; i++) {
  // Replace with your actual operation
  const result = await someOperation()
}

const after = measureMemory('After 1000 operations')

console.log(`\n[Delta]`)
console.log(`  Heap Used: +${formatBytes(after.heapUsed - baseline.heapUsed)}`)

// If heap keeps growing across GC cycles, you have a leak
global.gc?.() // Run with --expose-gc flag
const afterGC = measureMemory('After GC')
if (afterGC.heapUsed > baseline.heapUsed * 1.1) {
  console.warn('⚠️  Possible memory leak detected (>10% growth after GC)')
}
```

---

## Python Profiling

### CPU Profiling with py-spy

```bash
# Install
pip install py-spy

# Profile a running process (no code changes needed)
py-spy top --pid $(pgrep -f "uvicorn")

# Generate flamegraph SVG
py-spy record -o flamegraph.svg --pid $(pgrep -f "uvicorn") --duration 30

# Profile from the start
py-spy record -o flamegraph.svg -- python -m uvicorn app.main:app

# Open flamegraph.svg in browser — look for wide bars = hot code paths
```

### cProfile for function-level profiling

```python
# scripts/profile_endpoint.py
import cProfile
import pstats
import io
from app.services.task_service import TaskService

def run():
    service = TaskService()
    for _ in range(100):
        service.list_tasks(user_id="user_1", page=1, limit=20)

profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Print top 20 functions by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')
stats.print_stats(20)
print(stream.getvalue())
```

### Memory profiling with memory_profiler

```python
# pip install memory-profiler
from memory_profiler import profile

@profile
def my_function():
    # Function to profile
    data = load_large_dataset()
    result = process(data)
    return result
```

```bash
# Run with line-by-line memory tracking
python -m memory_profiler scripts/profile_function.py

# Output:
# Line #    Mem usage    Increment   Line Contents
# ================================================
#     10   45.3 MiB   45.3 MiB   def my_function():
#     11   78.1 MiB   32.8 MiB       data = load_large_dataset()
#     12  156.2 MiB   78.1 MiB       result = process(data)
```

---

## Go Profiling with pprof

```go
// main.go — add pprof endpoints
import _ "net/http/pprof"
import "net/http"

func main() {
    // pprof endpoints at /debug/pprof/
    go func() {
        log.Println(http.ListenAndServe(":6060", nil))
    }()
    // ... rest of your app
}
```

```bash
# CPU profile (30s)
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30

# Memory profile
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

# Goroutine leak detection
curl http://localhost:6060/debug/pprof/goroutine?debug=1

# In pprof UI: "Flame Graph" view → find the tallest bars
```

---

## Bundle Size Analysis

### Next.js Bundle Analyzer

```bash
# Install
pnpm add -D @next/bundle-analyzer

# next.config.js
const withBundleAnalyzer = require('@next/bundle-analyzer')({
  enabled: process.env.ANALYZE === 'true',
})
module.exports = withBundleAnalyzer({})

# Run analyzer
ANALYZE=true pnpm build
# Opens browser with treemap of bundle
```

### What to look for

```bash
# Find the largest chunks
pnpm build 2>&1 | grep -E "^\s+(λ|○|●)" | sort -k4 -rh | head -20

# Check if a specific package is too large
# Visit: https://bundlephobia.com/package/[email protected]
# moment: 67.9kB gzipped → replace with date-fns (13.8kB) or dayjs (6.9kB)

# Find duplicate packages
pnpm dedupe --check

# Visualize what's in a chunk
npx source-map-explorer .next/static/chunks/*.js
```

### Common bundle wins

```typescript
// Before: import entire lodash
import _ from 'lodash'  // 71kB

// After: import only what you need
import debounce from 'lodash/debounce'  // 2kB

// Before: moment.js
import moment from 'moment'  // 67kB

// After: dayjs
import dayjs from 'dayjs'  // 7kB

// Before: static import (always in bundle)
import HeavyChart from '@/components/HeavyChart'

// After: dynamic import (loaded on demand)
const HeavyChart = dynamic(() => import('@/components/HeavyChart'), {
  loading: () => <Skeleton />,
})
```

---

## Database Query Optimization

### Find slow queries

```sql
-- PostgreSQL: enable pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 20 slowest queries
SELECT
  round(mean_exec_time::numeric, 2) AS mean_ms,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  left(query, 80) AS query
FROM pg_stat_statements
WHERE calls > 10
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Reset stats
SELECT pg_stat_statements_reset();
```

```bash
# MySQL slow query log
mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 0.1;"
tail -f /var/log/mysql/slow-query.log
```

### EXPLAIN ANALYZE

```sql
-- Always use EXPLAIN (ANALYZE, BUFFERS) for real timing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT t.*, u.name as assignee_name
FROM tasks t
LEFT JOIN users u ON u.id = t.assignee_id
WHERE t.project_id = 'proj_123'
  AND t.deleted_at IS NULL
ORDER BY t.created_at DESC
LIMIT 20;

-- Look for:
-- Seq Scan on large table → needs index
-- Nested Loop with high rows → N+1, consider JOIN or batch
-- Sort → can index handle the sort?
-- Hash Join → fine for moderate sizes
```

### Detect N+1 Queries

```typescript
// Add query logging in dev
import { db } from './client'

// Drizzle: enable logging
const db = drizzle(pool, { logger: true })

// Or use a query counter middleware
let queryCount = 0
db.$on('query', () => queryCount++)

// In tests:
queryCount = 0
const tasks = await getTasksWithAssignees(projectId)
expect(queryCount).toBe(1)  // Fail if it's 21 (1 + 20 N+1s)
```

```python
# Django: detect N+1 with django-silk or nplusone
from nplusone.ext.django.middleware import NPlusOneMiddleware
MIDDLEWARE = ['nplusone.ext.django.middleware.NPlusOneMiddleware']
NPLUSONE_RAISE = True  # Raise exception on N+1 in tests
```

### Fix N+1 — Before/After

```typescript
// Before: N+1 (1 query for tasks + N queries for assignees)
const tasks = await db.select().from(tasksTable)
for (const task of tasks) {
  task.assignee = await db.select().from(usersTable)
    .where(eq(usersTable.id, task.assigneeId))
    .then(r => r[0])
}

// After: 1 query with JOIN
const tasks = await db
  .select({
    id: tasksTable.id,
    title: tasksTable.title,
    assigneeName: usersTable.name,
    assigneeEmail: usersTable.email,
  })
  .from(tasksTable)
  .leftJoin(usersTable, eq(usersTable.id, tasksTable.assigneeId))
  .where(eq(tasksTable.projectId, projectId))
```

---

## Load Testing with k6

```javascript
// tests/load/api-load-test.js
import http from 'k6/http'
import { check, sleep } from 'k6'
import { Rate, Trend } from 'k6/metrics'

const errorRate = new Rate('errors')
const taskListDuration = new Trend('task_list_duration')

export const options = {
  stages: [
    { duration: '30s', target: 10 },   // Ramp up to 10 VUs
    { duration: '1m',  target: 50 },   // Ramp to 50 VUs
    { duration: '2m',  target: 50 },   // Sustain 50 VUs
    { duration: '30s', target: 100 },  // Spike to 100 VUs
    { duration: '1m',  target: 50 },   // Back to 50
    { duration: '30s', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],   // 95% of requests < 500ms
    http_req_duration: ['p(99)<1000'],  // 99% < 1s
    errors: ['rate<0.01'],              // Error rate < 1%
    task_list_duration: ['p(95)<200'],  // Task list specifically < 200ms
  },
}

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000'

export function setup() {
  // Get auth token once
  const loginRes = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
    email: '[email protected]',
    password: 'loadtest123',
  }), { headers: { 'Content-Type': 'application/json' } })
  
  return { token: loginRes.json('token') }
}

export default function(data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  }
  
  // Scenario 1: List tasks
  const start = Date.now()
  const listRes = http.get(`${BASE_URL}/api/tasks?limit=20`, { headers })
  taskListDuration.add(Date.now() - start)
  
  check(listRes, {
    'list tasks: status 200': (r) => r.status === 200,
    'list tasks: has items': (r) => r.json('items') !== undefined,
  }) || errorRate.add(1)
  
  sleep(0.5)
  
  // Scenario 2: Create task
  const createRes = http.post(
    `${BASE_URL}/api/tasks`,
    JSON.stringify({ title: `Load test task ${Date.now()}`, priority: 'medium' }),
    { headers }
  )
  
  check(createRes, {
    'create task: status 201': (r) => r.status === 201,
  }) || errorRate.add(1)
  
  sleep(1)
}

export function teardown(data) {
  // Cleanup: delete load test tasks
}
```

```bash
# Run load test
k6 run tests/load/api-load-test.js \
  --env BASE_URL=https://staging.myapp.com

# With Grafana output
k6 run --out influxdb=http://localhost:8086/k6 tests/load/api-load-test.js
```

---

```