SkillHub ClubResearch & OpsFull Stack

openclaw-ops

Use when diagnosing, repairing, or maintaining an OpenClaw Gateway on the same machine. Designed for rescue agents to fix a down gateway or check operational health. Supports Linux (systemd) and macOS (launchd).

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars

3,087

Hot score

Updated

March 20, 2026

Overall rating

C4.0

Composite score

4.0

Best-practice grade

B81.2

Install command

npx @skill-hub/cli install openclaw-skills-openclaw-ops

Repository

openclaw/skills

Skill path: skills/dinstein/openclaw-ops

Open repository

Best for

Primary workflow: Research & Ops.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install openclaw-ops into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/openclaw/skills before adding openclaw-ops to shared team environments
Use openclaw-ops for development workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: openclaw-ops
version: 1.2.1
description: Use when diagnosing, repairing, or maintaining an OpenClaw Gateway on the same machine. Designed for rescue agents to fix a down gateway or check operational health. Supports Linux (systemd) and macOS (launchd).
repository: https://github.com/dinstein/openclaw-ops-skill
requirements:
  - Shell access on the same machine as the OpenClaw Gateway
  - Read/write access to ~/.openclaw/ (config, agents, sessions)
  - Read access to systemd user service config (Linux) or LaunchAgents (macOS)
  - Node.js 18+ and npm (for openclaw CLI)
  - Optional: Tailscale CLI (for reverse proxy troubleshooting)
security_notes:
  - This skill instructs the agent to read and modify OpenClaw config files
  - The agent will access env files containing tokens (but is instructed never to print them)
  - Config edits are always preceded by timestamped backups
  - Destructive operations require user confirmation
---

# OpenClaw Operations

## Design Philosophy

This skill serves **two core scenarios** and nothing else:

1. **Rescue** — The main OpenClaw Gateway is down or broken. You (the rescue agent) need to diagnose the root cause, fix it, and bring the gateway back online.
2. **Health Check** — The main OpenClaw Gateway is running. You need to verify its operational health, clean up resources, or perform maintenance tasks like upgrades.

**What this skill is NOT for:**
- Day-to-day business configuration (adding channels, configuring agents, setting up integrations)
- Application-level issues (agent behavior, prompt tuning, skill management)
- Initial deployment or first-time setup (use `openclaw daemon install` and `openclaw configure`)

**Principle:** Diagnose → Judge → Act → Verify. Never skip steps.

## Platform Detection

Detect the platform first — commands differ between Linux (systemd) and macOS (launchd):

```bash
OS=$(uname -s)  # "Linux" or "Darwin"
echo "Platform: $OS"
```

## Port Detection

Do NOT assume port 18789. Detect the actual configured port first:

```bash
PORT=$(openclaw config get gateway.port 2>/dev/null | grep -oE '[0-9]+')
PORT=${PORT:-18789}  # fallback to default only if config unavailable
echo "Gateway port: $PORT"
```

Use `$PORT` in all port-related commands throughout this guide.

---

# Scenario A: Rescue (Gateway Down)

Follow these sections in order when the main gateway is not running.

## A1. Assess the Situation

```bash
# Is the service running at all?
# Linux:
systemctl --user status openclaw-gateway
# macOS:
launchctl list | grep openclaw

# Is the process alive?
pgrep -af openclaw

# Is the port listening?
# Linux:
ss -tlnp | grep $PORT
# macOS:
lsof -iTCP:$PORT -sTCP:LISTEN
```

## A2. Check Logs for Root Cause

**Linux:**
```bash
journalctl --user -u openclaw-gateway --since "1 hour ago" --no-pager | grep -iE "error|crash|fatal|SIGTERM|OOM"

# Last 50 lines for context
journalctl --user -u openclaw-gateway -n 50 --no-pager
```

**macOS:**
```bash
LOG_DIR="$HOME/.openclaw/logs"
grep -iE "error|crash|fatal" "$LOG_DIR/gateway.log" | tail -20
tail -50 "$LOG_DIR/gateway.log"

# Also check unified log
log show --predicate 'process == "node"' --last 1h | grep -iE "error|crash|fatal"
```

### Common crash patterns

| Log pattern | Meaning | Fix |
|------------|---------|-----|
| `EADDRINUSE` | Port already in use | Find conflicting process: `ss -tlnp \| grep $PORT` (Linux) or `lsof -iTCP:$PORT` (macOS), kill it or change port |
| `ENOMEM` / `JavaScript heap` | Out of memory | Check `free -h` (Linux) / `vm_stat` (macOS), kill memory hogs or increase Node heap |
| `SyntaxError` in config | Bad JSON in openclaw.json | See A3 Config Repair |
| `MODULE_NOT_FOUND` | Missing dependency | `cd $(npm root -g)/openclaw && npm install --production` |
| `Invalid token` / `401` / `403` | Auth failure | Check tokens in env file or systemd drop-in |
| `ECONNREFUSED` | Upstream unreachable | Check network, Tailscale, API endpoints |

## A3. Config Repair

**Always backup first:**
```bash
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak.$(date +%s)
```

**JSON syntax validation:**
```bash
python3 -c "import json; json.load(open('$HOME/.openclaw/openclaw.json'))"
```

Common JSON issues: trailing comma, missing quotes, unescaped characters. The error message shows line/position.

**Config schema validation:**
```bash
openclaw config get gateway  # check gateway config section
openclaw config get channels  # check channels config section
```

**Common config errors:**

| Symptom | Likely cause | Fix |
|---------|-------------|-----|
| "device identity mismatch" | Service env token ≠ config token | Sync tokens between env file and openclaw.json |
| Agent not routing | `bindings` misconfigured | Bindings go at top-level, not inside `agents.list[].routing` |

**After fixing, validate:**
```bash
python3 -c "import json; json.load(open('$HOME/.openclaw/openclaw.json')); print('JSON OK')"
openclaw status
```

## A4. Check Resources

```bash
# Disk space
df -h ~

# Memory (Linux)
free -h
# Memory (macOS)
vm_stat | head -5

# Node.js available?
node -v
which openclaw
openclaw --version
```

## A5. Restart and Verify

Only restart **after identifying and fixing the root cause**.

**Linux:**
```bash
systemctl --user restart openclaw-gateway
sleep 3
systemctl --user status openclaw-gateway
journalctl --user -u openclaw-gateway -n 20 --no-pager
```

**macOS:**
```bash
launchctl kickstart -k "gui/$(id -u)/com.openclaw.gateway"
sleep 3
launchctl list | grep openclaw
tail -20 ~/.openclaw/logs/gateway.log
```

**If service won't start at all:**
```bash
# Manual foreground start for better error output
openclaw gateway start
```

**Final verification:**
```bash
openclaw status
openclaw doctor --non-interactive
```

---

# Scenario B: Health Check (Gateway Running)

Follow these sections for routine operational checks on a running gateway.

## B1. Quick Health Check

```bash
# Comprehensive check — start here
openclaw doctor

# If issues found, auto-fix safe ones
openclaw doctor --fix
```

## B2. Update & Upgrade

```bash
# Check versions
CURRENT=$(openclaw --version)
LATEST=$(npm view openclaw version)
echo "Current: $CURRENT  Latest: $LATEST"
```

**Perform update:**
```bash
# 1. Save doctor baseline
openclaw doctor --non-interactive 2>&1 | tee /tmp/doctor-before.txt

# 2. Backup config
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.pre-upgrade.$(date +%s)

# 3. Update
npm update -g openclaw
openclaw --version

# 4. Restart
# Linux:
systemctl --user restart openclaw-gateway
# macOS:
launchctl kickstart -k "gui/$(id -u)/com.openclaw.gateway"

# 5. Compare doctor output
sleep 5
openclaw doctor --non-interactive 2>&1 | tee /tmp/doctor-after.txt
diff /tmp/doctor-before.txt /tmp/doctor-after.txt
```

**Rollback:**
```bash
npm install -g openclaw@<previous_version>
cp ~/.openclaw/openclaw.json.pre-upgrade.<timestamp> ~/.openclaw/openclaw.json
# Restart (platform-appropriate command above)
```

## B3. Session & Disk Cleanup

```bash
# Check disk usage per agent
for agent_dir in ~/.openclaw/agents/*/; do
    agent=$(basename "$agent_dir")
    size=$(du -sh "$agent_dir/sessions/" 2>/dev/null | cut -f1)
    count=$(find "$agent_dir/sessions/" -name "*.jsonl" 2>/dev/null | wc -l)
    echo "$agent: $size ($count transcripts)"
done

# Auto-fix orphans
openclaw doctor --fix

# Manual cleanup: old transcripts (>30 days)
find ~/.openclaw/agents/*/sessions/ -name "*.jsonl" -mtime +30 -exec ls -lh {} \;
# Review, then delete if safe:
find ~/.openclaw/agents/*/sessions/ -name "*.jsonl" -mtime +30 -delete
```

## B4. Backup

```bash
BACKUP_DIR=~/openclaw-backup-$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"

# Core files
cp ~/.openclaw/openclaw.json "$BACKUP_DIR/"
[ -f ~/.openclaw/env ] && cp ~/.openclaw/env "$BACKUP_DIR/" || echo "No env file (tokens may be in systemd drop-in or plist)"
cp -r ~/.openclaw/agents "$BACKUP_DIR/"
cp -r ~/.openclaw/devices "$BACKUP_DIR/"
cp -r ~/.openclaw/workspace "$BACKUP_DIR/"

# Service config
if [ "$(uname -s)" = "Linux" ]; then
    cp ~/.config/systemd/user/openclaw-gateway.service "$BACKUP_DIR/" 2>/dev/null
    cp -r ~/.config/systemd/user/openclaw-gateway.service.d "$BACKUP_DIR/" 2>/dev/null
elif [ "$(uname -s)" = "Darwin" ]; then
    cp ~/Library/LaunchAgents/com.openclaw.gateway.plist "$BACKUP_DIR/" 2>/dev/null
fi

echo "Backup saved to $BACKUP_DIR"
```

## B5. Tailscale Serve Check

If OpenClaw uses Tailscale Serve as reverse proxy:

```bash
tailscale status
tailscale serve status
curl -s -o /dev/null -w "%{http_code}" http://localhost:$PORT/healthz || echo "Gateway not reachable on localhost"
```

**Reconfigure if broken:**
```bash
tailscale serve reset
tailscale serve https / http://localhost:$PORT
tailscale serve status
```

---

# Reference

## Troubleshooting Quick Index

| Symptom | Path |
|---------|------|
| Gateway won't start | A1 → A2 → A3 → A5 |
| Gateway crashed | A2 (logs) → A4 (resources) → A3 (config) → A5 (restart) |
| Config broken after edit | A3 → A5 |
| Disk filling up | B3 |
| After upgrade something broke | B2 (rollback) |
| Tailscale not proxying | B5 |

## `openclaw doctor` Reference

| Flag | Effect |
|------|--------|
| (none) | Interactive health check |
| `--fix` | Apply safe repairs (orphan cleanup, stale locks) |
| `--force` | Aggressive repairs (may overwrite custom service config) |
| `--deep` | Scan system for extra gateway installs |
| `--non-interactive` | No prompts, safe migrations only |

**`--fix` repairs:** orphan transcripts, stale session locks, legacy key migration.
**`--fix` does NOT:** modify openclaw.json, change service files (unless `--force`), delete workspace files.

## Key Commands

| Command | Purpose |
|---------|---------|
| `openclaw status` | Quick status: running, version, basic info |
| `openclaw doctor` | Deep health check: state, channels, plugins, skills |
| `openclaw doctor --fix` | Health check + auto-repair safe issues |
| `openclaw gateway start` | Start gateway in foreground (for debugging) |
| `openclaw daemon install` | Install as persistent service (systemd/launchd) |
| `openclaw daemon restart` | Restart the service |
| `openclaw config get <path>` | Read config value |
| `openclaw config set <path> <value>` | Write config value |

## Safety Rules

1. **Always check logs before changing anything** — understand the problem first
2. **Always backup before editing config** — `cp` with timestamp suffix
3. **Always validate JSON after editing** — one bad comma kills the service
4. **Never print secrets** — check env file exists, don't cat it
5. **Never delete workspace files** — use `trash` if you must remove something
6. **Always verify after restart** — status + logs, don't assume it worked
7. **Destructive operations require confirmation** — ask the user before wiping data

## File Layout

```
~/.openclaw/
├── openclaw.json              # Main config
├── openclaw.json.bak          # Auto-backup
├── env                        # Environment variables (secrets)
├── logs/                      # macOS: launchd log output
├── agents/                    # Per-agent configs
│   └── <agent>/agent/
│       ├── auth-profiles.json
│       └── models.json
├── devices/
│   └── paired.json
├── workspace/                 # Agent workspace
└── sessions/                  # Session logs

# Linux:
~/.config/systemd/user/
├── openclaw-gateway.service
└── openclaw-gateway.service.d/
    └── env.conf

# macOS:
~/Library/LaunchAgents/
└── com.openclaw.gateway.plist
```


---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### README.md

```markdown
# openclaw-ops

[English](README.md) | [中文](README_CN.md)

A skill that teaches any AI agent how to operate and maintain an [OpenClaw](https://openclaw.ai) Gateway running as a persistent service.

Works with **any agent that has shell access** — Claude Code, Codex, OpenClaw, Pi, or any AI agent with shell access running on the same machine as the OpenClaw Gateway.

Supports both **Linux (systemd)** and **macOS (launchd)**.

## Install

> **⚠️ Install this skill on your rescue agent (Claude Code, secondary OpenClaw, etc.), NOT on the main OpenClaw Gateway you want to maintain.**

### Ask your agent to install it

Just tell your rescue agent in a conversation:

```
> Install the openclaw-ops skill from https://github.com/dinstein/openclaw-ops-skill
```

The agent will download `SKILL.md` and place it in the correct skills directory automatically.

### From ClawHub (when using a secondary OpenClaw as rescue agent)

```bash
clawhub install openclaw-ops
```

### Manual

Copy `SKILL.md` into your agent's skills directory:

```bash
# Claude Code
mkdir -p ~/.claude/skills/openclaw-ops
cp SKILL.md ~/.claude/skills/openclaw-ops/SKILL.md

# OpenClaw
mkdir -p ~/.openclaw/workspace/skills/openclaw-ops
cp SKILL.md ~/.openclaw/workspace/skills/openclaw-ops/SKILL.md

# Other agents — place in whatever skills directory your agent reads from
```

## Architecture

```mermaid
graph LR
    You["👤 You"] -->|remote access| Rescue["🛠️ Rescue Agent<br/>+ openclaw-ops skill"]
    Rescue -->|diagnoses & fixes| Main["🦞 Main OpenClaw<br/>Gateway"]
    Main -->|serves users| Users["👥 Discord / Telegram / etc"]
```

**Main OpenClaw Gateway** — Your primary AI agent system. Handles all day-to-day operations: chat channels, cron jobs, sessions, etc.

**Rescue Agent** — A separate agent (Claude Code, secondary OpenClaw instance, or any AI agent with shell access) with this skill installed. Lives on the same machine. Its sole purpose: fix the main gateway when it breaks, and perform operational health checks.

**This skill** — Teaches the rescue agent what commands to run, how to interpret output, and what steps to follow for diagnosis and repair.

## Two Core Scenarios

### 🔴 Rescue: Main Gateway is Down

The main OpenClaw is crashed, misconfigured, or won't start. You connect to the rescue agent and ask it to fix things.

**Example 1: Won't start after config edit**
```
You: "I edited openclaw.json and now it won't start"

Rescue agent: validates JSON → finds trailing comma on line 47 →
backs up file → fixes syntax → restarts gateway → verifies health
```

**Example 2: Broken after upgrade**
```
You: "I upgraded OpenClaw and now the gateway keeps crashing"

Rescue agent: checks crash logs → finds MODULE_NOT_FOUND for a removed plugin →
reinstalls dependencies → restarts → still failing →
rolls back to previous version → gateway is back online
```

### 🟢 Health Check: Main Gateway is Running

The main OpenClaw is working fine, but you want to verify health, update, or clean up.

```
You: "Run a health check on OpenClaw"

Rescue agent: runs openclaw doctor → reports status, orphan count,
channel connectivity, disk usage → offers to fix any issues found
```

## Connecting to the Rescue Agent

When the main OpenClaw is down, you can't talk to it through Discord/Telegram. You need an alternative way to reach the rescue agent on the server.

### Option 1: Native Remote Control (⭐ Recommended)

If your agent supports native remote access, use it — the most seamless experience with full agent capabilities from your browser or mobile device.

- **Claude Code** — [Remote Control](https://code.claude.com/docs/en/remote-control): access your server-side Claude Code from any browser, with built-in auth and session persistence

Currently only Claude Code offers this. As more agents add native remote access, this will be the preferred approach for all.

### Option 2: Remote Agent Platforms

Third-party platforms that let you manage and interact with coding agents on remote machines:

- [Happy](https://github.com/slopus/happy) — Mobile & web client for Claude Code and Codex with push notifications, device switching, and E2E encryption
- [Hapi](https://github.com/tiann/hapi) — Local-first remote control for Claude Code / Codex / Gemini / OpenCode via Web, PWA, or Telegram Mini App
- Similar products that support remote shell-capable agent access

These are good when your agent doesn't have native remote control, or when you want a unified dashboard for multiple agents.

### Option 3: SSH + Agent CLI

SSH into the server and run the agent directly in the terminal:

```bash
ssh user@your-server
claude  # or codex, or any agent CLI
```

**Tips for mobile:**
- Use SSH client apps (Termius, Blink, etc.) from your phone
- Use tmux to keep sessions alive between connections:

```bash
# On the server (one-time setup)
tmux new -s rescue
claude

# Later, from anywhere
ssh user@your-server
tmux attach -t rescue
```

- VS Code Remote SSH also works well from a laptop

## What it covers

| Module | Scenario | Description |
|--------|----------|-------------|
| Crash diagnosis | 🔴 Rescue | Read logs, identify root cause |
| Config repair | 🔴 Rescue | JSON fix, schema validation, common errors |
| Service restart | 🔴 Rescue | Safe restart after fixing root cause |
| Resource check | 🔴 Rescue | Disk, memory, Node.js, dependencies |
| Health check | 🟢 Health | `openclaw doctor`, service status |
| Update & upgrade | 🟢 Health | Version check, safe upgrade, rollback |
| Disk cleanup | 🟢 Health | Orphan transcripts, session management |
| Backup | 🟢 Health | Config, agents, workspace backup |
| Tailscale check | 🟢 Health | Reverse proxy verification |

## Requirements

The rescue agent needs:

- **Shell access** on the same machine as the OpenClaw Gateway
- **Read/write access** to `~/.openclaw/` (config, agents, sessions)
- **Node.js 18+** and npm (for the `openclaw` CLI)
- **Optional:** Tailscale CLI (for reverse proxy troubleshooting)

### Security considerations

This skill instructs the agent to:
- Read and modify OpenClaw config files (always with backup first)
- Access env files containing tokens (but **never print them**)
- Restart system services (systemd / launchd)
- Delete old session transcripts (only with user confirmation)

All destructive operations require explicit user approval.

## How it works

This is a **pure documentation skill** — no scripts, no external dependencies, no framework lock-in. Install it in any agent that can read markdown and run shell commands. It teaches the agent:

1. **What to check** — the right commands for each situation
2. **How to interpret** — what the output means and common error patterns
3. **What to do** — step-by-step fix procedures with safety guardrails
4. **How to verify** — confirmation steps after every action

## Safety

The skill enforces these rules:

- Always check logs before making changes
- Always backup config before editing
- Always validate JSON after editing
- Never print secrets (env files)
- Never delete workspace files without confirmation
- Always verify after restart

## License

MIT

```

### _meta.json

```json
{
  "owner": "dinstein",
  "slug": "openclaw-ops",
  "displayName": "OpenClaw Ops",
  "latest": {
    "version": "1.2.1",
    "publishedAt": 1772011554077,
    "commit": "https://github.com/openclaw/skills/commit/2c8206053c09e688222b160550d1ffed9aee323d"
  },
  "history": []
}

```