
browser-use-local

Imported from https://github.com/openclaw/skills.

Packaged view

This page reorganizes the original catalog entry to lead with fit, installability, and workflow context; the original raw source appears below.

Stars: 3,087
Hot score: 99
Updated: March 20, 2026
Overall rating: C (4.0)
Composite score: 4.0
Best-practice grade: F (32.4)

Install command

npx @skill-hub/cli install openclaw-skills-browser-use-local

Repository

openclaw/skills

Skill path: skills/fengjiajie/browser-use-local


Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install browser-use-local into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding browser-use-local to shared team environments
  • Use browser-use-local for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: browser-use-local
description: Use when you need browser automation via the browser-use CLI or Python code in this OpenClaw container/host: open pages, click/type, take screenshots, extract HTML/links, or run an Agent with an OpenAI-compatible LLM (e.g. Moonshot/Kimi) using a custom base_url. Also use for debugging browser-use sessions (state empty, page readiness timeouts), and for extracting login QR codes from demo/login pages via screenshots or HTML data:image.
---

# browser-use (local) playbook

## Default constraints in this environment

- Prefer **browser-use** (CLI/Python) over the OpenClaw `browser` tool here; the OpenClaw `browser` tool may fail if no supported system browser is present.
- Use **persistent sessions** for multi-step flows: `--session <name>`.

## Quick CLI workflow (non-agent)

1) Open

```bash
browser-use --session demo open https://example.com
```

2) Inspect (note: `state` sometimes returns 0 elements on heavy, JS-driven sites)

```bash
browser-use --session demo --json state | jq '.data | {url,title,elements:(.elements|length)}'
```
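If `jq` is not available, the same summary can be pulled with a few lines of Python. This is a sketch that assumes the `--json state` output has the `{"data": {"url", "title", "elements": [...]}}` shape used by the `jq` filter above:

```python
import json


def summarize_state(raw: str) -> dict:
    """Summarize browser-use `--json state` output.

    Assumes the {"data": {"url", "title", "elements": [...]}} shape
    shown in the jq example above.
    """
    data = json.loads(raw).get("data") or {}
    return {
        "url": data.get("url"),
        "title": data.get("title"),
        # `elements` may be missing or null when the page is not ready.
        "elements": len(data.get("elements") or []),
    }


# Typical use, after saving the CLI output to a file:
#   browser-use --session demo --json state > /tmp/state.json
#   python -c 'import json,sys; print(summarize_state(open("/tmp/state.json").read()))'
```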

3) Screenshot (the most reliable debugging primitive)

```bash
browser-use --session demo screenshot /home/node/.openclaw/workspace/page.png
```

4) HTML for link discovery (works even when `state` is empty)

```bash
browser-use --session demo --json get html > /tmp/page_html.json
python3 - <<'PY'
import json,re
html=json.load(open('/tmp/page_html.json')).get('data',{}).get('html','')
urls=set(re.findall(r"https?://[^\s\"'<>]+", html))
for u in sorted([u for u in urls if any(k in u for k in ['demo','login','console','qr','qrcode'])])[:200]:
    print(u)
PY
```

5) Lightweight DOM queries via JS (useful when `state` is empty)

```bash
browser-use --session demo --json eval "location.href"
browser-use --session demo --json eval "document.title"
```

## Agent workflow with OpenAI-compatible LLM (Moonshot/Kimi)

Use Python for Agent runs when the CLI `run` path requires Browser-Use cloud keys or when you need strict control over LLM parameters.

### Minimal working Kimi example

Create `.env` (or export env vars) with:

- `OPENAI_API_KEY=...`
- `OPENAI_BASE_URL=https://api.moonshot.cn/v1`

Then run the bundled script:

```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
python /home/node/.openclaw/workspace/skills/browser-use-local/scripts/run_agent_kimi.py
```

**Kimi/Moonshot quirks observed in practice**, with fixes:

- `temperature` must be `1` for `kimi-k2.5`.
- `frequency_penalty` must be `0` for `kimi-k2.5`.
- Moonshot can reject strict JSON Schema used for structured output. Enable:
  - `remove_defaults_from_schema=True`
  - `remove_min_items_from_schema=True`

If you get a 400 error mentioning `response_format.json_schema ... keyword 'default' is not allowed` or `min_items unsupported`, those two flags are the first thing to set.
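To make the effect of those two flags concrete, here is a rough illustration (not browser-use's actual implementation) of the kind of transformation they apply: recursively stripping the `default` and `minItems` keywords from a JSON Schema before it reaches a strict gateway:

```python
def clean_schema(schema):
    """Recursively drop `default` and `minItems` keywords from a JSON Schema.

    Illustrative only: approximates what remove_defaults_from_schema /
    remove_min_items_from_schema do. A naive strip like this would also
    remove a property literally named "default"; a real implementation
    must distinguish keyword positions from property names.
    """
    if isinstance(schema, dict):
        return {
            key: clean_schema(value)
            for key, value in schema.items()
            if key not in ("default", "minItems")
        }
    if isinstance(schema, list):
        return [clean_schema(item) for item in schema]
    return schema
```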

## QR code extraction (login/demo pages)

### Preferred order

1) **Screenshot the page** and crop candidate regions (fast, robust).
2) If HTML contains `data:image/png;base64,...`, extract and decode it.
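Step 2 boils down to splitting the data URI and base64-decoding the payload. A minimal sketch (the bundled `extract_data_images.py` does this at scale over a whole HTML dump):

```python
import base64
import re


def decode_data_image(uri: str) -> tuple[str, bytes]:
    """Decode a data:image/<ext>;base64,... URI into (extension, raw bytes)."""
    match = re.match(r"data:image/(png|jpeg);base64,([A-Za-z0-9+/=]+)", uri)
    if not match:
        raise ValueError("not a base64 data:image URI")
    ext, b64 = match.groups()
    return ext, base64.b64decode(b64)
```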

### Crop candidates

Use `scripts/crop_candidates.py` to generate multiple likely QR crops from a screenshot.

```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
python skills/browser-use-local/scripts/crop_candidates.py \
  --in /home/node/.openclaw/workspace/login.png \
  --outdir /home/node/.openclaw/workspace/qr_crops
```

### Extract base64-embedded images from HTML

```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
browser-use --session demo --json get html > /tmp/page_html.json
python skills/browser-use-local/scripts/extract_data_images.py \
  --in /tmp/page_html.json \
  --outdir /home/node/.openclaw/workspace/data_imgs
```

## Troubleshooting

- **`state` shows `elements: 0`**: use `get html` + regex discovery, plus screenshots; use `eval` to query DOM.
- **Page readiness timeout warnings**: usually harmless; rely on screenshot + HTML.
- **CLI flags order**: global flags go *before* the subcommand:
  - ✅ `browser-use --browser chromium --json open https://...`
  - ❌ `browser-use open https://... --browser chromium`



---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/crop_candidates.py

```python
"""Generate a few likely QR-code crops from a full-page screenshot.

This is a heuristic helper: many login pages place QR codes on the right side.
"""

import argparse
from pathlib import Path

from PIL import Image


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--in", dest="inp", required=True, help="Input screenshot (png/jpg)")
    ap.add_argument("--outdir", required=True, help="Output directory")
    args = ap.parse_args()

    inp = Path(args.inp)
    outdir = Path(args.outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    img = Image.open(inp)
    w, h = img.size

    crops = {
        "right_half": (w // 2, 0, w, h),
        "right_center": (int(w * 0.55), int(h * 0.15), int(w * 0.95), int(h * 0.85)),
        "center": (int(w * 0.25), int(h * 0.15), int(w * 0.75), int(h * 0.85)),
        "top_center": (int(w * 0.25), 0, int(w * 0.75), int(h * 0.5)),
        "bottom_center": (int(w * 0.25), int(h * 0.5), int(w * 0.75), h),
    }

    for name, box in crops.items():
        out = outdir / f"{inp.stem}_crop_{name}.png"
        img.crop(box).save(out)
        print(str(out))


if __name__ == "__main__":
    main()

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "fengjiajie",
  "slug": "browser-use-local",
  "displayName": "Browser Use Local",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1770276408699,
    "commit": "https://github.com/openclaw/skills/commit/d65768aa3203dde696ab7e4db16498aee2c48783"
  },
  "history": []
}

```

### scripts/extract_data_images.py

```python
"""Extract data:image/*;base64,... images from browser-use get html JSON output.

Input is the JSON produced by:
  browser-use --json get html > /tmp/page_html.json

Writes extracted images into --outdir.
"""

import argparse
import base64
import json
import re
from pathlib import Path


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--in", dest="inp", required=True, help="Input JSON file")
    ap.add_argument("--outdir", required=True, help="Output directory")
    args = ap.parse_args()

    outdir = Path(args.outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    obj = json.load(open(args.inp, "r", encoding="utf-8"))
    html = obj.get("data", {}).get("html", "")

    imgs = re.findall(r"data:image/(png|jpeg);base64,([A-Za-z0-9+/=]+)", html)
    if not imgs:
        print("no data:image found")
        return

    # sort biggest first
    imgs = sorted(imgs, key=lambda x: len(x[1]), reverse=True)

    for i, (ext, b64) in enumerate(imgs):
        out = outdir / f"dataimg_{i}.{ext}"
        out.write_bytes(base64.b64decode(b64))
        print(str(out))


if __name__ == "__main__":
    main()

```

### scripts/run_agent_kimi.py

```python
import asyncio
import os

from dotenv import load_dotenv
from browser_use import Agent, ChatOpenAI


async def main() -> None:
    load_dotenv()

    api_key = os.getenv("OPENAI_API_KEY")
    base_url = os.getenv("OPENAI_BASE_URL")

    if not api_key:
        raise SystemExit("OPENAI_API_KEY is not set")
    if not base_url:
        raise SystemExit("OPENAI_BASE_URL is not set")

    llm = ChatOpenAI(
        model=os.getenv("OPENAI_MODEL", "kimi-k2.5"),
        api_key=api_key,
        base_url=base_url,
        # Moonshot/Kimi observed constraints:
        temperature=float(os.getenv("OPENAI_TEMPERATURE", "1")),
        frequency_penalty=float(os.getenv("OPENAI_FREQUENCY_PENALTY", "0")),
        # Make JSON Schema compatible with stricter gateways:
        remove_defaults_from_schema=True,
        remove_min_items_from_schema=True,
    )

    agent = Agent(
        task=os.getenv(
            "BROWSER_USE_TASK",
            "Open https://example.com and return the page title.",
        ),
        llm=llm,
    )

    history = await agent.run(max_steps=int(os.getenv("BROWSER_USE_MAX_STEPS", "15")))
    print(history.final_result())


if __name__ == "__main__":
    asyncio.run(main())

```
