browser-use-local
Imported from https://github.com/openclaw/skills.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Stars: 3,087
Hot score: 99
Updated: March 20, 2026
Overall rating: C (4.0)
Composite score: 4.0
Best-practice grade: F (32.4)
Install command
npx @skill-hub/cli install openclaw-skills-browser-use-local
Repository
openclaw/skills
Skill path: skills/fengjiajie/browser-use-local
Open repository
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install browser-use-local into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding browser-use-local to shared team environments
- Use browser-use-local for development workflows
Works across
Claude Code, Codex CLI, Gemini CLI, OpenCode
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: browser-use-local
description: Use when you need browser automation via the browser-use CLI or Python code in this OpenClaw container/host: open pages, click/type, take screenshots, extract HTML/links, or run an Agent with an OpenAI-compatible LLM (e.g. Moonshot/Kimi) using a custom base_url. Also use for debugging browser-use sessions (state empty, page readiness timeouts), and for extracting login QR codes from demo/login pages via screenshots or HTML data:image.
---
# browser-use (local) playbook
## Default constraints in this environment
- Prefer **browser-use** (CLI/Python) over OpenClaw `browser` tool here; OpenClaw `browser` may fail if no supported system browser is present.
- Use **persistent sessions** to do multi-step flows: `--session <name>`.
## Quick CLI workflow (non-agent)
1) Open
```bash
browser-use --session demo open https://example.com
```
2) Inspect (sometimes `state` returns 0 elements on heavy/JS sites)
```bash
browser-use --session demo --json state | jq '.data | {url,title,elements:(.elements|length)}'
```
3) Screenshot (always works; best debugging primitive)
```bash
browser-use --session demo screenshot /home/node/.openclaw/workspace/page.png
```
4) HTML for link discovery (works even when `state` is empty)
```bash
browser-use --session demo --json get html > /tmp/page_html.json
python3 - <<'PY'
import json,re
html=json.load(open('/tmp/page_html.json')).get('data',{}).get('html','')
urls=set(re.findall(r"https?://[^\s\"'<>]+", html))
for u in sorted([u for u in urls if any(k in u for k in ['demo','login','console','qr','qrcode'])])[:200]:
    print(u)
PY
```
5) Lightweight DOM queries via JS (useful when `state` is empty)
```bash
browser-use --session demo --json eval "location.href"
browser-use --session demo --json eval "document.title"
```
## Agent workflow with OpenAI-compatible LLM (Moonshot/Kimi)
Use Python for Agent runs when the CLI `run` path requires Browser-Use cloud keys or when you need strict control over LLM parameters.
### Minimal working Kimi example
Create `.env` (or export env vars) with:
- `OPENAI_API_KEY=...`
- `OPENAI_BASE_URL=https://api.moonshot.cn/v1`
Then run the bundled script:
```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
python /home/node/.openclaw/workspace/skills/browser-use-local/scripts/run_agent_kimi.py
```
**Kimi/Moonshot quirks observed in practice** (fixes):
- `temperature` must be `1` for `kimi-k2.5`.
- `frequency_penalty` must be `0` for `kimi-k2.5`.
- Moonshot can reject strict JSON Schema used for structured output. Enable:
- `remove_defaults_from_schema=True`
- `remove_min_items_from_schema=True`
If you get a 400 error mentioning `response_format.json_schema ... keyword 'default' is not allowed` or `min_items unsupported`, those two flags are the first thing to set.
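The constraints above translate directly into constructor keyword arguments. The sketch below mirrors the bundled `run_agent_kimi.py`; note that `remove_defaults_from_schema` and `remove_min_items_from_schema` are browser-use `ChatOpenAI` options, not raw OpenAI API fields:

```python
import os

def kimi_llm_kwargs(model: str = "kimi-k2.5") -> dict:
    """Kwargs for browser-use's ChatOpenAI that satisfy the Moonshot/Kimi
    constraints listed above (sketch; names follow the bundled script)."""
    return {
        "model": model,
        "api_key": os.getenv("OPENAI_API_KEY", ""),
        "base_url": os.getenv("OPENAI_BASE_URL", "https://api.moonshot.cn/v1"),
        "temperature": 1,           # kimi-k2.5 rejects other values
        "frequency_penalty": 0,     # must be 0 for kimi-k2.5
        "remove_defaults_from_schema": True,   # drop 'default' keywords from JSON Schema
        "remove_min_items_from_schema": True,  # drop 'min_items' keywords
    }
```

Pass the result as `ChatOpenAI(**kimi_llm_kwargs())` when you need the same setup outside the bundled script.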
## QR code extraction (login/demo pages)
### Preferred order
1) **Screenshot the page** and crop candidate regions (fast, robust).
2) If HTML contains `data:image/png;base64,...`, extract and decode it.
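For step 2, the core of the bundled `extract_data_images.py` can be inlined when you already have the HTML as a string (a minimal sketch; like the bundled script, the regex matches PNG/JPEG data URIs only):

```python
import base64
import re

def extract_data_images(html: str) -> list[tuple[str, bytes]]:
    """Return (extension, raw bytes) for each base64 data:image in the HTML,
    largest payload first -- QR codes are usually the biggest inline image."""
    found = re.findall(r"data:image/(png|jpeg);base64,([A-Za-z0-9+/=]+)", html)
    found.sort(key=lambda m: len(m[1]), reverse=True)
    return [(ext, base64.b64decode(b64)) for ext, b64 in found]
```

Feed it the `html` field from `browser-use --json get html` output and write the first entry to disk to inspect the candidate QR image.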
### Crop candidates
Use `scripts/crop_candidates.py` to generate multiple likely QR crops from a screenshot.
```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
python skills/browser-use-local/scripts/crop_candidates.py \
--in /home/node/.openclaw/workspace/login.png \
--outdir /home/node/.openclaw/workspace/qr_crops
```
### Extract base64-embedded images from HTML
```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
browser-use --session demo --json get html > /tmp/page_html.json
python skills/browser-use-local/scripts/extract_data_images.py \
--in /tmp/page_html.json \
--outdir /home/node/.openclaw/workspace/data_imgs
```
## Troubleshooting
- **`state` shows `elements: 0`**: use `get html` + regex discovery, plus screenshots; use `eval` to query DOM.
- **Page readiness timeout warnings**: usually harmless; rely on screenshot + HTML.
- **CLI flags order**: global flags go *before* the subcommand:
- ✅ `browser-use --browser chromium --json open https://...`
- ❌ `browser-use open https://... --browser chromium`
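When the readiness warnings do matter, one workaround is to poll readiness yourself via `eval` instead of relying on the CLI's built-in wait. This is a sketch: it assumes `--json eval` returns a JSON object whose `data` field holds the evaluated value, so check the actual output shape of your CLI version before relying on it:

```python
import json
import subprocess
import time

def is_ready(cli_json: str) -> bool:
    """Interpret the CLI's JSON output (assumed shape: {'data': <value>})."""
    try:
        data = json.loads(cli_json).get("data")
    except (ValueError, AttributeError):
        return False
    return data == "complete" or (
        isinstance(data, dict) and data.get("result") == "complete"
    )

def wait_until_ready(session: str, timeout: float = 15.0) -> bool:
    """Poll document.readyState through the browser-use CLI until 'complete'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["browser-use", "--session", session, "--json",
             "eval", "document.readyState"],
            capture_output=True, text=True,
        ).stdout
        if is_ready(out):
            return True
        time.sleep(1.0)
    return False
```

If polling times out, fall back to the screenshot + `get html` approach above, which works regardless of readiness state.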
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/crop_candidates.py
```python
"""Generate a few likely QR-code crops from a full-page screenshot.
This is a heuristic helper: many login pages place QR codes on the right side.
"""
import argparse
from pathlib import Path
from PIL import Image

def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--in", dest="inp", required=True, help="Input screenshot (png/jpg)")
    ap.add_argument("--outdir", required=True, help="Output directory")
    args = ap.parse_args()

    inp = Path(args.inp)
    outdir = Path(args.outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    img = Image.open(inp)
    w, h = img.size
    crops = {
        "right_half": (w // 2, 0, w, h),
        "right_center": (int(w * 0.55), int(h * 0.15), int(w * 0.95), int(h * 0.85)),
        "center": (int(w * 0.25), int(h * 0.15), int(w * 0.75), int(h * 0.85)),
        "top_center": (int(w * 0.25), 0, int(w * 0.75), int(h * 0.5)),
        "bottom_center": (int(w * 0.25), int(h * 0.5), int(w * 0.75), h),
    }
    for name, box in crops.items():
        out = outdir / f"{inp.stem}_crop_{name}.png"
        img.crop(box).save(out)
        print(str(out))

if __name__ == "__main__":
    main()
```
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
  "owner": "fengjiajie",
  "slug": "browser-use-local",
  "displayName": "Browser Use Local",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1770276408699,
    "commit": "https://github.com/openclaw/skills/commit/d65768aa3203dde696ab7e4db16498aee2c48783"
  },
  "history": []
}
```
### scripts/extract_data_images.py
```python
"""Extract data:image/*;base64,... images from browser-use get html JSON output.
Input is the JSON produced by:
browser-use --json get html > /tmp/page_html.json
Writes extracted images into --outdir.
"""
import argparse
import base64
import json
import re
from pathlib import Path

def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--in", dest="inp", required=True, help="Input JSON file")
    ap.add_argument("--outdir", required=True, help="Output directory")
    args = ap.parse_args()

    outdir = Path(args.outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    obj = json.load(open(args.inp, "r", encoding="utf-8"))
    html = obj.get("data", {}).get("html", "")
    imgs = re.findall(r"data:image/(png|jpeg);base64,([A-Za-z0-9+/=]+)", html)
    if not imgs:
        print("no data:image found")
        return

    # Sort biggest first: QR codes are usually the largest inline image.
    imgs = sorted(imgs, key=lambda x: len(x[1]), reverse=True)
    for i, (ext, b64) in enumerate(imgs):
        out = outdir / f"dataimg_{i}.{ext}"
        out.write_bytes(base64.b64decode(b64))
        print(str(out))

if __name__ == "__main__":
    main()
```
### scripts/run_agent_kimi.py
```python
import asyncio
import os
from dotenv import load_dotenv
from browser_use import Agent, ChatOpenAI

async def main() -> None:
    load_dotenv()
    api_key = os.getenv("OPENAI_API_KEY")
    base_url = os.getenv("OPENAI_BASE_URL")
    if not api_key:
        raise SystemExit("OPENAI_API_KEY is not set")
    if not base_url:
        raise SystemExit("OPENAI_BASE_URL is not set")

    llm = ChatOpenAI(
        model=os.getenv("OPENAI_MODEL", "kimi-k2.5"),
        api_key=api_key,
        base_url=base_url,
        # Moonshot/Kimi observed constraints:
        temperature=float(os.getenv("OPENAI_TEMPERATURE", "1")),
        frequency_penalty=float(os.getenv("OPENAI_FREQUENCY_PENALTY", "0")),
        # Make JSON Schema compatible with stricter gateways:
        remove_defaults_from_schema=True,
        remove_min_items_from_schema=True,
    )
    agent = Agent(
        task=os.getenv(
            "BROWSER_USE_TASK",
            "Open https://example.com and return the page title.",
        ),
        llm=llm,
    )
    history = await agent.run(max_steps=int(os.getenv("BROWSER_USE_MAX_STEPS", "15")))
    print(history.final_result())

if __name__ == "__main__":
    asyncio.run(main())
```