baoyu-url-to-markdown
Fetch any URL and convert to markdown using Chrome CDP. Supports two modes - auto-capture on page load, or wait for user signal (for pages requiring login). Use when user wants to save a webpage as markdown.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install jimliu-baoyu-skills-baoyu-url-to-markdown
Repository
Skill path: skills/baoyu-url-to-markdown
Fetch any URL and convert to markdown using Chrome CDP. Supports two modes - auto-capture on page load, or wait for user signal (for pages requiring login). Use when user wants to save a webpage as markdown.
Open repositoryBest for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: JimLiu.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install baoyu-url-to-markdown into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/JimLiu/baoyu-skills before adding baoyu-url-to-markdown to shared team environments
- Use baoyu-url-to-markdown for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: baoyu-url-to-markdown
description: Fetch any URL and convert to markdown using Chrome CDP. Supports two modes - auto-capture on page load, or wait for user signal (for pages requiring login). Use when user wants to save a webpage as markdown.
---
# URL to Markdown
Fetches any URL via Chrome CDP and converts HTML to clean markdown.
## Script Directory
**Important**: All scripts are located in the `scripts/` subdirectory of this skill.
**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as `SKILL_DIR`
2. Script path = `${SKILL_DIR}/scripts/<script-name>.ts`
3. Replace all `${SKILL_DIR}` in this document with the actual path
**Script Reference**:
| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | CLI entry point for URL fetching |
## Preferences (EXTEND.md)
Use Bash to check EXTEND.md existence (priority order):
```bash
# Check project-level first
test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"
# Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)
test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"
```
┌────────────────────────────────────────────────────────┬───────────────────┐
│ Path │ Location │
├────────────────────────────────────────────────────────┼───────────────────┤
│ .baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ Project directory │
├────────────────────────────────────────────────────────┼───────────────────┤
│ $HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ User home │
└────────────────────────────────────────────────────────┴───────────────────┘
┌───────────┬───────────────────────────────────────────────────────────────────────────┐
│ Result │ Action │
├───────────┼───────────────────────────────────────────────────────────────────────────┤
│ Found │ Read, parse, apply settings │
├───────────┼───────────────────────────────────────────────────────────────────────────┤
│ Not found │ Use defaults │
└───────────┴───────────────────────────────────────────────────────────────────────────┘
**EXTEND.md Supports**: Default output directory | Default capture mode | Timeout settings
## Features
- Chrome CDP for full JavaScript rendering
- Two capture modes: auto or wait-for-user
- Clean markdown output with metadata
- Handles login-required pages via wait mode
## Usage
```bash
# Auto mode (default) - capture when page loads
npx -y bun ${SKILL_DIR}/scripts/main.ts <url>
# Wait mode - wait for user signal before capture
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait
# Save to specific file
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md
```
## Options
| Option | Description |
|--------|-------------|
| `<url>` | URL to fetch |
| `-o <path>` | Output file path (default: auto-generated) |
| `--wait` | Wait for user signal before capturing |
| `--timeout <ms>` | Page load timeout (default: 30000) |
## Capture Modes
| Mode | Behavior | Use When |
|------|----------|----------|
| Auto (default) | Capture on network idle | Public pages, static content |
| Wait (`--wait`) | User signals when ready | Login-required, lazy loading, paywalls |
**Wait mode workflow**:
1. Run with `--wait` → script outputs "Press Enter when ready"
2. Ask user to confirm page is ready
3. Send newline to stdin to trigger capture
## Output Format
YAML front matter with `url`, `title`, `description`, `author`, `published`, `captured_at` fields, followed by converted markdown content.
## Output Directory
```
url-to-markdown/<domain>/<slug>.md
```
- `<slug>`: From page title or URL path (kebab-case, 2-6 words)
- Conflict resolution: Append timestamp `<slug>-YYYYMMDD-HHMMSS.md`
## Environment Variables
| Variable | Description |
|----------|-------------|
| `URL_CHROME_PATH` | Custom Chrome executable path |
| `URL_DATA_DIR` | Custom data directory |
| `URL_CHROME_PROFILE_DIR` | Custom Chrome profile directory |
**Troubleshooting**: Chrome not found → set `URL_CHROME_PATH`. Timeout → increase `--timeout`. Complex pages → try `--wait` mode.
## Extension Support
Custom configurations via EXTEND.md. See **Preferences** section for paths and supported options.
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/main.ts
```typescript
import { createInterface } from "node:readline";
import { writeFile, mkdir, access } from "node:fs/promises";
import path from "node:path";
import process from "node:process";
import { CdpConnection, getFreePort, launchChrome, waitForChromeDebugPort, waitForNetworkIdle, waitForPageLoad, autoScroll, evaluateScript, killChrome } from "./cdp.js";
import { cleanupAndExtractScript, htmlToMarkdown, createMarkdownDocument, type PageMetadata, type ConversionResult } from "./html-to-markdown.js";
import { resolveUrlToMarkdownDataDir } from "./paths.js";
import { DEFAULT_TIMEOUT_MS, CDP_CONNECT_TIMEOUT_MS, NETWORK_IDLE_TIMEOUT_MS, POST_LOAD_DELAY_MS, SCROLL_STEP_WAIT_MS, SCROLL_MAX_STEPS } from "./constants.js";
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
async function fileExists(filePath: string): Promise<boolean> {
try {
await access(filePath);
return true;
} catch {
return false;
}
}
interface Args {
url: string;
output?: string;
wait: boolean;
timeout: number;
}
function parseArgs(argv: string[]): Args {
const args: Args = { url: "", wait: false, timeout: DEFAULT_TIMEOUT_MS };
for (let i = 2; i < argv.length; i++) {
const arg = argv[i];
if (arg === "--wait" || arg === "-w") {
args.wait = true;
} else if (arg === "-o" || arg === "--output") {
args.output = argv[++i];
} else if (arg === "--timeout" || arg === "-t") {
args.timeout = parseInt(argv[++i], 10) || DEFAULT_TIMEOUT_MS;
} else if (!arg.startsWith("-") && !args.url) {
args.url = arg;
}
}
return args;
}
function generateSlug(title: string, url: string): string {
const text = title || new URL(url).pathname.replace(/\//g, "-");
return text
.toLowerCase()
.replace(/[^\w\s-]/g, "")
.replace(/\s+/g, "-")
.replace(/-+/g, "-")
.replace(/^-|-$/g, "")
.slice(0, 50) || "page";
}
function formatTimestamp(): string {
const now = new Date();
const pad = (n: number) => n.toString().padStart(2, "0");
return `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}-${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
}
async function generateOutputPath(url: string, title: string): Promise<string> {
const domain = new URL(url).hostname.replace(/^www\./, "");
const slug = generateSlug(title, url);
const dataDir = resolveUrlToMarkdownDataDir();
const basePath = path.join(dataDir, domain, `${slug}.md`);
if (!(await fileExists(basePath))) {
return basePath;
}
const timestampSlug = `${slug}-${formatTimestamp()}`;
return path.join(dataDir, domain, `${timestampSlug}.md`);
}
async function waitForUserSignal(): Promise<void> {
console.log("Page opened. Press Enter when ready to capture...");
const rl = createInterface({ input: process.stdin, output: process.stdout });
await new Promise<void>((resolve) => {
rl.once("line", () => { rl.close(); resolve(); });
});
}
async function captureUrl(args: Args): Promise<ConversionResult> {
const port = await getFreePort();
const chrome = await launchChrome(args.url, port, false);
let cdp: CdpConnection | null = null;
try {
const wsUrl = await waitForChromeDebugPort(port, 30_000);
cdp = await CdpConnection.connect(wsUrl, CDP_CONNECT_TIMEOUT_MS);
const targets = await cdp.send<{ targetInfos: Array<{ targetId: string; type: string; url: string }> }>("Target.getTargets");
const pageTarget = targets.targetInfos.find(t => t.type === "page" && t.url.startsWith("http"));
if (!pageTarget) throw new Error("No page target found");
const { sessionId } = await cdp.send<{ sessionId: string }>("Target.attachToTarget", { targetId: pageTarget.targetId, flatten: true });
await cdp.send("Network.enable", {}, { sessionId });
await cdp.send("Page.enable", {}, { sessionId });
if (args.wait) {
await waitForUserSignal();
} else {
console.log("Waiting for page to load...");
await Promise.race([
waitForPageLoad(cdp, sessionId, 15_000),
sleep(8_000)
]);
await waitForNetworkIdle(cdp, sessionId, NETWORK_IDLE_TIMEOUT_MS);
await sleep(POST_LOAD_DELAY_MS);
console.log("Scrolling to trigger lazy load...");
await autoScroll(cdp, sessionId, SCROLL_MAX_STEPS, SCROLL_STEP_WAIT_MS);
await sleep(POST_LOAD_DELAY_MS);
}
console.log("Capturing page content...");
const extracted = await evaluateScript<{ title: string; description?: string; author?: string; published?: string; html: string }>(
cdp, sessionId, cleanupAndExtractScript, args.timeout
);
const metadata: PageMetadata = {
url: args.url,
title: extracted.title || "",
description: extracted.description,
author: extracted.author,
published: extracted.published,
captured_at: new Date().toISOString()
};
const markdown = htmlToMarkdown(extracted.html);
return { metadata, markdown };
} finally {
if (cdp) {
try { await cdp.send("Browser.close", {}, { timeoutMs: 5_000 }); } catch {}
cdp.close();
}
killChrome(chrome);
}
}
async function main(): Promise<void> {
const args = parseArgs(process.argv);
if (!args.url) {
console.error("Usage: bun main.ts <url> [-o output.md] [--wait] [--timeout ms]");
process.exit(1);
}
try {
new URL(args.url);
} catch {
console.error(`Invalid URL: ${args.url}`);
process.exit(1);
}
console.log(`Fetching: ${args.url}`);
console.log(`Mode: ${args.wait ? "wait" : "auto"}`);
const result = await captureUrl(args);
const outputPath = args.output || await generateOutputPath(args.url, result.metadata.title);
const outputDir = path.dirname(outputPath);
await mkdir(outputDir, { recursive: true });
const document = createMarkdownDocument(result);
await writeFile(outputPath, document, "utf-8");
console.log(`Saved: ${outputPath}`);
console.log(`Title: ${result.metadata.title || "(no title)"}`);
}
main().catch((err) => {
console.error("Error:", err instanceof Error ? err.message : String(err));
process.exit(1);
});
```