crawl-for-ai
Web scraping using local Crawl4AI instance. Use for fetching full page content with JavaScript rendering. Better than Tavily for complex pages. Unlimited usage.
Packaged view
This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source is reproduced below.
Install command
npx @skill-hub/cli install openclaw-skills-crawl-for-ai
Repository
Skill path: skills/angusthefuzz/crawl-for-ai
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Data / AI, Tech Writer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install crawl-for-ai into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding crawl-for-ai to shared team environments
- Use crawl-for-ai in development workflows that need full-page content with JavaScript rendering
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: crawl-for-ai
description: Web scraping using local Crawl4AI instance. Use for fetching full page content with JavaScript rendering. Better than Tavily for complex pages. Unlimited usage.
version: 1.0.1
author: Ania
requiresEnv:
  - CRAWL4AI_URL
metadata:
  clawdbot:
    emoji: "🕷️"
    requires:
      bins: ["node"]
---
# Crawl4AI Web Scraper
Local Crawl4AI instance for full web page extraction with JavaScript rendering.
## Endpoints
**Proxy (port 11234)** – Clean output, OpenWebUI-compatible
- Returns: `[{page_content, metadata}]`
- Use for: Simple content extraction
**Direct (port 11235)** – Full output with all data
- Returns: `{results: [{markdown, html, links, media, ...}]}`
- Use for: When you need links, media, or other metadata
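The two response shapes above can be normalized with one small helper. A sketch, assuming the field names shown in this entry (`page_content` for the proxy, `results[].markdown` for the direct endpoint); the `extractMarkdown` name is illustrative:

```javascript
// Normalize either endpoint's response down to a markdown string.
function extractMarkdown(response) {
  if (Array.isArray(response)) {
    // Proxy (port 11234): [{page_content, metadata}]
    return response[0]?.page_content ?? '';
  }
  // Direct (port 11235): {results: [{markdown, html, links, media, ...}]}
  const page = response.results?.[0] ?? {};
  // markdown may be an object with raw_markdown, or a plain string
  return page.markdown?.raw_markdown ?? page.markdown ?? '';
}
```

Handling both shapes in one place means the same downstream code works regardless of which port you target.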
## Usage
```bash
# Via script
node {baseDir}/scripts/crawl4ai.js "url"
node {baseDir}/scripts/crawl4ai.js "url" --json
```
**Script options:**
- `--json` – Full JSON response
**Output:** Clean markdown from the page.
## Configuration
**Required environment variable:**
- `CRAWL4AI_URL` – Your Crawl4AI instance URL (e.g., `http://localhost:11235`)
**Optional:**
- `CRAWL4AI_KEY` – API key if your instance requires authentication
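A minimal setup sketch (the URL, key, and target page are placeholder values for your own instance):

```shell
# Point the script at your local Crawl4AI instance
export CRAWL4AI_URL="http://localhost:11235"
# Only needed if your instance requires authentication
export CRAWL4AI_KEY="your-api-key"

node {baseDir}/scripts/crawl4ai.js "https://example.com"
```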
## Features
- **JavaScript rendering** – Handles dynamic content
- **Unlimited usage** – Local instance, no API limits
- **Full content** – HTML, markdown, links, media, tables
- **Better than Tavily** for complex pages with JS
## API
Uses your local Crawl4AI instance REST API. Auth header only sent if `CRAWL4AI_KEY` is set.
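A sketch of that conditional-auth behavior, mirroring the bundled script (the `buildHeaders` helper name is illustrative, not part of the skill's API):

```javascript
// Build request headers; attach a Bearer token only when a key is configured.
function buildHeaders(body, apiKey) {
  const headers = {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body)
  };
  if (apiKey) {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }
  return headers;
}
```

Keeping the header conditional lets the same script talk to both open and authenticated Crawl4AI instances.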
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
  "owner": "angusthefuzz",
  "slug": "crawl-for-ai",
  "displayName": "Crawl4AI Web Scraper",
  "latest": {
    "version": "1.0.1",
    "publishedAt": 1771098491812,
    "commit": "https://github.com/openclaw/skills/commit/45a729385ee411d9c6cdd590827012fb75d591c7"
  },
  "history": []
}
```
### scripts/crawl4ai.js
```javascript
#!/usr/bin/env node
/**
 * Crawl4AI Fetcher - Local crawl4ai instance for web scraping
 *
 * Usage: node crawl4ai.js "url" [--json]
 *
 * Output: Markdown content from the page
 */
const http = require('http');
const https = require('https');

const CRAWL4AI_URL = process.env.CRAWL4AI_URL;
const CRAWL4AI_KEY = process.env.CRAWL4AI_KEY;

if (!CRAWL4AI_URL) {
  console.error('CRAWL4AI_URL environment variable is required');
  process.exit(1);
}

// Parse URL to get transport, hostname, port, and base path
let transport, hostname, port, basePath;
try {
  const parsed = new URL(CRAWL4AI_URL);
  transport = parsed.protocol === 'https:' ? https : http;
  hostname = parsed.hostname;
  port = parsed.port || 11235;
  basePath = parsed.pathname.replace(/\/$/, ''); // Remove trailing slash
} catch (e) {
  console.error('Invalid CRAWL4AI_URL:', CRAWL4AI_URL);
  process.exit(1);
}

async function crawl(urls) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ urls: Array.isArray(urls) ? urls : [urls] });
    const headers = {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    };
    // Auth header is only sent when a key is configured
    if (CRAWL4AI_KEY) {
      headers['Authorization'] = `Bearer ${CRAWL4AI_KEY}`;
    }
    const req = transport.request({
      hostname,
      port,
      path: basePath + '/crawl',
      method: 'POST',
      headers
    }, (res) => {
      let data = '';
      res.on('data', chunk => data += chunk);
      res.on('end', () => {
        try {
          resolve(JSON.parse(data));
        } catch (e) {
          reject(e);
        }
      });
    });
    req.on('error', reject);
    req.write(body);
    req.end();
  });
}

async function main() {
  const args = process.argv.slice(2);
  if (args.length === 0 || args[0] === '--help' || args[0] === '-h') {
    console.log('Crawl4AI Fetcher - Local web scraper');
    console.log('');
    console.log('Usage: node crawl4ai.js "url" [--json]');
    console.log('');
    console.log('Options:');
    console.log('  --json   Output full JSON response');
    console.log('');
    console.log('Environment:');
    console.log('  CRAWL4AI_URL   Crawl4AI instance URL (required, e.g. http://localhost:11235)');
    console.log('  CRAWL4AI_KEY   API key (optional; sent as a Bearer token if set)');
    process.exit(0);
  }

  const url = args.find(a => !a.startsWith('--'));
  const jsonOutput = args.includes('--json');

  if (!url) {
    console.error('Error: no URL provided');
    process.exit(1);
  }

  try {
    const result = await crawl(url);
    if (!result.success) {
      console.error('Crawl failed:', result);
      process.exit(1);
    }
    const page = result.results[0];
    if (jsonOutput) {
      console.log(JSON.stringify(result, null, 2));
    } else {
      // Output clean markdown; markdown may be an object or a plain string
      const md = page.markdown?.raw_markdown || page.markdown || '';
      console.log(md);
    }
  } catch (err) {
    console.error('Error:', err.message);
    process.exit(1);
  }
}

main();
```