crawl-for-ai
Web scraping using local Crawl4AI instance. Use for fetching full page content with JavaScript rendering. Better than Tavily for complex pages. Unlimited usage.
Packaged view
This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source is reproduced below.
Install command
npx @skill-hub/cli install openclaw-skills-crawl-for-ai
Repository
Skill path: skills/angusthefuzz/crawl-for-ai
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Data / AI, Tech Writer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install crawl-for-ai into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding crawl-for-ai to shared team environments
- Use crawl-for-ai in development workflows that need full-page content with JavaScript rendering
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: crawl-for-ai
description: Web scraping using local Crawl4AI instance. Use for fetching full page content with JavaScript rendering. Better than Tavily for complex pages. Unlimited usage.
version: 1.0.1
author: Ania
requiresEnv:
  - CRAWL4AI_URL
metadata:
  clawdbot:
    emoji: "🕷️"
    requires:
      bins: ["node"]
---
# Crawl4AI Web Scraper
Local Crawl4AI instance for full web page extraction with JavaScript rendering.
## Endpoints
**Proxy (port 11234)** – Clean output, OpenWebUI-compatible
- Returns: `[{page_content, metadata}]`
- Use for: Simple content extraction
**Direct (port 11235)** – Full output with all data
- Returns: `{results: [{markdown, html, links, media, ...}]}`
- Use for: When you need links, media, or other metadata
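The two response shapes above can be normalized with one small helper. A sketch, assuming the field names shown in this entry (`page_content` for the proxy, `results[].markdown` for the direct endpoint); the `extractMarkdown` name is illustrative:

```javascript
// Normalize either endpoint's response down to a markdown string.
function extractMarkdown(response) {
  if (Array.isArray(response)) {
    // Proxy (port 11234): [{page_content, metadata}]
    return response[0]?.page_content ?? '';
  }
  // Direct (port 11235): {results: [{markdown, html, links, media, ...}]}
  const page = response.results?.[0] ?? {};
  // markdown may be an object with raw_markdown, or a plain string
  return page.markdown?.raw_markdown ?? page.markdown ?? '';
}
```

Handling both shapes in one place means the same downstream code works regardless of which port you target.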
## Usage
```bash
# Via script
node {baseDir}/scripts/crawl4ai.js "url"
node {baseDir}/scripts/crawl4ai.js "url" --json
```
**Script options:**
- `--json` – Full JSON response
**Output:** Clean markdown from the page.
## Configuration
**Required environment variable:**
- `CRAWL4AI_URL` – Your Crawl4AI instance URL (e.g., `http://localhost:11235`)
**Optional:**
- `CRAWL4AI_KEY` – API key if your instance requires authentication
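A minimal setup sketch (the URL, key, and target page are placeholder values for your own instance):

```shell
# Point the script at your local Crawl4AI instance
export CRAWL4AI_URL="http://localhost:11235"
# Only needed if your instance requires authentication
export CRAWL4AI_KEY="your-api-key"

node {baseDir}/scripts/crawl4ai.js "https://example.com"
```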
## Features
- **JavaScript rendering** – Handles dynamic content
- **Unlimited usage** – Local instance, no API limits
- **Full content** – HTML, markdown, links, media, tables
- **Better than Tavily** for complex pages with JS
## API
Uses your local Crawl4AI instance REST API. Auth header only sent if `CRAWL4AI_KEY` is set.
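A sketch of that conditional-auth behavior, mirroring the bundled script (the `buildHeaders` helper name is illustrative, not part of the skill's API):

```javascript
// Build request headers; attach a Bearer token only when a key is configured.
function buildHeaders(body, apiKey) {
  const headers = {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body)
  };
  if (apiKey) {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }
  return headers;
}
```

Keeping the header conditional lets the same script talk to both open and authenticated Crawl4AI instances.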
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
  "owner": "angusthefuzz",
  "slug": "crawl-for-ai",
  "displayName": "Crawl4AI Web Scraper",
  "latest": {
    "version": "1.0.1",
    "publishedAt": 1771098491812,
    "commit": "https://github.com/openclaw/skills/commit/45a729385ee411d9c6cdd590827012fb75d591c7"
  },
  "history": []
}
```
### scripts/crawl4ai.js
```javascript
#!/usr/bin/env node
/**
 * Crawl4AI Fetcher - Local crawl4ai instance for web scraping
 *
 * Usage: node crawl4ai.js "url" [--json]
 *
 * Output: Markdown content from the page
 */
const http = require('http');
const https = require('https');

const CRAWL4AI_URL = process.env.CRAWL4AI_URL;
const CRAWL4AI_KEY = process.env.CRAWL4AI_KEY;

if (!CRAWL4AI_URL) {
  console.error('CRAWL4AI_URL environment variable is required');
  process.exit(1);
}

// Parse URL to get transport, hostname, port, and base path
let transport, hostname, port, basePath;
try {
  const parsed = new URL(CRAWL4AI_URL);
  transport = parsed.protocol === 'https:' ? https : http;
  hostname = parsed.hostname;
  port = parsed.port || 11235;
  basePath = parsed.pathname.replace(/\/$/, ''); // Remove trailing slash
} catch (e) {
  console.error('Invalid CRAWL4AI_URL:', CRAWL4AI_URL);
  process.exit(1);
}

async function crawl(urls) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ urls: Array.isArray(urls) ? urls : [urls] });
    const headers = {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    };
    // Auth header is only sent when a key is configured
    if (CRAWL4AI_KEY) {
      headers['Authorization'] = `Bearer ${CRAWL4AI_KEY}`;
    }
    const req = transport.request({
      hostname,
      port,
      path: basePath + '/crawl',
      method: 'POST',
      headers
    }, (res) => {
      let data = '';
      res.on('data', chunk => data += chunk);
      res.on('end', () => {
        try {
          resolve(JSON.parse(data));
        } catch (e) {
          reject(e);
        }
      });
    });
    req.on('error', reject);
    req.write(body);
    req.end();
  });
}

async function main() {
  const args = process.argv.slice(2);
  if (args.length === 0 || args[0] === '--help' || args[0] === '-h') {
    console.log('Crawl4AI Fetcher - Local web scraper');
    console.log('');
    console.log('Usage: node crawl4ai.js "url" [--json]');
    console.log('');
    console.log('Options:');
    console.log('  --json   Output full JSON response');
    console.log('');
    console.log('Environment:');
    console.log('  CRAWL4AI_URL   Crawl4AI instance URL (required, e.g. http://localhost:11235)');
    console.log('  CRAWL4AI_KEY   API key (optional; sent as a Bearer token if set)');
    process.exit(0);
  }

  const url = args.find(a => !a.startsWith('--'));
  const jsonOutput = args.includes('--json');

  if (!url) {
    console.error('Error: no URL provided');
    process.exit(1);
  }

  try {
    const result = await crawl(url);
    if (!result.success) {
      console.error('Crawl failed:', result);
      process.exit(1);
    }
    const page = result.results[0];
    if (jsonOutput) {
      console.log(JSON.stringify(result, null, 2));
    } else {
      // Output clean markdown; markdown may be an object or a plain string
      const md = page.markdown?.raw_markdown || page.markdown || '';
      console.log(md);
    }
  } catch (err) {
    console.error('Error:', err.message);
    process.exit(1);
  }
}

main();
```