apify-actor
Build and deploy Apify actors for web scraping and automation. Use for serverless scraping, data extraction, browser automation, and API integrations with Python.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install majiayu000-claude-skill-registry-apify-actor
Repository
Skill path: skills/development/apify-actor
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Backend, DevOps, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: majiayu000.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install apify-actor into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/majiayu000/claude-skill-registry before adding apify-actor to shared team environments
- Use apify-actor for development workflows
Original source / Raw SKILL.md
---
name: apify-actor
description: Build and deploy Apify actors for web scraping and automation. Use for serverless scraping, data extraction, browser automation, and API integrations with Python.
---
# Apify Actor Development
Build serverless Apify actors for web scraping, browser automation, and data extraction using Python.
## Prerequisites & Setup (MANDATORY)
Before creating or modifying actors, verify that the `apify` CLI is installed by running `apify --help`.
If it is not installed, install it with one of the following:
```bash
curl -fsSL https://apify.com/install-cli.sh | bash
# Or (Mac): brew install apify-cli
# Or (Windows): irm https://apify.com/install-cli.ps1 | iex
# Or: npm install -g apify-cli
```
Once the `apify` CLI is installed, check that it is logged in:
```bash
apify info # Should return your username
```
If it is not logged in, check whether the `APIFY_TOKEN` environment variable is defined. If it is not, ask the user to generate a token at https://console.apify.com/settings/integrations and set `APIFY_TOKEN` to that value.
Then run:
```bash
apify login -t "$APIFY_TOKEN"
```
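The checks above can also be scripted. The following is a minimal sketch using only the Python standard library; `missing_prerequisites` is a hypothetical helper, not part of the skill. It only confirms that the CLI is on `PATH` and that a token is defined, so `apify info` remains the real test of whether the login works.

```python
import os
import shutil
from typing import Mapping, Optional


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> list:
    """Return human-readable problems; an empty list means the basics are in place."""
    env = os.environ if env is None else env
    problems = []
    # shutil.which searches the given PATH string for an executable named "apify".
    if shutil.which("apify", path=env.get("PATH", "")) is None:
        problems.append("apify CLI not found on PATH")
    # An empty or absent token means `apify login -t` has nothing to use.
    if not env.get("APIFY_TOKEN"):
        problems.append("APIFY_TOKEN is not set")
    return problems
```

Calling `missing_prerequisites()` with no argument inspects the current process environment; passing a mapping makes the check easy to test.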
## Quick Start Workflow
### Creating a New Actor
1. **Copy template** - Copy every file, hidden ones included, from the skill's template directory at `{base_dir}/assets/python-template/` (where `{base_dir}` is the skill's base directory) into your new actor directory.
2. **Setup pre-commit** - Run `uv run pre-commit install` for automatic quality checks
3. **Add dependencies** - Use `uv add package-name` for each required dependency
4. **Implement logic** - Write the actor code in `src/main.py` (the `src/__main__.py` entry point is already set up)
5. **Configure schemas** - Update input/output schemas in `.actor/input_schema.json` and `.actor/output_schema.json`
6. **Configure platform settings** - Update `.actor/actor.json` with actor metadata
7. **Write documentation** - Create comprehensive `.actor/ACTOR.md` for the marketplace
8. **Test locally** - Run `apify run` to verify functionality
9. **Deploy** - Run `apify push` to deploy the actor on the Apify platform
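Step 1 is easy to get wrong because a shell glob such as `*` skips dotfiles. One way to copy the whole tree is sketched below, assuming the target directory does not exist yet; `copy_template` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_template(base_dir: Path, actor_dir: Path) -> Path:
    """Copy the whole template tree, dotfiles included, into a new actor directory."""
    template = base_dir / "assets" / "python-template"
    # copytree walks every directory entry, so hidden items such as
    # .actor/ and .dockerignore are copied along with the rest.
    # It raises FileExistsError if actor_dir already exists.
    shutil.copytree(template, actor_dir)
    return actor_dir
```

After copying, the remaining steps (`uv run pre-commit install`, `uv add`, and so on) run from inside the new directory.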
**CRITICAL REMINDERS:**
- NEVER create `requirements.txt`
- NEVER use `pip install` or `uv pip install`
- ALWAYS use `uv add` to add dependencies
- ALWAYS use `uv sync` to install dependencies
- ALWAYS format with `uv run ruff format .` after file changes
- ALWAYS lint with `uv run ruff check --fix .` after file changes
- ALWAYS check the `apify push` output for build errors before considering deployment complete
- ALWAYS update the input/output schemas when actor functionality changes
## Core Concepts
### Input/Output Pattern
Every actor follows this pattern:
1. **Input**: JSON from key-value store (defined by input schema)
2. **Process**: Actor logic extracts/transforms data
3. **Output**: Results pushed to dataset or key-value store
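The three steps can be sketched as follows. This is an illustration rather than the template's actual `main.py`: the `startUrls` field, the `SUMMARY` key, and the `extract_items` helper are assumptions, and the processing logic is a placeholder. The SDK import sits inside `main()` so the pure function can be tested without the `apify` package installed.

```python
from typing import Any


def extract_items(actor_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Process step: turn input into output records (placeholder logic)."""
    entries = actor_input.get("startUrls", [])
    return [{"url": entry["url"], "status": "queued"} for entry in entries if "url" in entry]


async def main() -> None:
    # Requires the `apify` package; imported lazily to keep extract_items testable.
    from apify import Actor

    async with Actor:
        actor_input = await Actor.get_input() or {}  # 1. Input from the key-value store
        items = extract_items(actor_input)           # 2. Process
        await Actor.push_data(items)                 # 3. Output to the default dataset
        # 3. Output to the default key-value store (hypothetical key name)
        await Actor.set_value("SUMMARY", {"count": len(items)})
```

Keeping the processing step free of SDK calls makes it straightforward to cover with pytest.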
### Storage Types
- **Dataset**: Structured data (arrays of objects) - use for scraping results and tabular data
- **Key-Value Store**: Arbitrary data (files, objects) - use for screenshots, PDFs, state, and binary files
- **Request Queue**: URLs to crawl - use for deep web crawling and multi-page scraping workflows
### Project Structure
```
my-actor/
├── .actor/
│   ├── actor.json              # Actor metadata
│   ├── input_schema.json       # Input schema
│   ├── output_schema.json      # Output schema
│   ├── ACTOR.md                # PUBLIC marketplace documentation (CRITICAL)
│   └── datasets/
│       └── dataset_schema.json # Dataset schema with views
├── src/ or package_name/       # Source code
│   ├── __init__.py
│   ├── __main__.py             # Entry point for CLI (REQUIRED)
│   └── main.py                 # Main actor logic
├── tests/                      # Test files
│   └── test_*.py
├── .dockerignore               # Docker build exclusions
├── .pre-commit-config.yaml     # Pre-commit hooks
├── Dockerfile                  # Container config
├── pyproject.toml              # Python project config
├── uv.lock                     # Dependency lock file
└── README.md                   # Development docs
```
## Common Patterns
See `references/python-sdk.md` for complete examples of:
- Simple HTTP scraping with BeautifulSoup
- Browser automation with Playwright and Selenium
- Deep crawling with Request Queue
- Proxy management and error handling
- Storage APIs (Dataset, Key-Value Store, Request Queue)
## Input Schema Design
Input schemas use JSON Schema format to define and validate actor inputs. See `references/input-schema.md` for:
- Field types (string, number, boolean, array, object)
- Special editors (requestListSources, globs, pseudoUrls, proxy, json, textarea)
- Validation patterns (regex, length, range, required fields)
- Complete examples with best practices
**Key principles:**
- Always include a description and an example for every field
- Set sensible defaults for ease of use
- Use appropriate editors for better UX
- Add units for numeric fields (pages, seconds, MB)
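To make these principles concrete, here is a hypothetical schema expressed as a Python dict, plus a small lint function that flags violations. The field names (`startUrls`, `maxPages`) and `lint_schema` are illustrative assumptions; `references/input-schema.md` remains the authority on the supported keys.

```python
# Hypothetical schema for illustration; not taken from the skill's template.
INPUT_SCHEMA = {
    "title": "Example scraper input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "Pages where the crawl begins.",
            "editor": "requestListSources",
            "prefill": [{"url": "https://example.com"}],
        },
        "maxPages": {
            "title": "Max pages",
            "type": "integer",
            "description": "Upper bound on pages visited per run.",
            "unit": "pages",
            "minimum": 1,
            "default": 10,
        },
    },
    "required": ["startUrls"],
}


def lint_schema(schema: dict) -> list:
    """Flag properties that break the key principles listed above."""
    problems = []
    for name, prop in schema["properties"].items():
        if not prop.get("description"):
            problems.append(f"{name}: missing description")
        if not any(key in prop for key in ("default", "prefill", "example")):
            problems.append(f"{name}: no default, prefill, or example")
        if prop.get("type") in ("integer", "number") and "unit" not in prop:
            problems.append(f"{name}: numeric field without a unit")
    return problems
```

Serializing the dict with `json.dumps(INPUT_SCHEMA, indent=2)` yields content in the shape expected for `.actor/input_schema.json`.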
## Output Schema Design
Output schemas define where actors store outputs and provide templates for accessing that data. See `references/output-schema.md` for:
- Schema structure and template variables (links.apiDefaultDatasetUrl, links.apiDefaultKeyValueStoreUrl, etc.)
- Dataset and key-value store output configurations
- Multiple output types in a single actor
- Integration with Python code
- Complete examples with emojis and descriptions
**Key principles:**
- Define all outputs explicitly (even if empty)
- Use descriptive titles with emojis for visual clarity
- Include helpful descriptions for users and LLM integrations
- Match templates to actual storage locations in code
## ACTOR.md Documentation (CRITICAL)
The `.actor/ACTOR.md` file is **the public-facing documentation** that users see in the Apify marketplace. This is your actor's main sales page and user guide.
**Required sections:**
1. **Title & Description** - Clear, compelling one-liner
2. **What it does** - Bullet points of key capabilities
3. **Input** - Example JSON with field explanations
4. **Output** - Example JSON showing expected results
5. **Use Cases** - Who benefits and why (with emojis)
6. **Standby Mode** (if applicable) - API usage examples
7. **Tips & Best Practices** - Performance and configuration guidance
See `assets/python-template/.actor/ACTOR.md` for a complete template.
**Key principles:**
- Write for non-technical users - assume no coding knowledge
- Use emojis to make sections scannable (🎯 🔍 ⚡ 🚀)
- Provide copy-paste ready code examples
- Show actual input/output samples, not schemas
- Highlight benefits and use cases clearly
## Modifying Existing Actors
When modifying an existing actor:
1. **Understand current logic** - Read `src/main.py`
2. **Check input schema** - Review `.actor/input_schema.json` for expected inputs
3. **Add dependencies with uv** - Use `uv add package-name` (NEVER pip install)
4. **Make code changes** - Implement the requested features
5. **Format code** - Run `uv run ruff format .` (MANDATORY)
6. **Lint code** - Run `uv run ruff check --fix .` (MANDATORY)
7. **Test changes locally** - Use `apify run` before deploying
8. **Update schema if needed** - Add new fields to input schema
9. **Deploy** - Push changes with `apify push`
## Debugging Actors
1. **Test locally** - Use `apify run` to test the actor locally before deployment
2. **Check storage** - Inspect `./storage/` directory for datasets, key-value stores, and request queues
3. **Add logging** - Use `Actor.log.info()`, `Actor.log.debug()`, `Actor.log.error()` (see SDK references)
4. **View logs on platform** - Check actor run logs in Apify Console for production issues
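For step 2, a short script can load what a local run wrote. This sketch assumes the local layout `storage/datasets/<name>/*.json` with one JSON file per item, and that any metadata files start with a double underscore; check your own `./storage/` directory, since the layout may differ between SDK versions. `read_local_dataset` is a hypothetical helper.

```python
import json
from pathlib import Path


def read_local_dataset(storage_dir: Path = Path("storage"), name: str = "default") -> list:
    """Load items that a local run wrote to the dataset directory."""
    dataset_dir = storage_dir / "datasets" / name
    items = []
    # Sorting by file name keeps items in the order they were pushed,
    # assuming zero-padded sequential names.
    for path in sorted(dataset_dir.glob("*.json")):
        if path.name.startswith("__"):  # skip metadata files, if present
            continue
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items
```

Running `read_local_dataset()` from the actor directory after `apify run` returns the pushed items as a list, or an empty list if nothing was written.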
## Best Practices
### Code Quality
- **Validate input** - Always check required fields and formats with clear error messages
- **Handle errors** - Use try/except with proper error logging and graceful degradation
- **Structured logging** - Use Actor.log with extra fields for better debugging
- **Type hints** - Add type annotations for better code clarity and IDE support
- **Docstrings** - Document functions and modules for maintainability
- **Format with ruff** - ALWAYS run `uv run ruff format .` before committing
- **Lint with ruff** - ALWAYS run `uv run ruff check --fix .` before deploying
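The input validation point can be sketched as a plain function that raises a dedicated exception with a clear message. The field names (`startUrls`, `maxPages`), the default of 10, and the `InputError` class are assumptions made for illustration.

```python
class InputError(ValueError):
    """Raised when actor input fails validation."""


def validate_input(actor_input: dict) -> dict:
    """Check required fields and formats; return the normalized input."""
    start_urls = actor_input.get("startUrls")
    if not isinstance(start_urls, list) or not start_urls:
        raise InputError("'startUrls' is required and must be a non-empty list")
    for entry in start_urls:
        url = entry.get("url") if isinstance(entry, dict) else None
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            raise InputError(f"Invalid entry in 'startUrls': {entry!r}")
    max_pages = actor_input.get("maxPages", 10)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise InputError(f"'maxPages' must be a positive integer, got {max_pages!r}")
    return {"startUrls": start_urls, "maxPages": max_pages}
```

Inside the actor, wrapping the call in try/except and reporting the message through `Actor.log.error()` gives users an actionable failure instead of a stack trace.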
### Performance & Scalability
- **Batch processing** - Push data in batches (100-1000 items) for large datasets to reduce API calls
- **Use proxies** - Configure a proxy to avoid IP blocking when scraping
- **Resource limits** - Set appropriate memory limits and timeouts in `.actor/actor.json`
- **Optimize Docker** - Use multi-stage builds, bytecode compilation, and minimal base images
- **Consider Standby mode** - For low-latency (<100ms), high-frequency use cases
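Batch processing can be sketched with a small generator that groups results before they are pushed. The default size of 500 is an arbitrary pick from the 100-1000 range mentioned above, and `batched` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[dict], size: int = 500) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls up to `size` items per pass; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch
```

In an actor, each batch would then go to the dataset in one call, for example `for batch in batched(results): await Actor.push_data(batch)`.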
### Security & Configuration
- **Environment variables** - Never hardcode secrets; use `Actor.config` and environment variables
- **Input validation** - Use JSON Schema patterns, required fields, and runtime validation
- **Run as non-root** - Use `myuser` in Dockerfile for container security
- **Minimize image size** - Use `.dockerignore` to exclude unnecessary files and reduce build time
### Development Workflow
- **Testing** - Write tests with pytest; use coverage and snapshot testing for reliability
- **Pre-commit hooks** - Use ruff and pre-commit for consistent code quality (MANDATORY)
- **Use uv exclusively** - NEVER use pip or requirements.txt; only use `uv add` and `uv sync` (MANDATORY)
- **Lock dependencies** - Always commit `uv.lock` for reproducible builds (MANDATORY)
- **Test locally** - Always test with `apify run` before deploying to catch issues early
- **Dataset schemas** - Define `dataset_schema.json` with views for better Apify Console UI
- **CLI support** - Add CLI entry points via `__main__.py` for local testing and development
## Standby Mode (Real-time API)
Standby mode allows actors to run as persistent HTTP servers, providing instant responses without cold start delays.
**Perfect for:**
- Real-time APIs requiring <100ms response times
- Webhook endpoints that need immediate processing
- High-frequency requests (multiple requests per second)
- Integration with real-time services (Slack bots, chat applications, webhooks)
- Low-latency scraping APIs and on-demand data extraction
See `references/standby-mode.md` for complete implementation patterns, authentication, and examples.
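As a rough illustration of the idea of a persistent HTTP server, here is a standard-library sketch. It is not the pattern from `references/standby-mode.md`: the routes are placeholders, the `ACTOR_STANDBY_PORT` variable name is an assumption about what the platform provides, and a real actor would normally initialize the SDK as well.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def build_response(path: str) -> tuple:
    """Map a request path to (status, JSON-serializable body). Placeholder routing."""
    if path == "/":
        return 200, {"status": "ready"}
    if path.startswith("/echo/"):
        return 200, {"echo": path[len("/echo/"):]}
    return 404, {"error": f"unknown path {path}"}


class StandbyHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = build_response(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve() -> None:
    # Assumed env var name; 8080 is only a fallback for local experiments.
    port = int(os.environ.get("ACTOR_STANDBY_PORT", "8080"))
    ThreadingHTTPServer(("0.0.0.0", port), StandbyHandler).serve_forever()
```

Calling `serve()` blocks and answers requests until the process stops; keeping the routing in `build_response` lets it be tested without opening a socket.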
## References
Detailed documentation in `references/`:
- `python-sdk.md` - SDK patterns and complete code examples
- `standby-mode.md` - Real-time API implementation
- `input-schema.md` - Input validation and UI configuration
- `output-schema.md` - Output configuration and templates
## Troubleshooting
If you need information not covered in this skill, use the WebFetch tool with https://docs.apify.com/llms.txt to access the complete official documentation.