apify-actorization
Convert existing projects into Apify Actors - serverless cloud programs. Actorize JavaScript/TypeScript (SDK with Actor.init/exit), Python (async context manager), or any language (CLI wrapper). Us...
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install christophacham-agent-skills-library-apify-actorization
Repository
Skill path: skills/backend-dev/apify-actorization
Convert existing projects into Apify Actors - serverless cloud programs. Actorize JavaScript/TypeScript (SDK with Actor.init/exit), Python (async context manager), or any language (CLI wrapper). Us...
Open repositoryBest for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: christophacham.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install apify-actorization into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/christophacham/agent-skills-library before adding apify-actorization to shared team environments
- Use apify-actorization for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: apify-actorization
description: "Convert existing projects into Apify Actors - serverless cloud programs. Actorize JavaScript/TypeScript (SDK with Actor.init/exit), Python (async context manager), or any language (CLI wrapper). Us..."
---
# Apify Actorization
Actorization converts existing software into reusable serverless applications compatible with the Apify platform. Actors are programs packaged as Docker images that accept well-defined JSON input, perform an action, and optionally produce structured JSON output.
## Quick Start
1. Run `apify init` in project root
2. Wrap code with SDK lifecycle (see language-specific section below)
3. Configure `.actor/input_schema.json`
4. Test with `apify run --input '{"key": "value"}'`
5. Deploy with `apify push`
## When to Use This Skill
- Converting an existing project to run on Apify platform
- Adding Apify SDK integration to a project
- Wrapping a CLI tool or script as an Actor
- Migrating a Crawlee project to Apify
## Prerequisites
Verify `apify` CLI is installed:
```bash
apify --help
```
If not installed:
```bash
curl -fsSL https://apify.com/install-cli.sh | bash
# Or (Mac): brew install apify-cli
# Or (Windows): irm https://apify.com/install-cli.ps1 | iex
# Or: npm install -g apify-cli
```
Verify CLI is logged in:
```bash
apify info # Should return your username
```
If not logged in, check if `APIFY_TOKEN` environment variable is defined. If not, ask the user to generate one at https://console.apify.com/settings/integrations, then:
```bash
apify login -t $APIFY_TOKEN
```
## Actorization Checklist
Copy this checklist to track progress:
- [ ] Step 1: Analyze project (language, entry point, inputs, outputs)
- [ ] Step 2: Run `apify init` to create Actor structure
- [ ] Step 3: Apply language-specific SDK integration
- [ ] Step 4: Configure `.actor/input_schema.json`
- [ ] Step 5: Configure `.actor/output_schema.json` (if applicable)
- [ ] Step 6: Update `.actor/actor.json` metadata
- [ ] Step 7: Test locally with `apify run`
- [ ] Step 8: Deploy with `apify push`
## Step 1: Analyze the Project
Before making changes, understand the project:
1. **Identify the language** - JavaScript/TypeScript, Python, or other
2. **Find the entry point** - The main file that starts execution
3. **Identify inputs** - Command-line arguments, environment variables, config files
4. **Identify outputs** - Files, console output, API responses
5. **Check for state** - Does it need to persist data between runs?
## Step 2: Initialize Actor Structure
Run in the project root:
```bash
apify init
```
This creates:
- `.actor/actor.json` - Actor configuration and metadata
- `.actor/input_schema.json` - Input definition for the Apify Console
- `Dockerfile` (if not present) - Container image definition
## Step 3: Apply Language-Specific Changes
Choose based on your project's language:
- **JavaScript/TypeScript**: See [js-ts-actorization.md](references/js-ts-actorization.md)
- **Python**: See [python-actorization.md](references/python-actorization.md)
- **Other Languages (CLI-based)**: See [cli-actorization.md](references/cli-actorization.md)
### Quick Reference
| Language | Install | Wrap Code |
|----------|---------|-----------|
| JS/TS | `npm install apify` | `await Actor.init()` ... `await Actor.exit()` |
| Python | `pip install apify` | `async with Actor:` |
| Other | Use CLI in wrapper script | `apify actor:get-input` / `apify actor:push-data` |
## Steps 4-6: Configure Schemas
See [schemas-and-output.md](references/schemas-and-output.md) for detailed configuration of:
- Input schema (`.actor/input_schema.json`)
- Output schema (`.actor/output_schema.json`)
- Actor configuration (`.actor/actor.json`)
- State management (request queues, key-value stores)
Validate schemas against `@apify/json_schemas` npm package.
## Step 7: Test Locally
Run the actor with inline input (for JS/TS and Python actors):
```bash
apify run --input '{"startUrl": "https://example.com", "maxItems": 10}'
```
Or use an input file:
```bash
apify run --input-file ./test-input.json
```
**Important:** Always use `apify run`, not `npm start` or `python main.py`. The CLI sets up the proper environment and storage.
## Step 8: Deploy
```bash
apify push
```
This uploads and builds your actor on the Apify platform.
## Monetization (Optional)
After deploying, you can monetize your actor in the Apify Store. The recommended model is **Pay Per Event (PPE)**:
- Per result/item scraped
- Per page processed
- Per API call made
Configure PPE in the Apify Console under Actor > Monetization. Charge for events in your code with `await Actor.charge('result')`.
Other options: **Rental** (monthly subscription) or **Free** (open source).
## Pre-Deployment Checklist
- [ ] `.actor/actor.json` exists with correct name and description
- [ ] `.actor/actor.json` validates against `@apify/json_schemas` (`actor.schema.json`)
- [ ] `.actor/input_schema.json` defines all required inputs
- [ ] `.actor/input_schema.json` validates against `@apify/json_schemas` (`input.schema.json`)
- [ ] `.actor/output_schema.json` defines output structure (if applicable)
- [ ] `.actor/output_schema.json` validates against `@apify/json_schemas` (`output.schema.json`)
- [ ] `Dockerfile` is present and builds successfully
- [ ] `Actor.init()` / `Actor.exit()` wraps main code (JS/TS)
- [ ] `async with Actor:` wraps main code (Python)
- [ ] Inputs are read via `Actor.getInput()` / `Actor.get_input()`
- [ ] Outputs use `Actor.pushData()` or key-value store
- [ ] `apify run` executes successfully with test input
- [ ] `generatedBy` is set in actor.json meta section
## Apify MCP Tools
If MCP server is configured, use these tools for documentation:
- `search-apify-docs` - Search documentation
- `fetch-apify-docs` - Get full doc pages
Otherwise, the MCP Server url: `https://mcp.apify.com/?tools=docs`.
## Resources
- [Actorization Academy](https://docs.apify.com/academy/actorization) - Comprehensive guide
- [Apify SDK for JavaScript](https://docs.apify.com/sdk/js) - Full SDK reference
- [Apify SDK for Python](https://docs.apify.com/sdk/python) - Full SDK reference
- [Apify CLI Reference](https://docs.apify.com/cli) - CLI commands
- [Actor Specification](https://raw.githubusercontent.com/apify/actor-whitepaper/refs/heads/master/README.md) - Complete specification
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/js-ts-actorization.md
```markdown
# JavaScript/TypeScript Actorization
## Install the Apify SDK
```bash
npm install apify
```
## Wrap Main Code with Actor Lifecycle
```javascript
import { Actor } from 'apify';
// Initialize connection to Apify platform
await Actor.init();
// ============================================
// Your existing code goes here
// ============================================
// Example: Get input from Apify Console or API
const input = await Actor.getInput();
console.log('Input:', input);
// Example: Your crawler or processing logic
// const crawler = new PlaywrightCrawler({ ... });
// await crawler.run([input.startUrl]);
// Example: Push results to dataset
// await Actor.pushData({ result: 'data' });
// ============================================
// End of your code
// ============================================
// Graceful shutdown
await Actor.exit();
```
## Key Points
- `Actor.init()` configures storage to use Apify API when running on platform
- `Actor.exit()` handles graceful shutdown and cleanup
- Both calls must be awaited
- Local execution remains unchanged - the SDK automatically detects the environment
## Crawlee Projects
Crawlee projects require minimal changes - just wrap with Actor lifecycle:
```javascript
import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';
await Actor.init();
// Get and validate input
const input = await Actor.getInput();
const {
startUrl = 'https://example.com',
maxItems = 100,
} = input ?? {};
let itemCount = 0;
const crawler = new PlaywrightCrawler({
requestHandler: async ({ page, request, pushData }) => {
if (itemCount >= maxItems) return;
const title = await page.title();
await pushData({ url: request.url, title });
itemCount++;
},
});
await crawler.run([startUrl]);
await Actor.exit();
```
## Express/HTTP Servers
For web servers, use standby mode in actor.json:
```json
{
"actorSpecification": 1,
"name": "my-api",
"usesStandbyMode": true
}
```
Then implement readiness probe. See [standby-mode.md](../../apify-actor-development/references/standby-mode.md).
## Batch Processing Scripts
```javascript
import { Actor } from 'apify';
await Actor.init();
const input = await Actor.getInput();
const items = input.items || [];
for (const item of items) {
const result = processItem(item);
await Actor.pushData(result);
}
await Actor.exit();
```
```
### references/python-actorization.md
```markdown
# Python Actorization
## Install the Apify SDK
```bash
pip install apify
```
## Wrap Main Function with Actor Context Manager
```python
import asyncio
from apify import Actor
async def main() -> None:
async with Actor:
# ============================================
# Your existing code goes here
# ============================================
# Example: Get input from Apify Console or API
actor_input = await Actor.get_input()
print(f'Input: {actor_input}')
# Example: Your crawler or processing logic
# crawler = PlaywrightCrawler(...)
# await crawler.run([actor_input.get('startUrl')])
# Example: Push results to dataset
# await Actor.push_data({'result': 'data'})
# ============================================
# End of your code
# ============================================
if __name__ == '__main__':
asyncio.run(main())
```
## Key Points
- `async with Actor:` handles both initialization and cleanup
- Automatically manages platform event listeners and graceful shutdown
- Local execution remains unchanged - the SDK automatically detects the environment
## Crawlee Python Projects
```python
import asyncio
from apify import Actor
from crawlee.playwright_crawler import PlaywrightCrawler
async def main() -> None:
async with Actor:
# Get and validate input
actor_input = await Actor.get_input() or {}
start_url = actor_input.get('startUrl', 'https://example.com')
max_items = actor_input.get('maxItems', 100)
item_count = 0
async def request_handler(context):
nonlocal item_count
if item_count >= max_items:
return
title = await context.page.title()
await context.push_data({'url': context.request.url, 'title': title})
item_count += 1
crawler = PlaywrightCrawler(request_handler=request_handler)
await crawler.run([start_url])
if __name__ == '__main__':
asyncio.run(main())
```
## Batch Processing Scripts
```python
import asyncio
from apify import Actor
async def main() -> None:
async with Actor:
actor_input = await Actor.get_input() or {}
items = actor_input.get('items', [])
for item in items:
result = process_item(item)
await Actor.push_data(result)
if __name__ == '__main__':
asyncio.run(main())
```
```
### references/cli-actorization.md
```markdown
# CLI-Based Actorization
For languages without an SDK (Go, Rust, Java, etc.), create a wrapper script that uses the Apify CLI.
## Create Wrapper Script
Create `start.sh` in project root:
```bash
#!/bin/bash
set -e
# Get input from Apify key-value store
INPUT=$(apify actor:get-input)
# Parse input values (adjust based on your input schema)
MY_PARAM=$(echo "$INPUT" | jq -r '.myParam // "default"')
# Run your application with the input
./your-application --param "$MY_PARAM"
# If your app writes to a file, push it to key-value store
# apify actor:set-value OUTPUT --contentType application/json < output.json
# Or push structured data to dataset
# apify actor:push-data '{"result": "value"}'
```
## Update Dockerfile
Reference the [cli-start template Dockerfile](https://github.com/apify/actor-templates/blob/master/templates/cli-start/Dockerfile) which includes the `ubi` utility for installing binaries from GitHub releases.
```dockerfile
FROM apify/actor-node:20
# Install ubi for easy GitHub release installation
RUN curl --silent --location \
https://raw.githubusercontent.com/houseabsolute/ubi/master/bootstrap/bootstrap-ubi.sh | sh
# Install your CLI tool from GitHub releases (example)
# RUN ubi --project your-org/your-tool --in /usr/local/bin
# Or install apify-cli and jq manually
RUN npm install -g apify-cli
RUN apt-get update && apt-get install -y jq
# Copy your application
COPY . .
# Build your application if needed
# RUN ./build.sh
# Make start script executable
RUN chmod +x start.sh
# Run the wrapper script
CMD ["./start.sh"]
```
## Testing CLI-Based Actors
For CLI-based actors (shell wrapper scripts), you may need to test the underlying application directly with mock input, as `apify run` requires a Node.js or Python entry point.
Test your wrapper script locally:
```bash
# Set up mock input
export INPUT='{"myParam": "test-value"}'
# Run wrapper script
./start.sh
```
## CLI Commands Reference
| Command | Description |
|---------|-------------|
| `apify actor:get-input` | Get input JSON from key-value store |
| `apify actor:set-value KEY` | Store value in key-value store |
| `apify actor:push-data JSON` | Push data to dataset |
| `apify actor:get-value KEY` | Retrieve value from key-value store |
```
### references/schemas-and-output.md
```markdown
# Schemas and Output Configuration
## Input Schema
Map your application's inputs to `.actor/input_schema.json`. Validate against the JSON Schema from the `@apify/json_schemas` npm package (`input.schema.json`).
```json
{
"title": "My Actor Input",
"type": "object",
"schemaVersion": 1,
"properties": {
"startUrl": {
"title": "Start URL",
"type": "string",
"description": "The URL to start processing from",
"editor": "textfield",
"prefill": "https://example.com"
},
"maxItems": {
"title": "Max Items",
"type": "integer",
"description": "Maximum number of items to process",
"default": 100,
"minimum": 1
}
},
"required": ["startUrl"]
}
```
### Mapping Guidelines
- Command-line arguments → input schema properties
- Environment variables → input schema or Actor env vars in actor.json
- Config files → input schema with object/array types
- Flatten deeply nested structures for better UX
## Output Schema
Define output structure in `.actor/output_schema.json`. Validate against the JSON Schema from the `@apify/json_schemas` npm package (`output.schema.json`).
### For Table-Like Data (Multiple Items)
- Use `Actor.pushData()` (JS) or `Actor.push_data()` (Python)
- Each item becomes a row in the dataset
### For Single Files or Blobs
- Use key-value store: `Actor.setValue()` / `Actor.set_value()`
- Get the public URL and include it in the dataset:
```javascript
// Store file with public access
await Actor.setValue('report.pdf', pdfBuffer, { contentType: 'application/pdf' });
// Get the public URL
const storeInfo = await Actor.openKeyValueStore();
const publicUrl = `https://api.apify.com/v2/key-value-stores/${storeInfo.id}/records/report.pdf`;
// Include URL in dataset output
await Actor.pushData({ reportUrl: publicUrl });
```
### For Multiple Files with a Common Prefix (Collections)
```javascript
// Store multiple files with a prefix
for (const [name, data] of files) {
await Actor.setValue(`screenshots/${name}`, data, { contentType: 'image/png' });
}
// Files are accessible at: .../records/screenshots%2F{name}
```
## Actor Configuration (actor.json)
Configure `.actor/actor.json`. Validate against the JSON Schema from the `@apify/json_schemas` npm package (`actor.schema.json`).
```json
{
"actorSpecification": 1,
"name": "my-actor",
"title": "My Actor",
"description": "Brief description of what the actor does",
"version": "1.0.0",
"meta": {
"templateId": "ts_empty",
"generatedBy": "Claude Code with Claude Opus 4.5"
},
"input": "./input_schema.json",
"dockerfile": "../Dockerfile"
}
```
**Important:** Fill in the `generatedBy` property with the tool/model used.
## State Management
### Request Queue - For Pausable Task Processing
The request queue works for any task processing, not just web scraping. Use a dummy URL with custom `uniqueKey` and `userData` for non-URL tasks:
```javascript
const requestQueue = await Actor.openRequestQueue();
// Add tasks to the queue (works for any processing, not just URLs)
await requestQueue.addRequest({
url: 'https://placeholder.local', // Dummy URL for non-scraping tasks
uniqueKey: `task-${taskId}`, // Unique identifier for deduplication
userData: { itemId: 123, action: 'process' }, // Your custom task data
});
// Process tasks from the queue (with Crawlee)
const crawler = new BasicCrawler({
requestQueue,
requestHandler: async ({ request }) => {
const { itemId, action } = request.userData;
// Process your task using userData
await processTask(itemId, action);
},
});
await crawler.run();
// Or manually consume without Crawlee:
let request;
while ((request = await requestQueue.fetchNextRequest())) {
await processTask(request.userData);
await requestQueue.markRequestHandled(request);
}
```
### Key-Value Store - For Checkpoint State
```javascript
// Save state
await Actor.setValue('STATE', { processedCount: 100 });
// Restore state on restart
const state = await Actor.getValue('STATE') || { processedCount: 0 };
```
```