rocm_vllm_deployment
Production-ready vLLM deployment on AMD ROCm GPUs. Combines environment auto-check, model parameter detection, Docker Compose deployment, health verification, and functional testing with comprehensive logging and security best practices.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-rocm-vllm-deployment
Repository
Skill path: skills/alexhegit/rocm-vllm-deployment
Best for
Primary workflow: Run DevOps.
Technical facets: Full Stack, DevOps, Security, Testing.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install rocm_vllm_deployment into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding rocm_vllm_deployment to shared team environments
- Use rocm_vllm_deployment for development workflows
Works across
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: rocm_vllm_deployment
description: Production-ready vLLM deployment on AMD ROCm GPUs. Combines environment auto-check, model parameter detection, Docker Compose deployment, health verification, and functional testing with comprehensive logging and security best practices.
version: 1.0.0
author: Alex He
timeout: 3600s
platform: Linux (AMD GPU ROCm)
tags:
  - LLM
  - Deployment
  - AMD
  - ROCm
  - Docker Compose
  - vLLM
  - Automation
  - EnvCheck
  - AutoRepair
---

# ROCm vLLM Deployment Skill

Production-ready automation for deploying vLLM inference services on AMD ROCm GPUs using Docker Compose.

## Features

- **Environment Auto-Check** - Detects and repairs missing dependencies
- **Model Parameter Detection** - Auto-reads config.json for optimal settings
- **VRAM Estimation** - Calculates memory requirements before deployment
- **Secure Token Handling** - Never writes tokens to compose files
- **Structured Output** - All logs and test results saved per model
- **Deployment Reports** - Human-readable summary for each deployment
- **Health Verification** - Automated health checks and functional tests
- **Troubleshooting Guide** - Common issues and solutions

## Environment Prerequisites

**Recommended (for production):** Add to `~/.bash_profile`:

```bash
# HuggingFace authentication token (required for gated models)
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Model cache directory (optional)
export HF_HOME="$HOME/models"

# Apply changes (run this in your shell; do not add it to ~/.bash_profile itself)
source ~/.bash_profile
```

**Not required for testing:** The skill proceeds without these variables set:

- **HF_TOKEN**: Optional. Public models work without it; gated models fail at download with a clear error.
- **HF_HOME**: Optional. Defaults to `/root/.cache/huggingface/hub`.

### Environment Variable Detection

**Priority Order:**

1. **Explicit parameter** (highest): provided in the task/request (e.g., `hf_token: "xxx"`)
2. **Environment variable**: already set in the shell or inherited from the parent process
3. **`~/.bash_profile`**: sourced to load variables
4. **Default value** (lowest): `HF_HOME` defaults to `/root/.cache/huggingface/hub`

| Variable | Required | If Missing |
|----------|----------|------------|
| `HF_TOKEN` | **Conditional** | Continue without a token (public models work; gated models fail at download with a clear error) |
| `HF_HOME` | No | **Warning + default**: use `/root/.cache/huggingface/hub` |

**Philosophy:** Fail fast for configuration errors; fail at download time for authentication errors.

---

## Helper Scripts

**Location:** `<skill-dir>/scripts/`

### check-env.sh

Validate environment variables before deployment, loading any missing ones from `~/.bashrc` (note: this script reads `~/.bashrc`, not `~/.bash_profile`).

**Usage:**

```bash
# Basic check (HF_TOKEN optional, HF_HOME optional with default)
./scripts/check-env.sh

# Strict mode (HF_HOME required, fails if not set)
./scripts/check-env.sh --strict

# Quiet mode (minimal output, for automation)
./scripts/check-env.sh --quiet

# Test with environment variables
HF_TOKEN="hf_xxx" HF_HOME="/models" ./scripts/check-env.sh
```

**Exit Codes:**

| Code | Meaning |
|------|---------|
| 0 | Environment check completed (variables loaded or defaulted) |
| 1 | `HF_HOME` not set in `--strict` mode |
| 2 | Critical error (sourcing `~/.bashrc` failed) |

**Note:** This script is optional; you can also run `source ~/.bash_profile` directly. Because it runs as a child process, the variables it exports do not persist in the calling shell: it validates the environment but does not load it into your session.

---

### generate-report.sh

Generate a human-readable deployment report after a successful deployment.
**Usage:**

```bash
./scripts/generate-report.sh <model-id> <container-name> <port> <status> [model-load-time] [memory-used]

# Example:
./scripts/generate-report.sh \
  "Qwen-Qwen3-0.6B" \
  "vllm-qwen3-0-6b" \
  "8001" \
  "✅ Success" \
  "3.6" \
  "1.2"
```

**Parameters:**

| Parameter | Required | Description |
|-----------|----------|-------------|
| `model-id` | Yes | Model ID (with `/` replaced by `-`) |
| `container-name` | Yes | Docker container name |
| `port` | Yes | Host port for the API endpoint |
| `status` | Yes | Deployment status (e.g., "✅ Success") |
| `model-load-time` | No | Model loading time in seconds |
| `memory-used` | No | Memory consumption in GiB |

**Output:** `$HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md`

**Exit Codes:**

| Code | Meaning |
|------|---------|
| 0 | Report generated successfully |
| 1 | Missing required parameters |
| 2 | Output directory not found |

**Integration:** This script is called automatically in **Phase 7** of the deployment workflow.

---

## Input Schema

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model_id | String | Yes | - | HuggingFace model ID |
| docker_image | String | No | rocm/vllm-dev:nightly | vLLM Docker image |
| tensor_parallel_size | Integer | No | 1 | Number of GPUs |
| port | Integer | No | 9999 | API server port |
| hf_home | String | No | `${HF_HOME}` or `/root/.cache/huggingface/hub` | Model cache directory |
| hf_token | Secret | Conditional | `${HF_TOKEN}` | HuggingFace token (optional for public models, required for gated models) |
| max_model_len | Integer | No | Auto-detect | Maximum sequence length |
| gpu_memory_utilization | Float | No | 0.85 | Fraction of GPU memory vLLM may use (0 to 1) |
| auto_install | Boolean | No | true | Auto-install dependencies |
| log_level | String | No | INFO | Logging verbosity |

## Output Structure

**All deployment artifacts MUST be saved to:**

```
$HOME/vllm-compose/<model-id-slash-to-dash>/
```

Convert the model ID to a directory name by replacing `/` with `-`:

- `openai/gpt-oss-20b` → `$HOME/vllm-compose/openai-gpt-oss-20b/`
- `Qwen/Qwen3-Coder-Next-FP8` → `$HOME/vllm-compose/Qwen-Qwen3-Coder-Next-FP8/`

**Per-model directory structure:**

```
$HOME/vllm-compose/<model-id>/
├── deployment.log         # Full deployment logs (stdout + stderr)
├── test-results.json      # Functional test results (JSON format)
├── docker-compose.yml     # Generated Docker Compose file
├── .env                   # HF_TOKEN environment (chmod 600, optional)
└── DEPLOYMENT_REPORT.md   # Human-readable deployment summary
```

**File requirements:**

- `deployment.log`: capture ALL container logs during deployment
- `test-results.json`: save the API response from the functional test request
- `DEPLOYMENT_REPORT.md`: generated in Phase 7
- All three files MUST exist before the deployment is marked complete

## Execution Workflow

### Phase 0: Environment Check & Auto-Repair

**Step 0.1: Load Environment Variables**

```bash
# Source ~/.bash_profile to load HF_HOME and HF_TOKEN
source ~/.bash_profile

# If HF_HOME is still unset, fall back to the default
export HF_HOME="${HF_HOME:-/root/.cache/huggingface/hub}"
```

If `HF_HOME` is defined neither in `~/.bash_profile` nor in the environment, it defaults to `/root/.cache/huggingface/hub`. Any `export` in `~/.bash_profile` overwrites a value already set in the shell, so apply explicit parameters after this step to preserve the priority order.
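The Step 0.1 priority order and the output-directory naming rule can be sketched in bash as shown below. This is a minimal sketch, not part of the skill's scripts. `resolve_var` and `model_output_dir` are hypothetical helper names. Reading `~/.bash_profile` in a child `bash` is an assumed design choice, made so that sourcing the profile cannot overwrite values that are already set in the current shell.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with the skill) illustrating Phase 0.

# Print the value of the variable named $1 using the priority order:
#   explicit parameter ($2) > current environment > ~/.bash_profile > default ($3)
resolve_var() {
  local name="$1" explicit="$2" default="$3"
  local value=""

  if [ -n "$explicit" ]; then
    value="$explicit"
  elif [ -n "${!name:-}" ]; then
    # Indirect expansion: value of the variable whose name is stored in $name
    value="${!name}"
  elif [ -f "$HOME/.bash_profile" ]; then
    # Source the profile in a child bash so it cannot clobber this shell.
    # Inside bash -c, $1 is the variable name and ${!1} is its value.
    value="$(bash -c 'source "$HOME/.bash_profile" >/dev/null 2>&1; printf "%s" "${!1:-}"' _ "$name")"
  fi

  printf '%s\n' "${value:-$default}"
}

# Map a HuggingFace model ID to its per-model output directory
# by replacing every "/" with "-".
model_output_dir() {
  printf '%s/vllm-compose/%s\n' "$HOME" "${1//\//-}"
}
```

A possible use in Phase 0 would be `export HF_HOME="$(resolve_var HF_HOME "$HF_HOME_PARAM" /root/.cache/huggingface/hub)"` followed by `mkdir -p "$(model_output_dir "$MODEL_ID")"`. Here `HF_HOME_PARAM` and `MODEL_ID` are hypothetical names for the task's `hf_home` and `model_id` inputs.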
**Step 0.2: Create Output Directory**

- Create `$HOME/vllm-compose/<model-id>/`

**Step 0.3: Initialize Logging**

- All output → `$HOME/vllm-compose/<model-id>/deployment.log`

**Step 0.4: System Checks**

- Detect OS and package manager
- Check Python, pip, and huggingface_hub
- Check Docker and `docker compose`
- Check ROCm tools (rocm-smi/amd-smi)
- Check GPU access (`/dev/kfd`, `/dev/dri`)
- Check disk space (20 GB minimum)

### Phase 1: Model Download

**Use HF_HOME from Phase 0 (environment variable or default).** In this skill, `HF_HOME` is the hub cache directory itself (the directory that holds the `models--<org>--<model>` folders), which is why Phase 3 mounts it at `/root/.cache/huggingface/hub`. The `huggingface_hub` library instead treats `HF_HOME` as the parent of `hub/`, so pass the cache directory explicitly:

```bash
# Download the model into the HF cache layout under HF_HOME
huggingface-cli download <model_id> --cache-dir "$HF_HOME"

# Or use snapshot_download via Python:
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='<model_id>', cache_dir='$HF_HOME')"
```

**Authentication Handling:**

| Scenario | Behavior |
|----------|----------|
| Public model + no token | ✅ Download succeeds |
| Public model + token provided | ✅ Download succeeds |
| Gated model + no token | ❌ Download fails with "authentication required" error |
| Gated model + invalid token | ❌ Download fails with "invalid token" error |
| Gated model + valid token (access granted) | ✅ Download succeeds |

**On Authentication Failure:**

```bash
echo "ERROR: Model download failed - authentication required"
echo "This model requires a valid HF_TOKEN."
echo ""
echo "Please add to ~/.bash_profile:"
echo "  export HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""
echo "Then run: source ~/.bash_profile"
exit 1
```

- Locate the model path in the HF cache: `$HF_HOME/models--<org>--<model-name>/`
- Log download progress to `deployment.log`

### Phase 2: Model Parameter Detection

- Read `config.json` from the model snapshot
- Auto-detect: max_model_len (from `max_position_embeddings`), hidden_size, num_attention_heads, num_hidden_layers, vocab_size, dtype (`torch_dtype`)
- Validate that the TP size divides the number of attention heads
- Estimate the VRAM requirement

### Phase 3: Docker Compose Configuration

**Generate files in the output directory:**

- **docker-compose.yml** → `$HOME/vllm-compose/<model-id>/docker-compose.yml`
  - Mount HF_HOME as a volume (read-only for models)
  - No hardcoded tokens in the compose file
- **.env** → `$HOME/vllm-compose/<model-id>/.env` (optional)
  - Contains: `HF_TOKEN=<value>`
  - Permissions: `chmod 600`
  - Created only if the user explicitly requests persistent token storage

**Volume and device mount example:**

```yaml
volumes:
  - ${HF_HOME:-/root/.cache/huggingface/hub}:/root/.cache/huggingface/hub:ro
devices:
  - /dev/kfd
  - /dev/dri
group_add:
  - video
```

**Important:** Docker Compose substitutes `${HF_HOME}` from the shell environment (or from a `.env` file in the project directory) when it parses the file. Before running `docker compose`, source `~/.bash_profile`: `source ~/.bash_profile`.

### Phase 4: Container Launch

**Important:** Before deploying, pull the latest image to pick up updates:

```bash
docker pull rocm/vllm-dev:nightly
```

**Note:** The default port is 9999. Before running `docker compose`, check whether the port is available with `ss -tlnp | grep :<port>`. If the port is in use, specify a different port in `docker-compose.yml`.
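The `grep :<port>` check above also matches longer ports (for example, `:999` matches `:9999`). The bash helper below makes the comparison exact. This is a hedged sketch: `port_listed`, `port_in_use`, and `first_free_port` are hypothetical names. It assumes that `ss -tln` prints the local `address:port` in its fourth column, as current iproute2 releases do.

```bash
#!/usr/bin/env bash
# Hypothetical Phase 4 helpers (not shipped with the skill).

# Succeed if the `ss -tln` output on stdin has a listener on exactly port $1.
# Column 4 is "Local Address:Port", e.g. 0.0.0.0:9999 or [::]:9999.
port_listed() {
  awk -v suffix=":$1" '
    {
      # Compare the tail of column 4 with ":<port>" (exact suffix match)
      n = length($4) - length(suffix) + 1
      if (n > 0 && substr($4, n) == suffix) found = 1
    }
    END { exit !found }'
}

# Succeed if something on this host is already listening on TCP port $1.
port_in_use() {
  ss -tln | port_listed "$1"
}

# Print the first free port at or above $1 (default 9999, the skill's default).
first_free_port() {
  local port="${1:-9999}"
  while port_in_use "$port"; do
    port=$((port + 1))
  done
  echo "$port"
}
```

For example, `PORT="$(first_free_port 9999)"` gives a host port for the compose port mapping. Another process can still take the port between this check and `docker compose up -d`, so the launch step should still handle a bind failure.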
- Pass HF_TOKEN at runtime: `HF_TOKEN=$HF_TOKEN docker compose up -d` (the compose file must forward it to the container, for example by listing `HF_TOKEN` under the service's `environment:` key)
- Wait for container initialization

### Phase 5: Health Verification

- Check container status
- Test the `/health` endpoint
- Test the `/v1/models` endpoint

### Phase 6: Functional Testing

- Run a completion test via the `/v1/chat/completions` API
- **Save the response to:** `$HOME/vllm-compose/<model-id>/test-results.json`
- Verify that the response contains a valid completion
- **Log deployment complete:** append to `deployment.log`
- **Phase 6 is complete only when both files exist:**
  - `deployment.log`
  - `test-results.json`

### Phase 7: Deployment Report

**Generate a human-readable deployment report using the helper script.**

**Step 7.1: Extract Deployment Metrics**

```bash
# Run from $HOME/vllm-compose/<model-id>/ and parse deployment.log for metrics.
# vLLM's log wording varies between versions; adjust the patterns if needed.
MODEL_LOAD_TIME=$(grep -ioE "model loading took .*[0-9.]+ seconds" deployment.log | tail -n 1 | grep -oE '[0-9.]+ seconds' | grep -oE '[0-9.]+' || echo "N/A")
MEMORY_USED=$(grep -oE "took [0-9.]+ GiB" deployment.log | tail -n 1 | grep -oE '[0-9.]+' || echo "N/A")
```

**Step 7.2: Generate Report**

```bash
# Execute the report generation script
<skill-dir>/scripts/generate-report.sh \
  "<model-id>" \
  "<container-name>" \
  "<port>" \
  "<status>" \
  "$MODEL_LOAD_TIME" \
  "$MEMORY_USED"

# Example:
./scripts/generate-report.sh \
  "Qwen-Qwen3-0.6B" \
  "vllm-qwen3-0-6b" \
  "8001" \
  "✅ Success" \
  "3.6" \
  "1.2"
```

**Output:** `$HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md`

**Report Contents:**

- Output structure verification (file checklist)
- Deployment summary table (health, test, metrics)
- Test results (request/response preview)
- Environment configuration
- Quick commands for operations

**Completion Criteria:**

- `DEPLOYMENT_REPORT.md` exists in the output directory
- The report contains all required sections
- All file checks show ✅

## Security Best Practices

1. **Never commit tokens to version control**: add `.env` to `.gitignore`
2. **Use .env files with chmod 600**: restrict access to the owner only
3. **Mask tokens in logs**: show only the first 10 characters, `${TOKEN:0:10}...`
4. **Pass tokens at runtime**: `HF_TOKEN=$HF_TOKEN docker compose up -d`
5. **Store tokens in ~/.bash_profile**: for production environments, set `HF_TOKEN` in the user's shell config
6. **Set a token for gated models**: `HF_TOKEN` is validated at download time, so gated models need it set before Phase 1

## Troubleshooting

### Environment Variables

| Issue | Solution |
|-------|----------|
| `HF_TOKEN not set` | Add `export HF_TOKEN="hf_xxx"` to `~/.bash_profile`, then run `source ~/.bash_profile`, or provide it via parameter. |
| `HF_HOME not set` | Defaults to `/root/.cache/huggingface/hub`. For production, add `export HF_HOME="/path"` to `~/.bash_profile`. |
| `~/.bash_profile not found` | Create `~/.bash_profile` and add the environment variables. |
| Changes not taking effect | Run `source ~/.bash_profile` or restart the terminal. |
| `HF_TOKEN` provided but download still fails | The token may be invalid or lack access to the model. Verify it at https://huggingface.co/settings/tokens |

### Model Download

| Issue | Solution |
|-------|----------|
| `Authentication required` (gated model) | Set `HF_TOKEN` in `~/.bash_profile` or provide it via parameter. Ensure the token has access to the model. |
| `Model not found` | Verify the model ID is correct (case-sensitive) and that the model exists on HuggingFace. |
| `Download timeout` | Check the network connection. Large models may take a long time to download. |

### Deployment

| Issue | Solution |
|-------|----------|
| `huggingface-cli` / `hf` CLI not found | `pip install huggingface_hub` |
| Docker Compose fails | Use `docker compose` (no hyphen) |
| GPU access fails | Add the user to the `render` and `video` groups: `sudo usermod -aG render,video $USER`, then log out and back in |
| Port in use | Change the `port` parameter |
| OOM | Reduce `gpu_memory_utilization` if other processes share the GPU; otherwise reduce `max_model_len` or increase `tensor_parallel_size` |

## Cleanup

```bash
cd $HOME/vllm-compose/<model-id>
docker compose down
```

## Status Check

**Check deployment status and logs:**

```bash
# View deployment directory
ls -la $HOME/vllm-compose/<model-id>/

# Follow the deployment log
tail -f $HOME/vllm-compose/<model-id>/deployment.log

# View test results
cat $HOME/vllm-compose/<model-id>/test-results.json

# Check container status
docker ps | grep <container-name>

# Verify environment variables
echo "HF_TOKEN: ${HF_TOKEN:0:10}..."
echo "HF_HOME: $HF_HOME"
```

## Quick Start (Production)

**Step 1: Add environment variables to ~/.bash_profile**

```bash
# Required for gated models: HuggingFace token
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Recommended: Custom model storage path (production)
export HF_HOME="/data/models/huggingface"

# Apply changes (run this in your shell; do not add it to ~/.bash_profile itself)
source ~/.bash_profile
```

**Step 2: Verify the environment is ready**

```bash
# Load variables into the current shell
source ~/.bash_profile

# Validate them (the output below comes from check-env.sh)
./scripts/check-env.sh

# Expected output (excerpt):
# === Environment Ready ===
#
# Summary:
#   HF_TOKEN: hf_xxxxxxx... (environment/parameter)
#   HF_HOME: /data/models/huggingface (environment/parameter)
```

**Step 3: Run deployment**

```bash
# The skill will automatically:
# 1. Source ~/.bash_profile to load HF_HOME and HF_TOKEN
# 2. Use HF_TOKEN and HF_HOME from the environment (or ~/.bash_profile, or defaults)
# 3. Proceed without a token for public models
# 4. Fail at download time with a clear error if a gated model requires a token
```

## Version History

| Version | Changes |
|---------|---------|
| 1.0.0 | Initial release |

---

## Referenced Files

> The following files are referenced in this skill and included for context.
### scripts/check-env.sh

```bash
#!/bin/bash
#
# check-env.sh - Check and load HF_TOKEN and HF_HOME environment variables
#
# This script is part of the rocm_vllm_deployment skill.
# It sources ~/.bashrc and validates environment variables.
#
# Priority Order:
#   1. Environment variables already set (e.g., from parent process/parameters)
#   2. ~/.bashrc (sourced by this script)
#   3. Default values (HF_HOME only; HF_TOKEN remains unset)
#
# Usage:
#   ./check-env.sh [--strict] [--quiet]
#
# Options:
#   --strict  Fail if HF_HOME is not set (default: warn only)
#   --quiet   Suppress informational output
#
# Exit Codes:
#   0 - Environment check completed (variables loaded or defaulted)
#   1 - HF_HOME not set in --strict mode
#   2 - Critical error (sourcing ~/.bashrc failed)
#
# Environment Variables:
#   HF_TOKEN - Optional here, required only for gated models (checked at download time)
#   HF_HOME  - Optional: Custom model cache directory (default: ~/.cache/huggingface/hub)
#
# Note: exports only reach this script's child processes, not the calling shell.
#

set -e

# Parse arguments
STRICT_MODE=false
QUIET_MODE=false
for arg in "$@"; do
    case $arg in
        --strict) STRICT_MODE=true ;;
        --quiet) QUIET_MODE=true ;;
    esac
done

# Helper function for output
log() {
    if [ "$QUIET_MODE" = false ]; then
        echo "$@"
    fi
}

# Track source of variables
HF_TOKEN_SOURCE="not set"
HF_HOME_SOURCE="not set"

#------------------------------------------------------------------------------
# Step 1: Check if already set in environment (from parameters/parent process)
#------------------------------------------------------------------------------
log "=== Environment Check ==="
log ""

if [ -n "$HF_TOKEN" ]; then
    HF_TOKEN_SOURCE="environment/parameter"
    log "✓ HF_TOKEN already set in environment: ${HF_TOKEN:0:10}..."
fi

if [ -n "$HF_HOME" ]; then
    HF_HOME_SOURCE="environment/parameter"
    log "✓ HF_HOME already set in environment: $HF_HOME"
fi

# Remember preset values so ~/.bashrc cannot override them (priority 1 beats 2)
PRESET_HF_TOKEN="$HF_TOKEN"
PRESET_HF_HOME="$HF_HOME"

#------------------------------------------------------------------------------
# Step 2: Source ~/.bashrc to load missing variables
#------------------------------------------------------------------------------
if [ -f "$HOME/.bashrc" ]; then
    # Source .bashrc to load HF_TOKEN and HF_HOME if not already set
    if ! source "$HOME/.bashrc"; then
        echo "❌ ERROR: failed to source ~/.bashrc" >&2
        exit 2
    fi
    log "✓ Loaded ~/.bashrc"

    # Restore values that were set before sourcing
    if [ -n "$PRESET_HF_TOKEN" ]; then HF_TOKEN="$PRESET_HF_TOKEN"; fi
    if [ -n "$PRESET_HF_HOME" ]; then HF_HOME="$PRESET_HF_HOME"; fi

    # Check if variables were loaded from .bashrc
    if [ "$HF_TOKEN_SOURCE" = "not set" ] && [ -n "$HF_TOKEN" ]; then
        HF_TOKEN_SOURCE="~/.bashrc"
        log "✓ HF_TOKEN loaded from ~/.bashrc: ${HF_TOKEN:0:10}..."
    fi

    if [ "$HF_HOME_SOURCE" = "not set" ] && [ -n "$HF_HOME" ]; then
        HF_HOME_SOURCE="~/.bashrc"
        log "✓ HF_HOME loaded from ~/.bashrc: $HF_HOME"
    fi
else
    log "⚠️ WARNING: ~/.bashrc not found"
    log "Creating empty ~/.bashrc for future use..."
    touch "$HOME/.bashrc"
fi

log ""

#------------------------------------------------------------------------------
# Step 3: Apply defaults for missing variables
#------------------------------------------------------------------------------

# HF_TOKEN: No default, will fail at download time if required
if [ -z "$HF_TOKEN" ]; then
    HF_TOKEN_SOURCE="not set (public models only)"
    log "⚠️ HF_TOKEN not set"
    log "  - Public models: Will proceed without authentication"
    log "  - Gated models: Will fail at download time with clear error"
    log ""
    log "To avoid authentication errors, add to ~/.bashrc:"
    log "  export HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""
else
    log "✓ HF_TOKEN: ${HF_TOKEN:0:10}... (source: $HF_TOKEN_SOURCE)"
fi

# HF_HOME: Default to ~/.cache/huggingface/hub
if [ -z "$HF_HOME" ]; then
    if [ "$STRICT_MODE" = true ]; then
        # Print to stderr so the error is visible even with --quiet
        echo "❌ ERROR: HF_HOME is not set (strict mode)" >&2
        log ""
        log "Please add the following to ~/.bashrc:"
        log ""
        log "  export HF_HOME=\"/path/to/models\""
        log ""
        exit 1
    else
        HF_HOME_SOURCE="default"
        HF_HOME="$HOME/.cache/huggingface/hub"
        log "⚠️ HF_HOME not set, using default: $HF_HOME"
        log ""
        log "For production, consider adding to ~/.bashrc:"
        log "  export HF_HOME=\"/data/models/huggingface\""
    fi
else
    log "✓ HF_HOME: $HF_HOME (source: $HF_HOME_SOURCE)"
fi

#------------------------------------------------------------------------------
# Step 4: Export for child processes
#------------------------------------------------------------------------------
export HF_TOKEN
export HF_HOME

log ""
log "=== Environment Ready ==="
log ""
log "Summary:"
log "  HF_TOKEN: ${HF_TOKEN:0:10}... ($HF_TOKEN_SOURCE)"
log "  HF_HOME: $HF_HOME ($HF_HOME_SOURCE)"
log ""

# Output for parsing (optional, for automation)
if [ "$QUIET_MODE" = true ]; then
    echo "HF_TOKEN_SOURCE=$HF_TOKEN_SOURCE"
    echo "HF_HOME_SOURCE=$HF_HOME_SOURCE"
fi

exit 0
```

### scripts/generate-report.sh

```bash
#!/bin/bash
#
# generate-report.sh - Generate deployment report for vLLM deployment
#
# This script is part of the rocm_vllm_deployment skill.
# It creates a human-readable deployment report in markdown format.
#
# Usage:
#   ./generate-report.sh <model-id> <container-name> <port> <status> [model-load-time] [memory-used]
#
# Example:
#   ./generate-report.sh "Qwen-Qwen3-0.6B" "vllm-qwen3-0-6b" "8001" "✅ Success" "3.6" "1.2"
#
# Exit Codes:
#   0 - Report generated successfully
#   1 - Missing required parameters
#   2 - Output directory not found
#

set -e

# Parameters
MODEL_ID="$1"
CONTAINER_NAME="$2"
PORT="$3"
STATUS="$4"
MODEL_LOAD_TIME="${5:-N/A}"
MEMORY_USED="${6:-N/A}"

# Directories
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILL_DIR="$(dirname "$SCRIPT_DIR")"
OUTPUT_DIR="$HOME/vllm-compose/$MODEL_ID"
REPORT_FILE="$OUTPUT_DIR/DEPLOYMENT_REPORT.md"

#------------------------------------------------------------------------------
# Validate parameters
#------------------------------------------------------------------------------
if [ -z "$MODEL_ID" ] || [ -z "$CONTAINER_NAME" ] || [ -z "$PORT" ] || [ -z "$STATUS" ]; then
    echo "ERROR: Missing required parameters"
    echo ""
    echo "Usage: $0 <model-id> <container-name> <port> <status> [model-load-time] [memory-used]"
    echo ""
    echo "Example:"
    echo "  $0 \"Qwen-Qwen3-0.6B\" \"vllm-qwen3-0-6b\" \"8001\" \"✅ Success\" \"3.6\" \"1.2\""
    exit 1
fi

# Check if output directory exists
if [ ! -d "$OUTPUT_DIR" ]; then
    echo "ERROR: Output directory not found: $OUTPUT_DIR"
    exit 2
fi

#------------------------------------------------------------------------------
# Helper functions
#------------------------------------------------------------------------------
check_file() {
    if [ -f "$1" ]; then
        echo "✅"
    else
        echo "❌"
    fi
}

get_compose_value() {
    local key="$1"
    local file="$OUTPUT_DIR/docker-compose.yml"
    if [ -f "$file" ]; then
        # Try JSON array format first: "--key", "value"
        local val=$(grep -o "\"--$key\", \"[^\"]*\"" "$file" 2>/dev/null | sed 's/.*"\([^"]*\)"$/\1/' | head -1)
        if [ -n "$val" ]; then
            echo "$val"
            return
        fi
        # Try YAML format: key: value
        val=$(grep -o "$key: [^ ]*" "$file" 2>/dev/null | cut -d' ' -f2 | head -1)
        if [ -n "$val" ]; then
            echo "$val"
            return
        fi
        echo "N/A"
    else
        echo "N/A"
    fi
}

#------------------------------------------------------------------------------
# Read test results if available
#------------------------------------------------------------------------------
TEST_RESULTS_FILE="$OUTPUT_DIR/test-results.json"
if [ -f "$TEST_RESULTS_FILE" ]; then
    TEST_RESPONSE=$(head -c 2000 "$TEST_RESULTS_FILE")
    # Accept both compact ("key":1) and pretty-printed ("key": 1) JSON
    PROMPT_TOKENS=$(grep -oE '"prompt_tokens": *[0-9]+' "$TEST_RESULTS_FILE" | grep -oE '[0-9]+' || echo "N/A")
    COMPLETION_TOKENS=$(grep -oE '"completion_tokens": *[0-9]+' "$TEST_RESULTS_FILE" | grep -oE '[0-9]+' || echo "N/A")
    TOTAL_TOKENS=$(grep -oE '"total_tokens": *[0-9]+' "$TEST_RESULTS_FILE" | grep -oE '[0-9]+' || echo "N/A")
    # A chat completion response carries a "choices" array; anything else is a failure
    if grep -q '"choices"' "$TEST_RESULTS_FILE"; then
        TEST_STATUS="✅ PASSED"
    else
        TEST_STATUS="❌ FAILED"
    fi
else
    TEST_RESPONSE="Test results not available"
    PROMPT_TOKENS="N/A"
    COMPLETION_TOKENS="N/A"
    TOTAL_TOKENS="N/A"
    TEST_STATUS="❌ FAILED"
fi

#------------------------------------------------------------------------------
# Get deployment info
#------------------------------------------------------------------------------

# Check health status from deployment.log
HEALTH_STATUS="⚠️ Not recorded"
if [ -f "$OUTPUT_DIR/deployment.log" ]; then
    if grep -q "Health OK" "$OUTPUT_DIR/deployment.log" 2>/dev/null; then
        HEALTH_STATUS="✅ PASSED"
    elif grep -q "health" "$OUTPUT_DIR/deployment.log" 2>/dev/null; then
        HEALTH_STATUS="⚠️ In progress"
    fi
fi

# Get max model len
MAX_MODEL_LEN=$(get_compose_value "max-model-len")

# Get tensor parallel size
TP_SIZE=$(get_compose_value "tensor-parallel-size")

# Get GPU memory utilization (a fraction such as 0.85, not a percentage)
GPU_MEM_UTIL=$(get_compose_value "gpu-memory-utilization")

# Get Docker image
DOCKER_IMAGE="N/A"
if [ -f "$OUTPUT_DIR/docker-compose.yml" ]; then
    DOCKER_IMAGE=$(grep -o 'image: [^ ]*' "$OUTPUT_DIR/docker-compose.yml" 2>/dev/null | head -n 1 | cut -d' ' -f2)
    DOCKER_IMAGE="${DOCKER_IMAGE:-N/A}"
fi

# Get environment info
HF_TOKEN_STATUS="Not set"
if [ -n "$HF_TOKEN" ]; then
    HF_TOKEN_STATUS="Set (${HF_TOKEN:0:10}...)"
fi
HF_HOME_STATUS="${HF_HOME:-default: ~/.cache/huggingface/hub}"

#------------------------------------------------------------------------------
# Generate report
#------------------------------------------------------------------------------
cat > "$REPORT_FILE" << EOF
# Deployment Report

| | |
|---|---|
| **Model** | $MODEL_ID |
| **Status** | $STATUS |
| **Timestamp** | $(date '+%Y-%m-%d %H:%M:%S %Z') |
| **Container** | $CONTAINER_NAME |

---

## 📁 Output Structure

\`\`\`
$OUTPUT_DIR/
├── deployment.log        $(check_file "$OUTPUT_DIR/deployment.log")
├── test-results.json     $(check_file "$OUTPUT_DIR/test-results.json")
├── docker-compose.yml    $(check_file "$OUTPUT_DIR/docker-compose.yml")
└── DEPLOYMENT_REPORT.md  ✅
\`\`\`

---

## 📊 Deployment Summary

| Metric | Value |
|--------|-------|
| **Health Check** | $HEALTH_STATUS |
| **Functional Test** | $TEST_STATUS |
| **Model Load Time** | $MODEL_LOAD_TIME seconds |
| **Memory Used** | $MEMORY_USED GiB |
| **Max Context Length** | $MAX_MODEL_LEN tokens |
| **Tensor Parallel Size** | $TP_SIZE |
| **GPU Memory Utilization** | $GPU_MEM_UTIL |

---

## 🧪 Test Results

**Endpoint:** \`http://localhost:$PORT/v1/chat/completions\`

**Request:**
\`\`\`json
{
  "model": "$MODEL_ID",
  "messages": [{"role": "user", "content": "<test_prompt>"}],
  "max_tokens": <X>
}
\`\`\`

**Response:**
\`\`\`json
$TEST_RESPONSE
\`\`\`

**Token Usage:**
- Prompt Tokens: $PROMPT_TOKENS
- Completion Tokens: $COMPLETION_TOKENS
- Total Tokens: $TOTAL_TOKENS

---

## 🔧 Environment

| Variable | Value |
|----------|-------|
| **HF_TOKEN** | $HF_TOKEN_STATUS |
| **HF_HOME** | $HF_HOME_STATUS |
| **Docker Image** | $DOCKER_IMAGE |
| **Port Mapping** | $PORT:8000 |

---

## 🚀 Quick Commands

\`\`\`bash
# View live logs
tail -f $OUTPUT_DIR/deployment.log

# Test endpoint
curl http://localhost:$PORT/v1/chat/completions \\
  -H "Content-Type: application/json" \\
  -d '{"model":"$MODEL_ID","messages":[{"role":"user","content":"Hello"}],"max_tokens":50}'

# Check container status
docker ps | grep $CONTAINER_NAME

# Stop container
cd $OUTPUT_DIR && docker compose down
\`\`\`

---

## 📝 Notes

- Report generated by: generate-report.sh
- Skill: rocm_vllm_deployment
- Skill directory: $SKILL_DIR
- Generated at: $(date '+%Y-%m-%d %H:%M:%S %Z')
EOF

echo "✅ Report generated: $REPORT_FILE"
exit 0
```

---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "alexhegit",
  "slug": "rocm-vllm-deployment",
  "displayName": "ROCm vLLM Deployment",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1772357570473,
    "commit": "https://github.com/openclaw/skills/commit/ff2d5f29eb0777bb92e5f866235bc1c257a22e10"
  },
  "history": []
}
```