SkillHub Club · Write Technical Docs · Full Stack · DevOps · Tech Writer

customize

Interactive guided deployment flow for Azure OpenAI models with full customization control. Step-by-step selection of model version, SKU (GlobalStandard/Standard/ProvisionedManaged), capacity, RAI policy (content filter), and advanced options (dynamic quota, priority processing, spillover). USE FOR: custom deployment, customize model deployment, choose version, select SKU, set capacity, configure content filter, RAI policy, deployment options, detailed deployment, advanced deployment, PTU deployment, provisioned throughput. DO NOT USE FOR: quick deployment to optimal region (use preset).

Packaged view

This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source appears below.

Stars: 420
Hot score: 99
Updated: March 20, 2026
Overall rating: C (3.5)
Composite score: 3.5
Best-practice grade: B (73.6)

Install command

npx @skill-hub/cli install microsoft-azure-skills-customize

Repository

microsoft/azure-skills

Skill path: .github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize

Open repository

Best for

Primary workflow: Write Technical Docs.

Technical facets: Full Stack, DevOps, Tech Writer.

Target audience: everyone.

License: MIT.

Original source

Catalog source: SkillHub Club.

Repository owner: microsoft.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install customize into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/microsoft/azure-skills before adding customize to shared team environments
  • Use customize for development workflows

Works across

Claude Code · Codex CLI · Gemini CLI · OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: customize
description: "Interactive guided deployment flow for Azure OpenAI models with full customization control. Step-by-step selection of model version, SKU (GlobalStandard/Standard/ProvisionedManaged), capacity, RAI policy (content filter), and advanced options (dynamic quota, priority processing, spillover). USE FOR: custom deployment, customize model deployment, choose version, select SKU, set capacity, configure content filter, RAI policy, deployment options, detailed deployment, advanced deployment, PTU deployment, provisioned throughput. DO NOT USE FOR: quick deployment to optimal region (use preset)."
license: MIT
metadata:
  author: Microsoft
  version: "1.0.1"
---

# Customize Model Deployment

Interactive guided workflow for deploying Azure OpenAI models with full customization control over version, SKU, capacity, content filtering, and advanced options.

## Quick Reference

| Property | Description |
|----------|-------------|
| **Flow** | Interactive step-by-step guided deployment |
| **Customization** | Version, SKU, Capacity, RAI Policy, Advanced Options |
| **SKU Support** | GlobalStandard, Standard, ProvisionedManaged, DataZoneStandard |
| **Best For** | Precise control over deployment configuration |
| **Authentication** | Azure CLI (`az login`) |
| **Tools** | Azure CLI, MCP tools (optional) |

## When to Use This Skill

Use this skill when you need **precise control** over deployment configuration:

- ✅ **Choose specific model version** (not just latest)
- ✅ **Select deployment SKU** (GlobalStandard vs Standard vs PTU)
- ✅ **Set exact capacity** within available range
- ✅ **Configure content filtering** (RAI policy selection)
- ✅ **Enable advanced features** (dynamic quota, priority processing, spillover)
- ✅ **PTU deployments** (Provisioned Throughput Units)

**Alternative:** Use `preset` for quick deployment to the best available region with automatic configuration.

### Comparison: customize vs preset

| Feature | customize | preset |
|---------|---------------------|----------------------------|
| **Focus** | Full customization control | Optimal region selection |
| **Version Selection** | User chooses from available | Uses latest automatically |
| **SKU Selection** | User chooses (GlobalStandard/Standard/PTU) | GlobalStandard only |
| **Capacity** | User specifies exact value | Auto-calculated (50% of available) |
| **RAI Policy** | User selects from options | Default policy only |
| **Region** | Current region first, falls back to all regions if no capacity | Checks capacity across all regions upfront |
| **Use Case** | Precise deployment requirements | Quick deployment to best region |

## Prerequisites

- Azure subscription with Cognitive Services Contributor or Owner role
- Azure AI Foundry project resource ID (format: `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`)
- Azure CLI installed and authenticated (`az login`)
- Optional: Set `PROJECT_RESOURCE_ID` environment variable
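
Setting the environment variable ahead of time lets the skill skip the Phase 2 prompt. A minimal sketch — the angle-bracket segments are placeholders to substitute with your own values:

```shell
# Pre-set the project resource ID so the skill can read it instead of prompting.
# Replace the <...> placeholders with your subscription, resource group,
# account, and project names.
export PROJECT_RESOURCE_ID="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<account>/projects/<project>"
echo "PROJECT_RESOURCE_ID is set"
```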

## Workflow Overview

### Complete Flow (14 Phases)

```
1. Verify Authentication
2. Get Project Resource ID
3. Verify Project Exists
4. Get Model Name (if not provided)
5. List Model Versions → User Selects
6. List SKUs for Version → User Selects
7. Get Capacity Range → User Configures
   7b. If no capacity: Cross-Region Fallback → Query all regions → User selects region/project
8. List RAI Policies → User Selects
9. Configure Advanced Options (if applicable)
10. Configure Version Upgrade Policy
11. Generate Deployment Name
12. Review Configuration
13. Execute Deployment & Monitor
```

### Fast Path (Defaults)

If user accepts all defaults (latest version, GlobalStandard SKU, recommended capacity, default RAI policy, standard upgrade policy), deployment completes in ~5 interactions.

---

## Phase Summaries

> ⚠️ **MUST READ:** Before executing any phase, load [references/customize-workflow.md](references/customize-workflow.md) for the full scripts and implementation details. The summaries below describe *what* each phase does — the reference file contains the *how* (CLI commands, quota patterns, capacity formulas, cross-region fallback logic).

| Phase | Action | Key Details |
|-------|--------|-------------|
| **1. Verify Auth** | Check `az account show`; prompt `az login` if needed | Verify correct subscription is active |
| **2. Get Project ID** | Read `PROJECT_RESOURCE_ID` env var or prompt user | ARM resource ID format required |
| **3. Verify Project** | Parse resource ID, call `az cognitiveservices account show` | Extracts subscription, RG, account, project, region |
| **4. Get Model** | List models via `az cognitiveservices account list-models` | User selects from available or enters custom name |
| **5. Select Version** | Query versions for chosen model | Recommend latest; user picks from list |
| **6. Select SKU** | Query model catalog + subscription quota, show only deployable SKUs | ⚠️ Never hardcode SKU lists — always query live data |
| **7. Configure Capacity** | Query capacity API, validate min/max/step, user enters value | Cross-region fallback if no capacity in current region |
| **8. Select RAI Policy** | Present content filter options | Default: `Microsoft.DefaultV2` |
| **9. Advanced Options** | Dynamic quota (GlobalStandard), priority processing (PTU), spillover | SKU-dependent availability |
| **10. Upgrade Policy** | Choose: OnceNewDefaultVersionAvailable / OnceCurrentVersionExpired / NoAutoUpgrade | Default: auto-upgrade on new default |
| **11. Deployment Name** | Auto-generate unique name, allow custom override | Validates format: `^[\w.-]{2,64}$` |
| **12. Review** | Display full config summary, confirm before proceeding | User approves or cancels |
| **13. Deploy & Monitor** | `az cognitiveservices account deployment create`, poll status | Timeout after 5 min; show endpoint + portal link |


---

## Error Handling

### Common Issues and Resolutions

| Error | Cause | Resolution |
|-------|-------|------------|
| **Model not found** | Invalid model name | List available models with `az cognitiveservices account list-models` |
| **Version not available** | Version not supported for SKU | Select different version or SKU |
| **Insufficient quota** | Capacity > available quota | Skill auto-searches all regions; fails only if no region has quota |
| **SKU not supported** | SKU not available in region | Cross-region fallback searches other regions automatically |
| **Capacity out of range** | Invalid capacity value | **PREVENTED**: Skill validates min/max/step at input (Phase 7) |
| **Deployment name exists** | Name conflict | Auto-incremented name generation |
| **Authentication failed** | Not logged in | Run `az login` |
| **Permission denied** | Insufficient permissions | Assign Cognitive Services Contributor role |
| **Capacity query fails** | API/permissions/network error | **DEPLOYMENT BLOCKED**: Will not proceed without valid quota data |

### Troubleshooting Commands

```bash
# Check deployment status
az cognitiveservices account deployment show --name <account> --resource-group <rg> --deployment-name <name>

# List all deployments
az cognitiveservices account deployment list --name <account> --resource-group <rg> -o table

# Check quota usage
az cognitiveservices usage list --name <account> --resource-group <rg>

# Delete failed deployment
az cognitiveservices account deployment delete --name <account> --resource-group <rg> --deployment-name <name>
```

---

## Selection Guides & Advanced Topics

> For SKU comparison tables, PTU sizing formulas, and advanced option details, load [references/customize-guides.md](references/customize-guides.md).

**SKU selection:** GlobalStandard (production/HA) → Standard (dev/test) → ProvisionedManaged (high-volume/guaranteed throughput) → DataZoneStandard (data residency).

**Capacity:** TPM-based SKUs range from 1K (dev) to 100K+ (large production). PTU-based SKUs use the formula: `(Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)`.

**Advanced options:** Dynamic quota (GlobalStandard only), priority processing (PTU only, extra cost), spillover (overflow to backup deployment).

---

## Related Skills

- **preset** - Quick deployment to best region with automatic configuration
- **microsoft-foundry** - Parent skill for all Azure AI Foundry operations
- **[quota](../../../quota/quota.md)** — For quota viewing, increase requests, and troubleshooting quota errors, defer to this skill instead of duplicating guidance
- **rbac** - Manage permissions and access control

---

## Notes

- Set `PROJECT_RESOURCE_ID` environment variable to skip prompt
- Not all SKUs available in all regions; capacity varies by subscription/region/model
- Custom RAI policies can be configured in Azure Portal
- Automatic version upgrades occur during maintenance windows
- Use Azure Monitor and Application Insights for production deployments

---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/customize-workflow.md

```markdown
# Customize Workflow — Detailed Phase Instructions

> Reference for: `models/deploy-model/customize/SKILL.md`

## Phase 1: Verify Authentication

```bash
az account show --query "{Subscription:name, User:user.name}" -o table
```

If not logged in: `az login`

Set subscription if needed:
```bash
az account list --query "[].[name,id,state]" -o table
az account set --subscription <subscription-id>
```

---

## Phase 2: Get Project Resource ID

Check `PROJECT_RESOURCE_ID` env var. If not set, prompt user.

**Format:** `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`
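
Before parsing, a quick shape check catches malformed input early. A sketch using a bash regex (the pattern is an approximation of the format above, and the sample ID is a placeholder):

```shell
# Sanity-check the ARM resource ID shape before parsing it in Phase 3.
# Sample ID and pattern are illustrative, not exhaustive validation.
PROJECT_RESOURCE_ID="/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.CognitiveServices/accounts/my-account/projects/my-project"
pattern='^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.CognitiveServices/accounts/[^/]+/projects/[^/]+$'

if [[ "$PROJECT_RESOURCE_ID" =~ $pattern ]]; then
  echo "resource ID looks valid"
else
  echo "unexpected resource ID format" >&2
fi
```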

---

## Phase 3: Parse and Verify Project

Parse ARM resource ID to extract components:

```powershell
$SUBSCRIPTION_ID = ($PROJECT_RESOURCE_ID -split '/')[2]
$RESOURCE_GROUP = ($PROJECT_RESOURCE_ID -split '/')[4]
$ACCOUNT_NAME = ($PROJECT_RESOURCE_ID -split '/')[8]
$PROJECT_NAME = ($PROJECT_RESOURCE_ID -split '/')[10]
```
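
A bash equivalent of the PowerShell parsing above — split on `/` (index 0 is empty because the ID starts with a slash, so the component indices match the PowerShell version). The sample ID is a placeholder:

```shell
# Split the ARM resource ID on '/' and pick out the components.
# Field 0 is empty because the ID begins with '/'.
PROJECT_RESOURCE_ID="/subscriptions/sub-id/resourceGroups/my-rg/providers/Microsoft.CognitiveServices/accounts/my-account/projects/my-project"
IFS='/' read -r -a parts <<< "$PROJECT_RESOURCE_ID"
SUBSCRIPTION_ID=${parts[2]}
RESOURCE_GROUP=${parts[4]}
ACCOUNT_NAME=${parts[8]}
PROJECT_NAME=${parts[10]}
echo "$SUBSCRIPTION_ID $RESOURCE_GROUP $ACCOUNT_NAME $PROJECT_NAME"
```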

Verify project exists and get region:
```bash
az account set --subscription $SUBSCRIPTION_ID
az cognitiveservices account show \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query location -o tsv
```

---

## Phase 4: Get Model Name

List available models if not provided:
```bash
az cognitiveservices account list-models \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "[].name" -o json
```

Present sorted unique list. Allow custom model name entry.
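
A sketch of the de-duplication step — here `printf` stands in for the tsv output of the `list-models` query above:

```shell
# Sort and de-duplicate model names before presenting them to the user.
# printf stands in for: az ... list-models --query "[].name" -o tsv
models=$(printf 'gpt-4o\ngpt-4o-mini\ngpt-4o\n' | sort -u)
echo "$models"
```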

**Detect model format:**

```bash
# Get model format (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere)
MODEL_FORMAT=$(az cognitiveservices account list-models \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "[?name=='$MODEL_NAME'].format" -o tsv | head -1)

MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"}
echo "Model format: $MODEL_FORMAT"
```

> 💡 **Model format determines the deployment path:**
> - `OpenAI` — Standard CLI, TPM-based capacity, RAI policies, version upgrade policies
> - `Anthropic` — REST API with `modelProviderData`, capacity=1, no RAI, no version upgrade
> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) — Standard CLI, capacity=1 (MaaS), no RAI, no version upgrade

---

## Phase 5: List and Select Model Version

```bash
az cognitiveservices account list-models \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "[?name=='$MODEL_NAME'].version" -o json
```

Recommend latest version (first in list). Default to `"latest"` if no versions found.

---

## Phase 6: List and Select SKU

> ⚠️ **Warning:** Never hardcode SKU lists — always query live data.

**Step A — Query model-supported SKUs:**
```bash
az cognitiveservices model list \
  --location $PROJECT_REGION \
  --subscription $SUBSCRIPTION_ID -o json
```

Filter: `model.name == $MODEL_NAME && model.version == $MODEL_VERSION`, extract `model.skus[].name`.

**Step B — Check subscription quota per SKU:**
```bash
az cognitiveservices usage list \
  --location $PROJECT_REGION \
  --subscription $SUBSCRIPTION_ID -o json
```

Quota key pattern: `OpenAI.<SKU>.<model-name>`. Calculate `available = limit - currentValue`.
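
The availability arithmetic can be done with a `jq` filter over the usage output. A sketch, assuming `jq` is installed and using an inline sample in place of the real `az cognitiveservices usage list` JSON:

```shell
# Compute available = limit - currentValue per quota key.
# The sample JSON stands in for `az cognitiveservices usage list` output.
cat > /tmp/usage.json <<'EOF'
[
  {"name": {"value": "OpenAI.GlobalStandard.gpt-4o"}, "limit": 150000, "currentValue": 40000},
  {"name": {"value": "OpenAI.Standard.gpt-4o"}, "limit": 30000, "currentValue": 30000}
]
EOF
jq -r '.[] | "\(.name.value): \(.limit - .currentValue) available"' /tmp/usage.json
```

Entries where the result is `0` would be filtered out in Step C.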

**Step C — Present only deployable SKUs** (available > 0). If no SKUs have quota, direct user to the [quota skill](../../../../quota/quota.md).

---

## Phase 7: Configure Capacity

> ⚠️ **Non-OpenAI models (MaaS):** If `MODEL_FORMAT != "OpenAI"`, capacity is always `1` (pay-per-token billing). Skip capacity configuration and set `DEPLOY_CAPACITY=1`. Proceed to Phase 7c (Anthropic) or Phase 8.

**For OpenAI models only — query capacity via REST API:**
```bash
# Current region capacity
az rest --method GET --url \
  "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION"
```

Filter result for `properties.skuName == $SELECTED_SKU`. Read `properties.availableCapacity`.

**Capacity defaults by SKU (OpenAI only):**

| SKU | Unit | Min | Max | Step | Default |
|-----|------|-----|-----|------|---------|
| ProvisionedManaged | PTU | 50 | 1000 | 50 | 100 |
| Others (TPM-based) | TPM | 1000 | min(available, 300000) | 1000 | min(10000, available/2) |

Validate user input: must be >= min, <= max, multiple of step. On invalid input, explain constraints.
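
The three constraints can be checked in one helper. A sketch (`validate_capacity` is an illustrative name, not part of the skill):

```shell
# Validate a capacity value against min/max/step constraints.
# Returns non-zero and explains the constraints on invalid input.
validate_capacity() {
  local value=$1 min=$2 max=$3 step=$4
  if (( value < min || value > max || value % step != 0 )); then
    echo "Capacity must be between $min and $max in steps of $step" >&2
    return 1
  fi
}

# Example: 10000 TPM against the TPM-based defaults above.
validate_capacity 10000 1000 300000 1000 && echo "ok"
```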

### Phase 7b: Cross-Region Fallback

If no capacity in current region, query ALL regions:
```bash
az rest --method GET --url \
  "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION"
```

Filter: `properties.skuName == $SELECTED_SKU && properties.availableCapacity > 0`. Sort descending by capacity.

Present available regions. After user selects region, find existing projects there:
```bash
az cognitiveservices account list \
  --query "[?kind=='AIProject' && location=='$PROJECT_REGION'].{Name:name, ResourceGroup:resourceGroup}" \
  -o json
```

If projects exist, let user select one and update `$ACCOUNT_NAME`, `$RESOURCE_GROUP`. If none, direct to project/create skill.

Re-run capacity configuration with new region's available capacity.

If no region has capacity: fail with guidance to request quota increase, check existing deployments, or try different model/SKU.

---

## Phase 7c: Anthropic Model Provider Data (Anthropic models only)

> ⚠️ **Only execute this phase if `MODEL_FORMAT == "Anthropic"`.** For OpenAI and other models, skip to Phase 8.

Anthropic models require `modelProviderData` in the deployment payload. Collect this before deployment.

**Step 1: Prompt user to select industry**

Present the following list and ask the user to choose one:

```
 1. None                    (API value: none)
 2. Biotechnology           (API value: biotechnology)
 3. Consulting              (API value: consulting)
 4. Education               (API value: education)
 5. Finance                 (API value: finance)
 6. Food & Beverage         (API value: food_and_beverage)
 7. Government              (API value: government)
 8. Healthcare              (API value: healthcare)
 9. Insurance               (API value: insurance)
10. Law                     (API value: law)
11. Manufacturing           (API value: manufacturing)
12. Media                   (API value: media)
13. Nonprofit               (API value: nonprofit)
14. Technology              (API value: technology)
15. Telecommunications      (API value: telecommunications)
16. Sport & Recreation      (API value: sport_and_recreation)
17. Real Estate             (API value: real_estate)
18. Retail                  (API value: retail)
19. Other                   (API value: other)
```

> ⚠️ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. The industry list is static — there is no REST API that provides it.

Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`).

**Step 2: Fetch tenant info (country code and organization name)**

```bash
TENANT_INFO=$(az rest --method GET \
  --url "https://management.azure.com/tenants?api-version=2024-11-01" \
  --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json)

COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode')
ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName')
```

*PowerShell version:*
```powershell
$tenantInfo = az rest --method GET `
  --url "https://management.azure.com/tenants?api-version=2024-11-01" `
  --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json

$countryCode = $tenantInfo.countryCode
$orgName = $tenantInfo.displayName
```

Store `COUNTRY_CODE` and `ORG_NAME` for use in Phase 13.

---

## Phase 8: Select RAI Policy (Content Filter)

> ⚠️ **Note:** RAI policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"` (Anthropic, Meta-Llama, Mistral, Cohere, etc. do not use RAI policies).

Present options:
1. `Microsoft.DefaultV2` — Balanced filtering (recommended). Filters hate, violence, sexual, self-harm.
2. `Microsoft.Prompt-Shield` — Enhanced prompt injection/jailbreak protection.
3. Custom policies — Organization-specific (configured in Azure Portal).

Default: `Microsoft.DefaultV2`.

---

## Phase 9: Configure Advanced Options

Options are SKU-dependent:

**A. Dynamic Quota** (GlobalStandard only)
- Auto-scales beyond base allocation when capacity available
- Default: enabled

**B. Priority Processing** (ProvisionedManaged only)
- Prioritizes requests during high load; additional charges apply
- Default: disabled

**C. Spillover** (any SKU)
- Redirects requests to backup deployment at capacity
- Requires existing deployment; list with:
```bash
az cognitiveservices account deployment list \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "[].name" -o json
```
- Default: disabled

---

## Phase 10: Configure Version Upgrade Policy

> ⚠️ **Note:** Version upgrade policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"`.

| Policy | Description |
|--------|-------------|
| `OnceNewDefaultVersionAvailable` | Auto-upgrade to new default (Recommended) |
| `OnceCurrentVersionExpired` | Upgrade only when current expires |
| `NoAutoUpgrade` | Manual upgrade only |

Default: `OnceNewDefaultVersionAvailable`.

---

## Phase 11: Generate Deployment Name

List existing deployments to avoid conflicts:
```bash
az cognitiveservices account deployment list \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "[].name" -o json
```

Auto-generate: use model name as base, append `-2`, `-3` etc. if taken. Allow custom override. Validate: `^[\w.-]{2,64}$`.
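
A sketch of the name generation and validation — the inline JSON stands in for the deployment list output, and bash's POSIX character classes replace `\w` (which bash regex does not support):

```shell
# Pick the first free deployment name: model name as base, then -2, -3, ...
existing='["gpt-4o","gpt-4o-2"]'   # stands in for the deployment list JSON
base="gpt-4o"
name="$base"
i=2
while echo "$existing" | grep -q "\"$name\""; do
  name="$base-$i"
  i=$((i + 1))
done

# Validate against the required format (POSIX classes in place of \w).
if [[ "$name" =~ ^[A-Za-z0-9._-]{2,64}$ ]]; then
  echo "deployment name: $name"
fi
```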

---

## Phase 12: Review Configuration

Display summary of all selections for user confirmation before proceeding:
- Model, version, deployment name
- SKU, capacity (with unit), region
- RAI policy, version upgrade policy
- Advanced options (dynamic quota, priority, spillover)
- Account, resource group, project

User confirms or cancels.

---

## Phase 13: Execute Deployment

> 💡 `MODEL_FORMAT` was already detected in Phase 4. Use the stored value here.

### Standard CLI deployment (non-Anthropic models):

**Create deployment:**
```bash
az cognitiveservices account deployment create \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --deployment-name $DEPLOYMENT_NAME \
  --model-name $MODEL_NAME \
  --model-version $MODEL_VERSION \
  --model-format "$MODEL_FORMAT" \
  --sku-name $SELECTED_SKU \
  --sku-capacity $DEPLOY_CAPACITY
```

> 💡 **Note:** For non-OpenAI MaaS models, `$DEPLOY_CAPACITY` is `1` (set in Phase 7).

### Anthropic model deployment (requires modelProviderData):

The Azure CLI does not support `--model-provider-data`. Use the ARM REST API directly.

> ⚠️ Industry, country code, and organization name should have been collected in Phase 7c.

```bash
echo "Creating Anthropic model deployment via REST API..."

az rest --method PUT \
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \
  --body "{
    \"sku\": {
      \"name\": \"$SELECTED_SKU\",
      \"capacity\": 1
    },
    \"properties\": {
      \"model\": {
        \"format\": \"Anthropic\",
        \"name\": \"$MODEL_NAME\",
        \"version\": \"$MODEL_VERSION\"
      },
      \"modelProviderData\": {
        \"industry\": \"$SELECTED_INDUSTRY\",
        \"countryCode\": \"$COUNTRY_CODE\",
        \"organizationName\": \"$ORG_NAME\"
      }
    }
  }"
```

*PowerShell version:*
```powershell
Write-Host "Creating Anthropic model deployment via REST API..."

$body = @{
    sku = @{
        name = $SELECTED_SKU
        capacity = 1
    }
    properties = @{
        model = @{
            format = "Anthropic"
            name = $MODEL_NAME
            version = $MODEL_VERSION
        }
        modelProviderData = @{
            industry = $SELECTED_INDUSTRY
            countryCode = $countryCode
            organizationName = $orgName
        }
    }
} | ConvertTo-Json -Depth 5

az rest --method PUT `
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" `
  --body $body
```

> 💡 **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity. RAI policy is not applicable for Anthropic models.

### Monitor deployment status:
```bash
az cognitiveservices account deployment show \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --deployment-name $DEPLOYMENT_NAME \
  --query "properties.provisioningState" -o tsv
```

Poll until `Succeeded` or `Failed`. Timeout after 5 minutes.
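
A sketch of the polling loop. Here `get_state` is a stub standing in for the `az ... deployment show` query above — in real use it would run that command:

```shell
# Poll provisioning state every 10s; give up after 5 minutes.
# get_state is a stub: replace its body with the real `az ... deployment show` query.
get_state() { echo "Succeeded"; }

deadline=$(( SECONDS + 300 ))
state=""
while (( SECONDS < deadline )); do
  state=$(get_state)
  if [[ "$state" == "Succeeded" || "$state" == "Failed" ]]; then
    break
  fi
  sleep 10
done
echo "Final state: ${state:-timeout}"
```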

**Get endpoint:**
```bash
az cognitiveservices account show \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "properties.endpoint" -o tsv
```

On success, display deployment name, model, version, SKU, capacity, region, RAI policy, rate limits, endpoint, and Azure AI Foundry portal link.

```

### references/customize-guides.md

```markdown
# Customize Guides — Selection Guides & Advanced Topics

> Reference for: `models/deploy-model/customize/SKILL.md`

**Table of Contents:** [Selection Guides](#selection-guides) · [Advanced Topics](#advanced-topics)

## Selection Guides

### How to Choose SKU

| SKU | Best For | Cost | Availability |
|-----|----------|------|--------------|
| **GlobalStandard** | Production, high availability | Medium | Multi-region |
| **Standard** | Development, testing | Low | Single region |
| **ProvisionedManaged** | High-volume, predictable workloads | Fixed (PTU) | Reserved capacity |
| **DataZoneStandard** | Data residency requirements | Medium | Specific zones |

**Decision Tree:**
```
Do you need guaranteed throughput?
├─ Yes → ProvisionedManaged (PTU)
└─ No → Do you need high availability?
        ├─ Yes → GlobalStandard
        └─ No → Standard
```

### How to Choose Capacity

**For TPM-based SKUs (GlobalStandard, Standard):**

| Workload | Recommended Capacity |
|----------|---------------------|
| Development/Testing | 1K - 5K TPM |
| Small Production | 5K - 20K TPM |
| Medium Production | 20K - 100K TPM |
| Large Production | 100K+ TPM |

**For PTU-based SKUs (ProvisionedManaged):**

Use the PTU calculator based on:
- Input tokens per minute
- Output tokens per minute
- Requests per minute

**Capacity Planning Tips:**
- Start with recommended capacity
- Monitor usage and adjust
- Enable dynamic quota for flexibility
- Consider spillover for peak loads

### How to Choose RAI Policy

| Policy | Filtering Level | Use Case |
|--------|----------------|----------|
| **Microsoft.DefaultV2** | Balanced | Most applications |
| **Microsoft.Prompt-Shield** | Enhanced | Security-sensitive apps |
| **Custom** | Configurable | Specific requirements |

**Recommendation:** Start with `Microsoft.DefaultV2` and adjust based on application needs.

---

## Advanced Topics

### PTU (Provisioned Throughput Units) Deployments

**What is PTU?**
- Reserved capacity with guaranteed throughput
- Measured in PTU units, not TPM
- Fixed cost regardless of usage
- Best for high-volume, predictable workloads

**PTU Calculator:**

```
Estimated PTU = (Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)

Example:
- Input: 10,000 tokens/min
- Output: 5,000 tokens/min
- Requests: 100/min

PTU = (10,000 × 0.001) + (5,000 × 0.002) + (100 × 0.1)
    = 10 + 10 + 10
    = 30 PTU
```

**PTU Deployment:**
```bash
az cognitiveservices account deployment create \
  --name <account-name> \
  --resource-group <resource-group> \
  --deployment-name <deployment-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format "OpenAI" \
  --sku-name "ProvisionedManaged" \
  --sku-capacity 100  # PTU units
```

### Spillover Configuration

**Spillover Workflow:**
1. Primary deployment receives requests
2. When capacity reached, requests overflow to spillover target
3. Spillover target must be same model or compatible
4. Configure via deployment properties

**Best Practices:**
- Use spillover for peak load handling
- Spillover target should have sufficient capacity
- Monitor both deployments
- Test failover behavior

### Priority Processing

**What is Priority Processing?**
- Prioritizes your requests during high load
- Available for ProvisionedManaged SKU
- Additional charges apply
- Ensures consistent performance

**When to Use:**
- Mission-critical applications
- SLA requirements
- High-concurrency scenarios

```

### ../../../quota/quota.md

```markdown
# Microsoft Foundry Quota Management

Quota and capacity management for Microsoft Foundry. Quotas are **subscription + region** level.

> ⚠️ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack.

> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** as the primary method. MCP tools are optional convenience wrappers around the same control plane APIs.

## Quota Types

| Type | Description |
|------|-------------|
| **TPM** | Tokens Per Minute, pay-per-token, subject to rate limits |
| **PTU** | Provisioned Throughput Units, monthly commitment, no rate limits |
| **Region** | Max capacity per region, shared across subscription |
| **Slots** | 10-20 deployment slots per resource |

**When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective.

---

Use this sub-skill when the user needs to:

- **View quota usage** — check current TPM/PTU allocation and available capacity
- **Check quota limits** — show quota limits for a subscription, region, or model
- **Find optimal regions** — compare quota availability across regions for deployment
- **Plan deployments** — verify sufficient quota before deploying models
- **Request quota increases** — navigate quota increase process through Azure Portal
- **Troubleshoot deployment failures** — diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, 429 rate limit errors
- **Optimize allocation** — monitor and consolidate quota across deployments
- **Monitor quota across deployments** — track capacity by model and region
- **Explain quota concepts** — explain TPM, PTU, capacity units, regional quotas
- **Free up quota** — identify and delete unused deployments

**Key Points:**
1. Isolated by region (East US ≠ West US)
2. Regional capacity varies by model
3. Multi-region enables failover and load distribution
4. Quota requests specify target region

See [detailed guide](./references/workflows.md#regional-quota).

---

## Core Workflows

### 1. Check Regional Quota

```bash
subId=$(az account show --query id -o tsv)
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```

**Output interpretation:**
- **Used**: Current TPM consumed (10000 = 10K TPM)
- **Limit**: Maximum TPM quota (15000 = 15K TPM)
- **Available**: Limit - Used (5K TPM available)

Change region: `eastus`, `eastus2`, `westus`, `westus2`, `swedencentral`, `uksouth`.

---

### 2. Find Best Region for Deployment

Check specific regions for available quota:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
```

See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison.

---

### 3. Check Quota Before Deployment

Verify available quota for your target model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
model="OpenAI.Standard.gpt-4o"

az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='$model'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
```

- **Available > 0**: you have quota; proceed with the deployment
- **Available = 0**: free quota by deleting unused deployments, or try a different region
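In a deployment script, this check can become a fail-fast gate. A minimal sketch, assuming the same usages endpoint as above; the 10K TPM threshold is illustrative, and a real script would `exit 1` on the insufficient branch:

```shell
# Sketch: gate a deployment on available quota for OpenAI.Standard.gpt-4o in eastus.
needed=10000   # TPM the new deployment will request (10K); illustrative value

available=0
if command -v az >/dev/null; then
  subId=$(az account show --query id -o tsv)
  # Fetch limit and currentValue, then compute the difference in awk.
  available=$(az rest --method get \
    --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
    --query "value[?name.value=='OpenAI.Standard.gpt-4o'] | [0].[limit, currentValue]" -o tsv \
    | awk '{printf "%d", $1 - $2}')
fi

if [ "${available:-0}" -ge "$needed" ]; then
  echo "OK: ${available} TPM available"
else
  echo "Insufficient quota: ${available:-0} TPM available, ${needed} needed" >&2
fi
```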

---

### 4. Monitor Quota by Model

Show quota allocation grouped by model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
```

Shows aggregate usage across ALL deployments by model type.

**Optional:** List individual deployments:
```bash
az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table

az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
  --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table
```
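To see how the regional aggregate splits across deployments, the per-deployment listing can be piped through a small aggregator. A sketch; the helper name `sum_by_model` is mine, not part of the CLI:

```shell
# Sketch: sum deployed capacity per model from "model<TAB>capacity" lines.
sum_by_model() {
  awk '{sum[$1] += $2} END {for (m in sum) printf "%s %d\n", m, sum[m]}' | sort
}

# Usage (same placeholders as the listing command above):
# az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
#   --query "[].[properties.model.name, sku.capacity]" -o tsv | sum_by_model
```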

---

### 5. Delete Deployment (Free Quota)

```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
  --deployment-name <deployment>
```

Quota is freed **immediately**; re-run Workflow #1 to verify.

---

### 6. Request Quota Increase

**Azure Portal Process:**
1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) → Filter "AI Services" → Click resource
2. Select **Quotas** in left navigation
3. Click **Request quota increase**
4. Fill form: Model, Current Limit, Requested Limit, Region, **Business Justification**
5. Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota))

**Justification template:**
```
Production [workload type] using [model] in [region].
Expected traffic: [X requests/day] with [Y tokens/request].
Requires [Z TPM] capacity. Current [N TPM] insufficient.
Request increase to [M TPM]. Deployment target: [date].
```

See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps.

---

## Quick Troubleshooting

| Error | Quick Fix | Detailed Guide |
|-------|-----------|----------------|
| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) |
| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) |
| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) |
| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) |
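For transient 429s (as opposed to sustained undercapacity), client-side retries with exponential backoff usually clear the error. A minimal sketch; the environment variables and the `2024-02-01` api-version are placeholders to adapt to your resource, and a production client should honor the `Retry-After` header when the response includes one:

```shell
# Sketch: retry a chat completion on HTTP 429 with exponential backoff.
# AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY, and DEPLOYMENT are assumed to be set.
backoff() { echo $((2 ** $1)); }   # seconds to sleep before retry N

call_with_retry() {
  local max=5 attempt status
  for attempt in $(seq 1 "$max"); do
    status=$(curl -s -o /dev/null -w "%{http_code}" \
      -H "api-key: $AZURE_OPENAI_KEY" -H "Content-Type: application/json" \
      -d '{"messages":[{"role":"user","content":"ping"}]}' \
      "$AZURE_OPENAI_ENDPOINT/openai/deployments/$DEPLOYMENT/chat/completions?api-version=2024-02-01")
    [ "$status" != "429" ] && { echo "$status"; return 0; }
    sleep "$(backoff "$attempt")"
  done
  echo "$status"
}
```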

---

## References

**Detailed Guides:**
- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits
- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands
- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs
- [Capacity Planning Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations
- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks
- [PTU Guide](./references/ptu-guide.md) - Provisioned throughput capacity planning

**Official Microsoft Documentation:**
- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates
- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions
- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures
- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details

**Calculators:**
- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator
- Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### EXAMPLES.md

```markdown
# customize Examples

## Example 1: Basic Deployment with Defaults

**Scenario:** Deploy gpt-4o accepting all defaults for quick setup.
**Config:** gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled
**Result:** Deployment `gpt-4o` created in ~2-3 min with auto-upgrade enabled.

## Example 2: Production Deployment with Custom Capacity

**Scenario:** Deploy gpt-4o for production with high throughput.
**Config:** gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: `gpt-4o-production`
**Result:** 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps.

## Example 3: PTU Deployment for High-Volume Workload

**Scenario:** Deploy gpt-4o with reserved capacity (PTU) for predictable workload.
**Config:** gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled
**PTU sizing:** 40K input + 20K output tokens/min → ~100 PTU estimated → 200 PTU recommended (2x headroom)
**Result:** Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines.

## Example 4: Development Deployment with Standard SKU

**Scenario:** Deploy gpt-4o-mini for dev/testing with minimal cost.
**Config:** gpt-4o-mini / Standard / 1K TPM / Name: `gpt-4o-mini-dev`
**Result:** 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping.

## Example 5: Spillover Configuration

**Scenario:** Deploy gpt-4o with spillover to handle peak load overflow.
**Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → `gpt-4o-backup`
**Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment.

## Example 6: Anthropic Model Deployment (claude-sonnet-4-6)

**Scenario:** Deploy claude-sonnet-4-6 with customized settings.
**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering)
**Result:** User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with `modelProviderData` in ~2 min.

---

## Comparison Matrix

| Scenario | Model | SKU | Capacity | Dynamic Quota | Priority | Spillover | Use Case |
|----------|-------|-----|----------|:---:|:---:|:---:|----------|
| Ex 1 | gpt-4o | GlobalStandard | 10K TPM | ✓ | - | - | Quick setup |
| Ex 2 | gpt-4o | GlobalStandard | 50K TPM | ✓ | - | - | Production |
| Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | ✓ | - | Predictable workload |
| Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing |
| Ex 5 | gpt-4o | GlobalStandard | 20K TPM | ✓ | - | ✓ | Peak load |
| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model |

## Common Patterns

### Dev → Staging → Production

| Stage | Model | SKU | Capacity | Extras |
|-------|-------|-----|----------|--------|
| Dev | gpt-4o-mini | Standard | 1K TPM | — |
| Staging | gpt-4o | GlobalStandard | 10K TPM | — |
| Production | gpt-4o | GlobalStandard | 50K TPM | Dynamic Quota + Spillover |

### Cost Optimization

- **High priority:** gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing
- **Low priority:** gpt-4o-mini, Standard, 5K TPM

---

## Tips and Best Practices

**Capacity:** Start conservative → monitor with Azure Monitor → scale gradually → use spillover for peaks.

**SKU Selection:** Standard for dev → GlobalStandard + dynamic quota for variable production → ProvisionedManaged (PTU) for predictable load.

**Cost:** Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume.

**Versions:** Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it.

**Content Filtering:** Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests.

---

## Troubleshooting

| Problem | Solution |
|---------|----------|
| `QuotaExceeded` | Check usage with `az cognitiveservices usage list`, reduce capacity, try different SKU, check other regions, or use the [quota skill](../../../quota/quota.md) to request an increase |
| Version not available for SKU | Check `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, use latest |
| Deployment name exists | Skill auto-generates unique name (e.g., `gpt-4o-2`), or specify custom name |

```
