resource-tagging
Apply and enforce cloud resource tagging strategies across AWS, Azure, GCP, and Kubernetes for cost allocation, ownership tracking, compliance, and automation. Use when implementing cloud governance, optimizing costs, or automating infrastructure management.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install ancoleman-ai-design-components-resource-tagging
Repository
Skill path: skills/resource-tagging
Open repository
Best for
Primary workflow: Run DevOps.
Technical facets: Full Stack, DevOps.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: ancoleman.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install resource-tagging into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/ancoleman/ai-design-components before adding resource-tagging to shared team environments
- Use resource-tagging for development workflows
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: resource-tagging
description: Apply and enforce cloud resource tagging strategies across AWS, Azure, GCP, and Kubernetes for cost allocation, ownership tracking, compliance, and automation. Use when implementing cloud governance, optimizing costs, or automating infrastructure management.
---

# Resource Tagging

Apply comprehensive cloud resource tagging strategies to enable cost allocation, ownership tracking, compliance enforcement, and infrastructure automation across multi-cloud environments.

## Purpose

Resource tagging provides the foundational metadata layer for cloud governance. Tags enable precise cost allocation (reducing unallocated spend by up to 80%), rapid ownership identification, compliance scope definition, and automated lifecycle management. Without proper tagging, cloud costs become untrackable, security incidents lack context, and automation policies fail to target resources effectively.

## When to Use

Use resource tagging when:

- Implementing cloud governance frameworks for cost allocation and accountability
- Building FinOps practices requiring spend visibility by team, project, or department
- Enforcing compliance requirements (PCI, HIPAA, SOC2) through automated policies
- Setting up automated resource lifecycle management (backup, monitoring, shutdown)
- Managing multi-tenant or multi-project cloud environments
- Implementing disaster recovery and backup policies based on criticality
- Tracking resource ownership for security incident response
- Optimizing cloud costs through spend analysis and showback/chargeback

## Minimum Viable Tagging Strategy

Start with the **"Big Six"** required tags for all cloud resources:

| Tag | Purpose | Example Value |
|-----|---------|---------------|
| **Name** | Human-readable identifier | `prod-api-server-01` |
| **Environment** | Lifecycle stage | `prod` \| `staging` \| `dev` |
| **Owner** | Responsible team contact | `[email protected]` |
| **CostCenter** | Finance code for billing | `CC-1234` |
| **Project** | Business initiative | `ecommerce-platform` |
| **ManagedBy** | Resource creation method | `terraform` \| `pulumi` \| `manual` |

**Optional tags** to add based on specific needs:

- **Application**: Multi-app projects requiring app-level isolation
- **Component**: Resource role (`web`, `api`, `database`, `cache`)
- **Backup**: Backup policy (`daily`, `weekly`, `none`)
- **Compliance**: Regulatory scope (`PCI`, `HIPAA`, `SOC2`)
- **SLA**: Service level (`critical`, `high`, `medium`, `low`)

## Tag Naming Conventions

Choose ONE naming convention organization-wide and enforce it consistently:

| Convention | Example Keys | Provider Fit | Best For |
|------------|--------------|--------------|----------|
| **PascalCase** | `CostCenter`, `ProjectName` | AWS standard | AWS-first orgs |
| **lowercase** | `costcenter`, `project` | GCP labels (required) | GCP-first orgs |
| **kebab-case** | `cost-center`, `project-name` | Azure (case-insensitive) | Azure-first orgs |
| **Namespaced** | `company:environment`, `team:owner` | Multi-org tag policies | Large enterprises |

**Critical:** Case sensitivity varies by provider:

- **AWS**: Case-sensitive (`Environment` ≠ `environment`)
- **Azure**: Tag names are case-insensitive (`Environment` = `environment`); tag values remain case-sensitive
- **GCP**: Lowercase required (`environment` only)
- **Kubernetes**: Case-sensitive (`environment` ≠ `Environment`)

## Tag Categories

For detailed taxonomy of all tag categories, see `references/tag-taxonomy.md`.
### Technical Tags

Operations-focused metadata: Name, Environment, Version, ManagedBy

### Business Tags

Cost allocation metadata: Owner, CostCenter, Project, Department

### Security Tags

Compliance metadata: Confidentiality, Compliance, DataClassification, SecurityZone

### Automation Tags

Lifecycle metadata: Backup, Monitoring, Schedule, AutoShutdown

### Operational Tags

Support metadata: SLA, ChangeManagement, CreatedBy, CreatedDate

### Custom Tags

Organization-specific metadata: Customer, Application, Component, Stack

## Cloud Provider Tag Limits

| Provider | Tag Limit | Key Length | Value Length | Case Sensitive | Inheritance |
|----------|-----------|------------|--------------|----------------|-------------|
| **AWS** | 50 user-defined | 128 chars | 256 chars | Yes | Via tag policies |
| **Azure** | 50 pairs | 512 chars | 256 chars | No (tag names) | Via Azure Policy |
| **GCP** | 64 labels | 63 chars | 63 chars | Lowercase only | Via org policies |
| **Kubernetes** | Unlimited | 253 prefix + 63 name | 63 chars | Yes | Via namespace |

## Tag Enforcement Patterns

### Infrastructure as Code (Recommended)

Apply tags automatically via Terraform/Pulumi to reduce manual errors by 95%:

```hcl
# Terraform: Provider-level default tags
provider "aws" {
  default_tags {
    tags = {
      Environment = var.environment
      Owner       = var.owner
      CostCenter  = var.cost_center
      Project     = var.project
      ManagedBy   = "terraform"
    }
  }
}
```

Resources created through this provider inherit these tags automatically. Resource-specific tags merge with the defaults, and the resource-level value wins when the same key appears in both.

For complete Terraform, Pulumi, and CloudFormation examples, see `examples/terraform/`, `examples/pulumi/`, and `examples/cloudformation/`.
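The merge behavior of provider-level default tags, together with the AWS limits listed in the provider limits table, can be sketched in a few lines of Python. This is a minimal illustration rather than part of the skill's own scripts: the function names `merge_tags` and `check_aws_limits` are hypothetical, and the sample tag values are assumptions borrowed from the examples above.

```python
# Limits copied from the AWS row of the provider limits table.
AWS_MAX_TAGS = 50
AWS_MAX_KEY_LEN = 128
AWS_MAX_VALUE_LEN = 256


def merge_tags(default_tags: dict, resource_tags: dict) -> dict:
    """Combine provider-level defaults with resource-level tags.

    The resource-level value wins when the same key appears in both.
    """
    return {**default_tags, **resource_tags}


def check_aws_limits(tags: dict) -> list:
    """Return a list of limit violations (an empty list means none)."""
    problems = []
    if len(tags) > AWS_MAX_TAGS:
        problems.append(f"{len(tags)} tags exceeds the limit of {AWS_MAX_TAGS}")
    for key, value in tags.items():
        if len(key) > AWS_MAX_KEY_LEN:
            problems.append(f"key {key!r} is longer than {AWS_MAX_KEY_LEN} characters")
        if len(value) > AWS_MAX_VALUE_LEN:
            problems.append(f"value for {key!r} is longer than {AWS_MAX_VALUE_LEN} characters")
    return problems


# Hypothetical defaults and one resource that overrides the Project tag.
defaults = {"Environment": "prod", "ManagedBy": "terraform", "Project": "ecommerce-platform"}
merged = merge_tags(defaults, {"Name": "prod-api-server-01", "Project": "mobile-app"})
violations = check_aws_limits(merged)
```

Running a check like this in CI before `terraform apply` is one way to catch oversized keys or values early; the limits for other providers could be swapped in from the same table.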
### Policy-Based Enforcement

Enforce tagging at resource creation time:

- **AWS**: Use AWS Config rules to detect and alert on non-compliant tags; use Service Control Policies (SCPs) to deny creation of untagged resources
- **Azure**: Use Azure Policy for tag inheritance and enforcement
- **GCP**: Use Organization Policies to restrict label values
- **Kubernetes**: Use OPA Gatekeeper or Kyverno for admission control

For enforcement implementation patterns, see `references/enforcement-patterns.md`.

### Tag Compliance Auditing

Run regular audits (weekly recommended) to identify untagged resources:

**AWS Config Query** (SQL):

```sql
-- Illustrative: AWS Config advanced queries support only a subset of SQL,
-- so validate the syntax in the query editor before relying on it
SELECT resourceId, resourceType, configuration.tags
WHERE resourceType IN ('AWS::EC2::Instance', 'AWS::RDS::DBInstance')
  AND (configuration.tags IS NULL OR NOT configuration.tags.Environment EXISTS)
```

**Azure Resource Graph Query** (KQL):

```kusto
Resources
| where type in~ ('microsoft.compute/virtualmachines')
| where isnull(tags.Environment) or isnull(tags.Owner)
| project name, type, resourceGroup, tags
```

**GCP Cloud Asset Inventory**:

```bash
gcloud asset search-all-resources \
  --query="NOT labels:environment OR NOT labels:owner" \
  --format="table(name,assetType,labels)"
```

For complete audit queries and scripts, see `references/compliance-auditing.md` and `scripts/audit_tags.py`.
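Once an inventory has been exported by any of the queries above, the compliance check itself is provider-independent. The sketch below is not the referenced `scripts/audit_tags.py` (that file is not shown here); it is a minimal stand-in that assumes each resource is a dictionary with hypothetical `id` and `tags` fields, and it matches keys case-sensitively as AWS does.

```python
# The "Big Six" required tags from the minimum viable tagging strategy.
REQUIRED_TAGS = ("Name", "Environment", "Owner", "CostCenter", "Project", "ManagedBy")


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required keys that are absent or have an empty value."""
    tags = tags or {}
    return [key for key in required if not (tags.get(key) or "").strip()]


def audit(resources):
    """Map resource id -> missing required tags, for non-compliant resources only."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource.get("tags"))
        if gaps:
            report[resource["id"]] = gaps
    return report


# Hypothetical inventory: one compliant, one partially tagged, one untagged.
inventory = [
    {"id": "i-0001", "tags": {"Name": "prod-api-server-01", "Environment": "prod",
                               "Owner": "platform-team", "CostCenter": "CC-1234",
                               "Project": "ecommerce-platform", "ManagedBy": "terraform"}},
    {"id": "i-0002", "tags": {"Name": "scratch-box", "Environment": "dev"}},
    {"id": "vol-0003", "tags": None},
]

report = audit(inventory)
for resource_id, gaps in report.items():
    print(f"{resource_id}: missing {', '.join(gaps)}")
```

A weekly job could feed the report into a ticketing or alerting system; for Azure, where tag names are case-insensitive, the keys would need to be normalized before comparison.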
## Cost Allocation with Tags

Enable cost allocation tags to track spending by team, project, or department:

### AWS Cost Explorer

Activate cost allocation tags (up to 24 hours for activation):

```hcl
# Enable cost allocation tags via Terraform
resource "aws_ce_cost_allocation_tag" "environment" {
  tag_key = "Environment"
  status  = "Active"
}

resource "aws_ce_cost_allocation_tag" "project" {
  tag_key = "Project"
  status  = "Active"
}
```

Set up cost anomaly detection by tag to catch unusual spending:

```hcl
resource "aws_ce_anomaly_monitor" "project_monitor" {
  name = "project-cost-monitor"
  # monitor_specification requires the CUSTOM monitor type;
  # DIMENSIONAL monitors take monitor_dimension instead
  monitor_type = "CUSTOM"
  monitor_specification = jsonencode({
    Tags = {
      Key    = "Project"
      Values = ["ecommerce", "mobile-app"]
    }
  })
}
```

### Azure Cost Management

Group costs by tags in Azure Cost Management dashboards. Export cost data with tag breakdowns:

```bash
# --start-date and --end-date must be supplied together
az consumption usage list \
  --start-date 2025-12-01 \
  --end-date 2025-12-31 \
  --query "[].{Cost:pretaxCost, Project:tags.Project, Team:tags.Owner}"
```

### GCP Cloud Billing

Export billing data to BigQuery with label breakdowns:

```sql
SELECT
  labels.key AS label_key,
  labels.value AS label_value,
  SUM(cost) AS total_cost
FROM `project.dataset.gcp_billing_export_v1_XXXXX`
CROSS JOIN UNNEST(labels) AS labels
WHERE labels.key IN ('environment', 'project', 'costcenter')
GROUP BY label_key, label_value
ORDER BY total_cost DESC
```

For cost allocation implementation details, see `references/cost-allocation.md`.

## Decision Framework: Required vs. Optional Tags

Determine which tags to enforce at creation time:

**REQUIRED (enforce with hard deny)**:

- Cost allocation: Owner, CostCenter, Project
- Lifecycle: Environment, ManagedBy
- Identification: Name

**RECOMMENDED (soft enforcement - alert only)**:

- Operational: Backup, Monitoring, Schedule
- Security: Compliance, DataClassification
- Support: SLA, ChangeManagement

**OPTIONAL (no enforcement)**:

- Custom: Application, Component, Customer
- Experimental: Any non-standard tags

**Enforcement methods**:

1. **Hard enforcement** (deny resource creation): Use for cost allocation tags
   - AWS: Service Control Policies (SCPs) that deny requests missing required tags (AWS Config rules detect violations but cannot block creation)
   - Azure: Azure Policy with deny effect
   - GCP: Organization policies with constraints
2. **Soft enforcement** (alert only): Use for operational tags
   - AWS: AWS Config rules with notification
   - Azure: Azure Policy with audit effect
   - GCP: Cloud Asset Inventory reports
3. **No enforcement** (best-effort): Use for custom/experimental tags

## Tag Inheritance Strategies

Reduce manual tagging effort through automatic inheritance:

### AWS Tag Policies

Tag policies attached to the AWS Organizations root or an OU are inherited by the accounts beneath them. They standardize tag keys and values on the listed resource types; they do not copy tags onto resources automatically:

```json
{
  "tags": {
    "Environment": {
      "tag_key": {
        "@@assign": "Environment"
      },
      "enforced_for": {
        "@@assign": ["ec2:instance", "s3:bucket"]
      }
    }
  }
}
```

### Azure Tag Inheritance

Use Azure Policy to inherit tags from resource groups:

```hcl
# Subscription-scoped assignment. The generic azurerm_policy_assignment
# resource was removed in azurerm 3.0 in favor of scope-specific resources.
resource "azurerm_subscription_policy_assignment" "inherit_environment" {
  name                 = "inherit-environment-tag"
  subscription_id      = var.subscription_id # assumed variable: full "/subscriptions/<id>" resource ID
  policy_definition_id = azurerm_policy_definition.inherit_tags.id

  # Policies with a modify effect need a managed identity (and a location) to apply tags
  location = var.location # assumed variable
  identity {
    type = "SystemAssigned"
  }

  parameters = jsonencode({
    tagName = { value = "Environment" }
  })
}
```

### GCP Label Inheritance

Inherit labels from folders/projects via organization policies:

```hcl
# Confirm that this constraint is available in your organization before applying
resource "google_organization_policy" "require_labels" {
  org_id     = var.organization_id
  constraint = "constraints/gcp.resourceLabels"

  list_policy {
    allow {
      values = ["environment:prod", "environment:staging"]
    }
    inherit_from_parent = true
  }
}
```

### Kubernetes Label Propagation

Use Kyverno to auto-generate labels from namespaces:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-labels
spec:
  rules:
    - name: add-environment-label
      match:
        resources:
          kinds: [Pod, Deployment]
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              # "+( )" adds the label only when it is not already set
              +(environment): "{{request.namespace}}"
```

## Common Anti-Patterns

### Anti-Pattern 1: Inconsistent Tag Naming

**Problem**: Multiple variations of the same tag across resources

```yaml
# BAD: Tag sprawl
Environment: prod
environment: production
Env: prod
ENVIRONMENT: PROD
```

**Solution**: Enforce single naming convention via IaC and tag policies

```yaml
# GOOD: Consistent naming
Environment: prod # Single standard format
```

### Anti-Pattern 2: Manual Resource Creation Without Tags

**Problem**: CLI/console-created resources missing required tags

**Solution**: Block untagged resource creation via Config/Policy rules, or use AWS Service Catalog/Azure Blueprints with pre-tagged templates

### Anti-Pattern 3: No Tag Enforcement (Voluntary Tagging)

**Problem**: Tags are optional, frequently forgotten, leading to 35% unallocated spend

**Solution**: Use provider default tags in IaC + policy enforcement at account/subscription level

### Anti-Pattern 4: Tag Sprawl (Too Many Custom Tags)

**Problem**: 30+ tags per resource, most unused, causing noise in cost reports

**Solution**: Start with "Big Six" required tags only. Add optional tags only when a clear use case exists.

### Anti-Pattern 5: Static Tags Not Updated

**Problem**: Tags set at creation but never updated (e.g., `Owner` outdated after team changes)

**Solution**: Run automated tag audits (weekly), use IaC to update tags programmatically, integrate with identity provider for owner updates

## Integration with Other Skills

- **infrastructure-as-code**: Tags applied automatically via Terraform/Pulumi modules with default_tags/stackTags
- **cost-optimization**: Tags enable cost allocation, showback/chargeback, and budget alerts by project/team
- **compliance-frameworks**: Tags prove PCI/HIPAA/SOC2 scope for audit trails and automated policy enforcement
- **security-hardening**: Tags enforce security policies (e.g., public vs. internal access based on SecurityZone tag)
- **disaster-recovery**: Tags identify resources for backup policies (e.g., `Backup: daily` triggers automated snapshots)
- **kubernetes-operations**: Labels used for pod scheduling, resource quotas, network policies, and service selection

## Implementation Checklist

When implementing resource tagging:

- [ ] Define "Big Six" required tags with allowed values
- [ ] Choose ONE naming convention (PascalCase, lowercase, kebab-case)
- [ ] Implement tags in IaC (Terraform/Pulumi provider default_tags)
- [ ] Set up enforcement policies (AWS Config, Azure Policy, GCP org policies)
- [ ] Enable cost allocation tags in billing console (AWS Cost Explorer, Azure Cost Management)
- [ ] Create tag compliance audit process (weekly recommended)
- [ ] Document tag standards in organization wiki/runbook
- [ ] Set up automated alerts for untagged resources
- [ ] Integrate tags with monitoring/alerting for owner contact
- [ ] Create remediation playbook for non-compliant resources

## Quick Reference

### Tag Enforcement Tools by Provider

| Provider | Enforcement Tool | Purpose |
|----------|------------------|---------|
| **AWS** | AWS Config Rules | Tag compliance monitoring + remediation |
| **AWS** | Tag Policies (Organizations) | Enforce tags at account level |
| **Azure** | Azure Policy | Tag enforcement + inheritance |
| **GCP** | Organization Policies | Label restrictions + inheritance |
| **Kubernetes** | OPA Gatekeeper | Admission control for labels |
| **Kubernetes** | Kyverno | Auto-generate labels + validation |

### Cost Allocation Tools

| Tool | Purpose |
|------|---------|
| AWS Cost Explorer | Tag-based cost analysis + anomaly detection |
| Azure Cost Management | Tag grouping + budgets |
| GCP Cloud Billing | Label-based cost breakdown |
| CloudHealth | Multi-cloud cost optimization |
| Kubecost | Kubernetes cost allocation by labels |

### Validation Tools (Pre-Deployment)

| Tool | Purpose |
|------|---------|
| Checkov | IaC tag validation (pre-commit) |
| tflint | Terraform linting for tag rules |
| terraform-compliance | BDD tests for tag policies |

## Additional Resources

For detailed implementation guidance:

- **Tag taxonomy and categories**: See `references/tag-taxonomy.md`
- **Enforcement patterns (AWS, Azure, GCP, K8s)**: See `references/enforcement-patterns.md`
- **Cost allocation setup**: See `references/cost-allocation.md`
- **Compliance auditing queries**: See `references/compliance-auditing.md`
- **Terraform examples**: See `examples/terraform/`
- **Kubernetes manifests**: See `examples/kubernetes/`
- **Audit scripts**: See `scripts/audit_tags.py`, `scripts/cost_by_tag.py`

## Key Takeaways

1. **Start with "Big Six" required tags**: Name, Environment, Owner, CostCenter, Project, ManagedBy
2. **Enforce at creation time**: Use AWS SCPs, Azure Policy, and GCP org policies to block untagged resources; use AWS Config to detect drift
3. **Automate with IaC**: Terraform/Pulumi default tags reduce manual errors by 95%
4. **Enable cost allocation**: Activate billing tags to reduce unallocated spend by 80%
5. **Choose ONE naming convention**: PascalCase, lowercase, or kebab-case - enforce consistently
6. **Inherit tags from parents**: Let resource groups, folders, and namespaces propagate common tags to children to reduce manual effort
7. **Audit regularly**: Weekly tag compliance checks catch drift and prevent sprawl

---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/tag-taxonomy.md

```markdown
# Resource Tagging Taxonomy

Complete taxonomy of cloud resource tags organized by category, purpose, and use case.

## Table of Contents

1. [Tag Categories Overview](#tag-categories-overview)
2. [Technical Tags](#technical-tags)
3. [Business Tags](#business-tags)
4. [Security Tags](#security-tags)
5. [Automation Tags](#automation-tags)
6. [Operational Tags](#operational-tags)
7. [Custom Tags](#custom-tags)
8. [Tag Naming Patterns](#tag-naming-patterns)
9. [Tag Value Standards](#tag-value-standards)

---

## Tag Categories Overview

Six core tag categories provide complete cloud governance coverage:

```
┌─────────────────────────────────────────────────────────┐
│                 Six Core Tag Categories                 │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  1. TECHNICAL TAGS    → Operations & Lifecycle          │
│  2. BUSINESS TAGS     → Cost Allocation & Ownership     │
│  3. SECURITY TAGS     → Compliance & Access Control     │
│  4. AUTOMATION TAGS   → Infrastructure Management       │
│  5. OPERATIONAL TAGS  → Support & Change Management     │
│  6. CUSTOM TAGS       → Organization-Specific Metadata  │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

**Priority recommendation**: Start with Technical + Business tags (categories 1-2), add Security + Automation (3-4) for regulated industries, use Operational + Custom (5-6) for advanced governance.

---

## Technical Tags

Operations-focused metadata for resource identification and lifecycle management.
### Name

**Purpose**: Human-readable resource identifier
**Required**: Yes (all resources)
**Format**: `{env}-{app}-{component}-{number}`
**Example**: `prod-api-server-01`

**Naming patterns**:
- Short, descriptive, unique within environment
- Include environment prefix for clarity across accounts
- Sequential numbering for multiple identical resources
- Kebab-case recommended for readability

**AWS**: Name tag displayed in console as primary identifier
**Azure**: Name property separate from tags
**GCP**: Name property separate from labels
**Kubernetes**: metadata.name (separate from labels)

### Environment

**Purpose**: Deployment lifecycle stage
**Required**: Yes (all resources)
**Values**: `prod` | `staging` | `dev` | `test` | `qa` | `demo`
**Example**: `prod`

**Use cases**:
- Cost allocation by environment
- Automated shutdown policies (dev/test resources overnight)
- Security policies (prod requires stricter access controls)
- Backup policies (prod: daily, dev: weekly or none)

**Recommended values**:
```yaml
prod: Production workloads (customer-facing)
staging: Pre-production validation environment
dev: Development/experimental environment
test: Automated testing environment
qa: Quality assurance testing
demo: Sales/demo environment
```

### Version

**Purpose**: Application or infrastructure version
**Required**: No (optional)
**Format**: Semantic versioning `vX.Y.Z` or git commit SHA
**Example**: `v1.2.3` or `abc123de`

**Use cases**:
- Track deployed application version
- Correlate resource configuration with code version
- Rollback identification
- Change tracking

### ManagedBy

**Purpose**: Resource creation and management method
**Required**: Yes (prevents accidental manual changes to IaC resources)
**Values**: `terraform` | `pulumi` | `cloudformation` | `ansible` | `manual`
**Example**: `terraform`

**Use cases**:
- Prevent manual changes to IaC-managed resources
- Identify drift from IaC state
- Audit which resources are managed vs. ad-hoc
- Cleanup automation (identify orphaned manual resources)

**AWS Config rule**: Alert when IaC-managed resource is modified outside IaC
**Azure Policy**: Audit resources created manually vs. via ARM/Bicep
**GCP Organization Policy**: Tag resources with creation method

---

## Business Tags

Cost allocation and ownership metadata for financial operations (FinOps).

### Owner

**Purpose**: Responsible team or individual contact
**Required**: Yes (all resources)
**Format**: Team email address
**Example**: `[email protected]`

**Use cases**:
- Security incident contact (who to notify for this resource)
- Cost allocation responsibility
- Change approval routing
- Resource lifecycle decisions (can this be deleted?)

**Best practices**:
- Use team/group email (not individual email - avoids orphaned resources when employees leave)
- Format: `{team-name}@company.com`
- Integrate with identity provider for validation
- Distribution list preferred over individual accounts

### CostCenter

**Purpose**: Finance department code for billing allocation
**Required**: Yes (cost allocation)
**Format**: Organization-specific finance code
**Example**: `CC-1234` or `ENG-001`

**Use cases**:
- Showback/chargeback to business units
- Budget tracking by department
- Cost anomaly detection by cost center
- Financial reporting and forecasting

**Integration**:
- AWS Cost Explorer: Group costs by CostCenter tag
- Azure Cost Management: Allocate costs to billing accounts
- GCP Cloud Billing: Export with cost center labels
- ERP systems: Match cloud costs to finance codes

**Validation**: Regex pattern `^[A-Z]{2,4}-[0-9]{3,6}$` (adjust to org standard)

### Project

**Purpose**: Business initiative or product name
**Required**: Yes (cost allocation + organization)
**Format**: Kebab-case project name
**Example**: `ecommerce-platform` or `mobile-app`

**Use cases**:
- Cost tracking by project/product
- Resource discovery (find all resources for project X)
- Access control (grant project team access to project resources)
- Lifecycle management (sunset all resources when project ends)

**Best practices**:
- Single project name across all resources for that initiative
- Lowercase, hyphens only (avoid spaces, special chars)
- Match project name in project management tools (Jira, Azure DevOps)
- Archive project tag when project sunset

### Department

**Purpose**: Organizational department
**Required**: No (optional)
**Values**: `engineering` | `sales` | `marketing` | `finance` | `operations`
**Example**: `engineering`

**Use cases**:
- High-level cost allocation (department-level budgets)
- Organizational reporting
- Compliance scope (which departments handle PCI data?)

**When to use**: Large organizations (500+ employees) with department-level budgeting

---

## Security Tags

Compliance and access control metadata for security policies.

### Confidentiality

**Purpose**: Data sensitivity classification
**Required**: Recommended (security-sensitive orgs)
**Values**: `public` | `internal` | `confidential` | `restricted`
**Example**: `confidential`

**Use cases**:
- Access control policies (restrict access to confidential resources)
- Encryption requirements (confidential = encrypt at rest + in transit)
- Audit logging (confidential resources require detailed logs)
- Data residency (restricted data cannot leave specific regions)

**Mapping**:
```yaml
public: Publicly accessible data (website content, marketing)
internal: Internal company data (not publicly shared)
confidential: Sensitive business data (financial, strategic)
restricted: Highly sensitive data (PII, PHI, payment data)
```

### Compliance

**Purpose**: Regulatory compliance requirements
**Required**: Yes (for regulated industries)
**Values**: `PCI` | `HIPAA` | `SOC2` | `GDPR` | `FedRAMP` | `none`
**Example**: `PCI` or `HIPAA,SOC2` (comma-separated if multiple)

**Use cases**:
- Compliance scope definition (which resources are in-scope for audit)
- Automated policy enforcement (PCI resources require specific security controls)
- Audit trail generation (compliance resources require enhanced logging)
- Cost tracking (compliance adds overhead cost to resources)

**Integration**:
- AWS Security Hub: Filter compliance findings by tag
- Azure Policy: Enforce controls on compliance-tagged resources
- GCP Security Command Center: Scope assessments by compliance label

### DataClassification

**Purpose**: Data tier classification for retention/backup
**Required**: No (optional)
**Values**: `tier1` | `tier2` | `tier3` | `tier4`
**Example**: `tier1`

**Tier definitions**:
```yaml
tier1: Critical data (cannot be recreated, RPO < 1 hour)
tier2: Important data (difficult to recreate, RPO < 24 hours)
tier3: Standard data (can be recreated, RPO < 7 days)
tier4: Ephemeral data (easily recreated, no backup required)
```

**Use cases**:
- Backup frequency (tier1: continuous, tier2: hourly, tier3: daily, tier4: none)
- Retention policies (tier1: 7 years, tier2: 3 years, tier3: 1 year, tier4: 30 days)
- Disaster recovery priority (tier1 restored first)

### SecurityZone

**Purpose**: Network security zone classification
**Required**: No (optional)
**Values**: `dmz` | `internal` | `restricted` | `management`
**Example**: `internal`

**Use cases**:
- Network segmentation (dmz resources in public subnet, restricted in private)
- Firewall rules (security zone determines allowed traffic)
- Access control (management zone requires VPN/bastion)

---

## Automation Tags

Lifecycle and infrastructure management metadata for automation policies.
### Backup **Purpose**: Backup policy assignment **Required**: Recommended (data resources) **Values**: `continuous` | `hourly` | `daily` | `weekly` | `monthly` | `none` **Example**: `daily` **Use cases**: - AWS Backup: Select resources by Backup tag - Azure Backup: Assign backup policies by tag - GCP Cloud Backup: Schedule backups by label - Snapshot automation: Trigger automated snapshots **Cost impact**: Daily backups cost more than weekly (storage + API calls) **Integration**: ```hcl # AWS Backup plan targeting Backup:daily tag resource "aws_backup_selection" "daily_backups" { plan_id = aws_backup_plan.daily.id resources = ["*"] condition { string_equals = { key = "Backup" value = "daily" } } } ``` ### Monitoring **Purpose**: Monitoring/observability enablement **Required**: No (optional) **Values**: `enabled` | `disabled` | `custom` **Example**: `enabled` **Use cases**: - CloudWatch/Application Insights agent installation - Metric collection enablement - Alerting policy assignment - Cost optimization (disable monitoring for non-critical resources) ### Schedule **Purpose**: Resource uptime schedule **Required**: No (cost optimization use case) **Values**: `always-on` | `business-hours` | `weekdays` | `on-demand` **Example**: `business-hours` **Use cases**: - Automated shutdown of dev/test resources (save 50-70% on compute) - EC2/VM scheduler (stop overnight, start morning) - Database instance scaling (reduce size during off-hours) **Schedule definitions**: ```yaml always-on: 24/7 uptime (production resources) business-hours: 8am-6pm weekdays (dev/test environments) weekdays: Monday-Friday (weekend shutdown) on-demand: Manual start/stop only (cost-sensitive workloads) ``` **Cost savings**: `business-hours` = ~65% reduction vs. 
`always-on` ### AutoShutdown **Purpose**: Automated shutdown enablement **Required**: No (cost optimization) **Values**: `enabled` | `disabled` **Example**: `enabled` **Use cases**: - Dev/test environment cost reduction - Lambda/CloudFunction automatic stopping - Idle resource detection and termination --- ## Operational Tags Support and change management metadata for operational processes. ### SLA **Purpose**: Service level agreement tier **Required**: No (operational maturity) **Values**: `critical` | `high` | `medium` | `low` **Example**: `critical` **Use cases**: - Incident response prioritization (critical = page on-call) - Monitoring threshold configuration (critical = tighter thresholds) - Backup/DR requirements (critical = highest RPO/RTO) - Support escalation routing **SLA definitions**: ```yaml critical: Customer-facing production (RPO < 1h, RTO < 15min) high: Internal production services (RPO < 4h, RTO < 1h) medium: Non-critical production (RPO < 24h, RTO < 4h) low: Development/testing (no SLA guarantee) ``` ### ChangeManagement **Purpose**: Change ticket reference **Required**: No (change control orgs) **Format**: `{TICKET_SYSTEM}-{NUMBER}` **Example**: `CHG-12345` or `JIRA-567` **Use cases**: - Audit trail (which change authorized this resource?) - Rollback reference (revert to pre-change state) - Compliance requirement (ITIL change management) ### CreatedBy **Purpose**: User who created resource **Required**: No (audit trail) **Format**: Email address or username **Example**: `[email protected]` **Use cases**: - Audit trail (who created this resource?) 
- Ownership transfer (contact creator when owner email bounces) - Security investigation (trace unauthorized resource creation) **Auto-population**: - Terraform: `data.aws_caller_identity.current.user_id` - Azure: `user().principalName` - Kubernetes: `{{request.userInfo.username}}` (Kyverno auto-inject) ### CreatedDate **Purpose**: Resource creation timestamp **Required**: No (lifecycle tracking) **Format**: ISO 8601 timestamp **Example**: `2025-12-04T10:30:00Z` **Use cases**: - Resource age calculation (identify old resources) - Lifecycle policies (delete resources older than X days) - Audit trail (when was this resource created?) **Auto-population**: - Terraform: `timestamp()` function - CloudFormation: `!Ref AWS::StackCreationTime` - Pulumi: `new Date().toISOString()` --- ## Custom Tags Organization-specific metadata for unique business requirements. ### Customer **Purpose**: Multi-tenant customer identifier **Required**: If multi-tenant SaaS **Format**: Customer ID or slug **Example**: `customer-acme-corp` **Use cases**: - Cost allocation by customer (SaaS showback) - Resource isolation (tenant-per-VPC architecture) - Data residency (customer X requires EU-only resources) - Access control (customer admins access only their resources) ### Application **Purpose**: Application name (for multi-app projects) **Required**: No (if single app per project) **Format**: Kebab-case app name **Example**: `payment-api` or `user-service` **Use cases**: - Service discovery (find all resources for app X) - Cost allocation by application - Dependency mapping (app A depends on app B) - Rollback isolation (rollback app X without affecting app Y) ### Component **Purpose**: Resource role within application **Required**: No (architectural clarity) **Values**: `web` | `api` | `database` | `cache` | `queue` | `worker` **Example**: `api` **Use cases**: - Architecture diagrams (auto-generate from tags) - Cost allocation by tier (web tier costs vs. 
database tier) - Scaling policies (web tier scales differently than worker tier) - Monitoring dashboards (group by component) ### Stack **Purpose**: Full-stack identifier **Required**: No (microservices architecture) **Format**: `{app}-{env}-{region}` **Example**: `ecommerce-prod-us-east-1` **Use cases**: - Multi-region deployments (identify region-specific resources) - Stack-level cost tracking - Disaster recovery (restore entire stack) --- ## Tag Naming Patterns ### PascalCase (AWS Standard) **Format**: CapitalizeEachWord **Example**: `CostCenter`, `ProjectName`, `DataClassification` **Best for**: AWS-first organizations **Pros**: AWS console default, most AWS documentation uses PascalCase **Cons**: Case-sensitive (typos create duplicate tags: `Environment` ≠ `environment`) ### lowercase (GCP Required) **Format**: alllowercase **Example**: `costcenter`, `projectname`, `dataclassification` **Best for**: GCP-first organizations **Pros**: GCP labels require lowercase (no choice) **Cons**: Less readable for multi-word tags ### kebab-case (Azure Standard) **Format**: lowercase-with-hyphens **Example**: `cost-center`, `project-name`, `data-classification` **Best for**: Azure-first organizations or multi-cloud consistency **Pros**: Case-insensitive, highly readable **Cons**: Hyphens can conflict with some automation tools ### Namespaced (Enterprise Standard) **Format**: `namespace:key` **Example**: `company:environment`, `team:owner`, `finance:costcenter` **Best for**: Large enterprises with multiple organizations/teams **Pros**: Prevents tag collision between teams, clear ownership **Cons**: Longer tag keys (counts toward character limits) **AWS Tag Policies**: Enforce namespaced tags for organizational consistency --- ## Tag Value Standards ### Allowed Characters | Provider | Key Allowed | Value Allowed | Case Sensitive | |----------|-------------|---------------|----------------| | **AWS** | `a-z`, `A-Z`, `0-9`, `+`, `-`, `=`, `.`, `_`, `:`, `/`, `@` | Same | Yes | | 
**Azure** | `a-z`, `A-Z`, `0-9`, `-`, `_`, `.` | Same | No | | **GCP** | `a-z`, `0-9`, `-`, `_` | Same | No | | **Kubernetes** | `a-z`, `A-Z`, `0-9`, `-`, `_`, `.` | Same | Yes | **Recommendation**: Use only `a-z`, `0-9`, `-` for maximum portability across providers ### Enumerated Values (Restrict to Allowed List) Enforce allowed values via policies to prevent typos and sprawl: **AWS Tag Policy**: ```json { "tags": { "Environment": { "tag_value": { "@@assign": ["prod", "staging", "dev", "test"] } } } } ``` **Azure Policy** (custom constraint): ```json { "policyRule": { "if": { "not": { "field": "tags['Environment']", "in": ["prod", "staging", "dev", "test"] } }, "then": { "effect": "deny" } } } ``` **GCP Organization Policy**: ```yaml constraint: constraints/gcp.resourceLabels listPolicy: allowedValues: - environment:prod - environment:staging - environment:dev - environment:test ``` ### Tag Value Patterns (Regex Validation) Validate tag value formats via policies: **Example: Owner must be email**: ```yaml # Azure Policy regex "pattern": "^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$" # GCP custom constraint condition: "resource.labels.owner.matches('^[a-z0-9._%+-]+@company\\\\.com$')" ``` **Example: CostCenter must be CC-#### format**: ```yaml # AWS Config rule parameter "pattern": "^CC-[0-9]{4}$" ``` --- ## Best Practices Summary 1. **Start minimal**: "Big Six" required tags (Name, Environment, Owner, CostCenter, Project, ManagedBy) 2. **Choose ONE naming convention**: PascalCase, lowercase, or kebab-case - enforce organization-wide 3. **Restrict tag values**: Use enums/regex to prevent typos (prod vs. production vs. PROD) 4. **Auto-populate metadata**: Use IaC to auto-set CreatedBy, CreatedDate, ManagedBy 5. **Tag inheritance**: Let parent resources (resource groups, folders) propagate common tags 6. **Validate at creation**: Use policies to block untagged or incorrectly tagged resources 7. 
**Audit regularly**: Weekly tag compliance checks identify drift and sprawl 8. **Document standards**: Maintain tag dictionary with purpose, values, and examples ``` ### references/enforcement-patterns.md ```markdown # Tag Enforcement Patterns Complete guide to enforcing cloud resource tagging across AWS, Azure, GCP, and Kubernetes using native policy engines. ## Table of Contents 1. [Enforcement Strategy Overview](#enforcement-strategy-overview) 2. [AWS Tag Enforcement](#aws-tag-enforcement) 3. [Azure Tag Enforcement](#azure-tag-enforcement) 4. [GCP Label Enforcement](#gcp-label-enforcement) 5. [Kubernetes Label Enforcement](#kubernetes-label-enforcement) 6. [Multi-Cloud Enforcement](#multi-cloud-enforcement) 7. [Pre-Deployment Validation](#pre-deployment-validation) --- ## Enforcement Strategy Overview Three enforcement levels determine how strictly tagging policies are applied: ``` ┌────────────────────────────────────────────────────────┐ │ Tag Enforcement Hierarchy │ ├────────────────────────────────────────────────────────┤ │ │ │ 1. HARD ENFORCEMENT (Deny Creation) │ │ ├── Cost allocation tags (Owner, CostCenter) │ │ ├── Lifecycle tags (Environment, ManagedBy) │ │ └── Identification (Name) │ │ → Block resource creation if tags missing │ │ │ │ 2. SOFT ENFORCEMENT (Alert Only) │ │ ├── Operational tags (Backup, Monitoring) │ │ ├── Security tags (Compliance, DataClassification) │ │ └── Support tags (SLA, ChangeManagement) │ │ → Allow creation, send notification to owner │ │ │ │ 3. NO ENFORCEMENT (Best Effort) │ │ ├── Custom tags (Application, Component) │ │ └── Experimental tags │ │ → No validation or alerts │ │ │ └────────────────────────────────────────────────────────┘ ``` **Recommendation**: Start with soft enforcement (alerts only), transition to hard enforcement after 30-90 days of compliance tracking. --- ## AWS Tag Enforcement AWS provides three enforcement mechanisms: AWS Config Rules, Tag Policies, and Service Control Policies (SCPs). 
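
Before picking a mechanism, it can help to encode which tags fall under each enforcement level from the hierarchy in the overview. The following is a minimal sketch, not part of any AWS tooling: the `evaluate` function and its return shape are hypothetical, and the tag sets are copied from the hierarchy diagram.

```python
# Tag keys grouped by enforcement level, taken from the hierarchy diagram.
HARD_TAGS = {"Owner", "CostCenter", "Environment", "ManagedBy", "Name"}
SOFT_TAGS = {
    "Backup", "Monitoring", "Compliance",
    "DataClassification", "SLA", "ChangeManagement",
}


def evaluate(tags):
    """Return the action a policy engine would take for a resource's tags.

    'deny'  -> at least one hard-enforced tag is missing
    'alert' -> hard tags are present, but a soft-enforced tag is missing
    'allow' -> every hard and soft tag is present
    """
    present = set(tags)
    missing_hard = sorted(HARD_TAGS - present)
    missing_soft = sorted(SOFT_TAGS - present)
    if missing_hard:
        action = "deny"
    elif missing_soft:
        action = "alert"
    else:
        action = "allow"
    return {
        "action": action,
        "missing_hard": missing_hard,
        "missing_soft": missing_soft,
    }


if __name__ == "__main__":
    # Hypothetical resource that has only two of the hard-enforced tags.
    print(evaluate({"Name": "prod-api-server-01", "Environment": "prod"}))
```

During a soft-enforcement rollout, the same function could run in report-only mode by logging `deny` results instead of acting on them.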
### Pattern 1: AWS Config Rules (Reactive Enforcement) Check tag compliance after resource creation and trigger remediation. **Use case**: Alert when required tags are missing, optionally auto-remediate #### Terraform Implementation ```hcl # Enable AWS Config (prerequisite) resource "aws_config_configuration_recorder" "main" { name = "tag-compliance-recorder" role_arn = aws_iam_role.config_role.arn recording_group { all_supported = true include_global_resource_types = true } } resource "aws_config_delivery_channel" "main" { name = "tag-compliance-channel" s3_bucket_name = aws_s3_bucket.config_logs.id depends_on = [aws_config_configuration_recorder.main] } resource "aws_config_configuration_recorder_status" "main" { name = aws_config_configuration_recorder.main.name is_enabled = true depends_on = [aws_config_delivery_channel.main] } # Required tags Config rule resource "aws_config_config_rule" "required_tags" { name = "required-tags-check" source { owner = "AWS" source_identifier = "REQUIRED_TAGS" } input_parameters = jsonencode({ tag1Key = "Environment" tag2Key = "Owner" tag3Key = "CostCenter" tag4Key = "Project" tag5Key = "ManagedBy" tag6Key = "Name" }) scope { compliance_resource_types = [ "AWS::EC2::Instance", "AWS::RDS::DBInstance", "AWS::S3::Bucket", "AWS::Lambda::Function", "AWS::DynamoDB::Table", "AWS::ECS::Service", ] } depends_on = [aws_config_configuration_recorder.main] } # SNS topic for compliance alerts resource "aws_sns_topic" "compliance_alerts" { name = "tag-compliance-alerts" } resource "aws_sns_topic_subscription" "compliance_email" { topic_arn = aws_sns_topic.compliance_alerts.arn protocol = "email" endpoint = "[email protected]" } # EventBridge rule to notify on non-compliance resource "aws_cloudwatch_event_rule" "config_non_compliant" { name = "tag-compliance-violations" description = "Trigger when resources are non-compliant with tag policies" event_pattern = jsonencode({ source = ["aws.config"] detail-type = ["Config Rules Compliance Change"] 
detail = { configRuleName = [aws_config_config_rule.required_tags.name] newEvaluationResult = { complianceType = ["NON_COMPLIANT"] } } }) } resource "aws_cloudwatch_event_target" "sns" { rule = aws_cloudwatch_event_rule.config_non_compliant.name target_id = "SendToSNS" arn = aws_sns_topic.compliance_alerts.arn } # Auto-remediation (optional): Add missing tags via Systems Manager resource "aws_config_remediation_configuration" "add_default_tags" { config_rule_name = aws_config_config_rule.required_tags.name resource_type = "AWS::EC2::Instance" target_type = "SSM_DOCUMENT" target_identifier = aws_ssm_document.add_tags.name target_version = "1" parameter { name = "InstanceId" resource_value = "RESOURCE_ID" } parameter { name = "Tags" static_value = jsonencode({ ManagedBy = "terraform" Owner = "[email protected]" }) } automatic = true maximum_automatic_attempts = 5 retry_attempt_seconds = 60 } resource "aws_ssm_document" "add_tags" { name = "AddRequiredTags" document_type = "Automation" content = jsonencode({ schemaVersion = "0.3" description = "Add required tags to EC2 instances" parameters = { InstanceId = { type = "String" } Tags = { type = "String" } } mainSteps = [{ name = "addTags" action = "aws:createTags" inputs = { ResourceType = "EC2" ResourceIds = ["{{ InstanceId }}"] Tags = "{{ Tags }}" } }] }) } ``` **Cost**: AWS Config charges $0.003 per configuration item recorded + $0.001 per rule evaluation --- ### Pattern 2: AWS Tag Policies (Preventive Enforcement) Enforce tags at AWS Organizations level (preventive, not reactive). 
**Use case**: Define allowed tag keys and values across entire AWS Organization #### Tag Policy JSON ```json { "tags": { "Environment": { "tag_key": { "@@assign": "Environment", "@@operators_allowed_for_child_policies": ["@@none"] }, "tag_value": { "@@assign": ["prod", "staging", "dev", "test"] }, "enforced_for": { "@@assign": [ "ec2:instance", "rds:db", "s3:bucket", "lambda:function" ] } }, "Owner": { "tag_key": { "@@assign": "Owner" }, "tag_value": { "@@assign": ["*@company.com"] }, "enforced_for": { "@@assign": ["*"] } }, "CostCenter": { "tag_key": { "@@assign": "CostCenter" }, "tag_value": { "@@assign": ["CC-*"] }, "enforced_for": { "@@assign": ["*"] } } } } ``` #### Terraform Deployment ```hcl resource "aws_organizations_policy" "tag_policy" { name = "enforce-required-tags" description = "Enforce required tags on all resources" type = "TAG_POLICY" content = file("${path.module}/policies/tag-policy.json") } resource "aws_organizations_policy_attachment" "tag_policy_root" { policy_id = aws_organizations_policy.tag_policy.id target_id = data.aws_organizations_organization.current.roots[0].id } ``` **Limitations**: Tag policies provide case-sensitivity enforcement and value constraints, but do NOT block resource creation if tags are missing (use SCPs for that). --- ### Pattern 3: Service Control Policies (Hard Deny) Block resource creation if required tags are missing. 
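
IAM evaluates multiple keys under a single condition operator with a logical AND, so a `Deny` that lists several `aws:RequestTag/...` keys in one `StringNotLike` block only fires when every tag fails at once. A common workaround is one `Deny` statement per required tag. The sketch below generates such statements; the `REQUIRED_TAGS` mapping and the function names are illustrative assumptions, and the output should be reviewed before it is attached to any organization.

```python
import json

# Hypothetical requirements: tag key -> allowed value patterns (IAM wildcards).
REQUIRED_TAGS = {
    "Environment": ["prod", "staging", "dev", "test"],
    "Owner": ["*@company.com"],
    "CostCenter": ["CC-*"],
    "Project": ["*"],
    "ManagedBy": ["*"],
}


def build_deny_statements(action, resource, required_tags, sid_prefix):
    """Return one Deny statement per required tag.

    StringNotLike is a negated operator, so it also evaluates to true when
    the tag is absent from the request, covering both missing and invalid tags.
    """
    return [
        {
            "Sid": f"{sid_prefix}{key}",
            "Effect": "Deny",
            "Action": action,
            "Resource": resource,
            "Condition": {"StringNotLike": {f"aws:RequestTag/{key}": patterns}},
        }
        for key, patterns in required_tags.items()
    ]


def build_scp(required_tags):
    """Assemble a policy document that covers EC2 instance launches."""
    return {
        "Version": "2012-10-17",
        "Statement": build_deny_statements(
            "ec2:RunInstances",
            "arn:aws:ec2:*:*:instance/*",
            required_tags,
            "DenyEC2Without",
        ),
    }


if __name__ == "__main__":
    print(json.dumps(build_scp(REQUIRED_TAGS), indent=2))
```

Generating the document also makes it easier to keep the tag list in one place as requirements change.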
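**Use case**: Prevent untagged resources at creation time (hard enforcement)

Condition keys listed under a single operator are combined with a logical AND, so each required tag needs its own `Deny` statement. A negated operator such as `StringNotLike` also evaluates to true when the tag is absent from the request, so one statement per tag covers both missing and invalid values.

#### SCP JSON

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyEC2WithoutEnvironment",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": { "StringNotLike": { "aws:RequestTag/Environment": ["prod", "staging", "dev", "test"] } }
    },
    {
      "Sid": "DenyEC2WithoutOwner",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": { "StringNotLike": { "aws:RequestTag/Owner": "*@company.com" } }
    },
    {
      "Sid": "DenyEC2WithoutCostCenter",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": { "StringNotLike": { "aws:RequestTag/CostCenter": "CC-*" } }
    },
    {
      "Sid": "DenyEC2WithoutProject",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": { "StringNotLike": { "aws:RequestTag/Project": "*" } }
    },
    {
      "Sid": "DenyEC2WithoutManagedBy",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": { "StringNotLike": { "aws:RequestTag/ManagedBy": "*" } }
    },
    {
      "Sid": "DenyS3WithoutEnvironment",
      "Effect": "Deny",
      "Action": "s3:CreateBucket",
      "Resource": "*",
      "Condition": { "StringNotLike": { "aws:RequestTag/Environment": ["prod", "staging", "dev"] } }
    },
    {
      "Sid": "DenyS3WithoutOwner",
      "Effect": "Deny",
      "Action": "s3:CreateBucket",
      "Resource": "*",
      "Condition": { "StringNotLike": { "aws:RequestTag/Owner": "*@company.com" } }
    }
  ]
}
```

#### Terraform Deployment

```hcl
# Look up the organization so the policy can be attached to its root
data "aws_organizations_organization" "current" {}

resource "aws_organizations_policy" "deny_untagged" {
  name        = "deny-untagged-resources"
  description = "Deny resource creation without required tags"
  type        = "SERVICE_CONTROL_POLICY"
  content     = file("${path.module}/policies/scp-deny-untagged.json")
}

resource "aws_organizations_policy_attachment" "scp_attach" {
  policy_id = aws_organizations_policy.deny_untagged.id
  target_id = data.aws_organizations_organization.current.roots[0].id
}
```

**Warning**: SCPs apply to every principal in the affected member accounts, including the root user, so a mistake can block legitimate deployments across the organization. Test in a sandbox account first.

---

### Pattern 4: CloudFormation StackSets with Tag Propagation

Apply tags to all resources in a CloudFormation stack automatically.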
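
One way to supply stack-level tags is at stack creation time, after which CloudFormation propagates them to the resources in the stack that support tagging. The sketch below uses the boto3 `create_stack` call; the helper names and any stack name or tag values passed in are hypothetical, and `boto3` is imported lazily so the conversion helpers can be exercised without AWS credentials.

```python
def to_cfn_tags(tags):
    """Convert a plain dict into the Key/Value list shape CloudFormation expects."""
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def to_cfn_parameters(parameters):
    """Convert a plain dict into CloudFormation's ParameterKey/ParameterValue list."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(parameters.items())
    ]


def create_tagged_stack(stack_name, template_body, parameters, tags):
    """Create a stack whose tags are propagated to its taggable resources."""
    import boto3  # lazy import: only needed when actually calling AWS

    client = boto3.client("cloudformation")
    return client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_cfn_parameters(parameters),
        Tags=to_cfn_tags(tags),
    )
```

A call such as `create_tagged_stack("ecommerce-prod", body, {"Project": "ecommerce"}, {"Environment": "prod", "ManagedBy": "cloudformation"})` would then tag the stack and, through propagation, its resources.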
**Use case**: Consistent tagging across multi-account deployments

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Example stack with tag propagation'

Parameters:
  Environment:
    Type: String
    AllowedValues: [prod, staging, dev]
    Default: dev
  Owner:
    Type: String
    Default: '[email protected]'
  CostCenter:
    Type: String
    AllowedPattern: 'CC-[0-9]{4}'
  Project:
    Type: String
  LatestAmiId:
    # Assumption: resolve the AMI from the public Amazon Linux SSM parameter
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64

# Stack-level tags are not a template section. Supply them when the stack
# (or StackSet) is created; CloudFormation propagates them to resources that
# support tagging. For example:
#   aws cloudformation deploy --stack-name <stack-name> \
#     --template-file template.yaml \
#     --parameter-overrides Project=ecommerce CostCenter=CC-1234 \
#     --tags Environment=prod Owner=<owner> CostCenter=CC-1234 \
#            Project=ecommerce ManagedBy=cloudformation
# CloudFormation also adds aws:cloudformation:stack-id and
# aws:cloudformation:stack-name tags on its own.

Resources:
  AppInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t3.medium
      # Stack-level tags are applied automatically; these are resource-specific
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-${Project}-app'
        - Key: Component
          Value: web
```

---

## Azure Tag Enforcement

Azure Policy provides tag enforcement, inheritance, and auto-remediation.

### Pattern 1: Require Tags (Hard Enforcement)

Deny resource creation if required tags are missing.
#### Azure Policy Definition (JSON) ```json { "mode": "Indexed", "policyRule": { "if": { "anyOf": [ { "field": "tags['Environment']", "exists": "false" }, { "field": "tags['Owner']", "exists": "false" }, { "field": "tags['CostCenter']", "exists": "false" } ] }, "then": { "effect": "deny" } }, "parameters": {} } ``` #### Terraform Deployment ```hcl resource "azurerm_policy_definition" "require_tags" { name = "require-standard-tags" policy_type = "Custom" mode = "Indexed" display_name = "Require standard tags on all resources" policy_rule = file("${path.module}/policies/require-tags.json") } resource "azurerm_policy_assignment" "require_tags_subscription" { name = "require-tags" policy_definition_id = azurerm_policy_definition.require_tags.id scope = "/subscriptions/${var.subscription_id}" } ``` --- ### Pattern 2: Tag Inheritance from Resource Group Automatically inherit tags from parent resource group. #### Azure Policy Definition (JSON) ```json { "mode": "Indexed", "policyRule": { "if": { "allOf": [ { "field": "[concat('tags[', parameters('tagName'), ']')]", "exists": "false" }, { "value": "[resourceGroup().tags[parameters('tagName')]]", "notEquals": "" } ] }, "then": { "effect": "modify", "details": { "roleDefinitionIds": [ "/providers/microsoft.authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c" ], "operations": [ { "operation": "add", "field": "[concat('tags[', parameters('tagName'), ']')]", "value": "[resourceGroup().tags[parameters('tagName')]]" } ] } } }, "parameters": { "tagName": { "type": "String", "metadata": { "displayName": "Tag Name", "description": "Name of the tag to inherit from resource group" } } } } ``` #### Terraform Deployment ```hcl resource "azurerm_policy_definition" "inherit_tags" { name = "inherit-tags-from-rg" policy_type = "Custom" mode = "Indexed" display_name = "Inherit tags from resource group" policy_rule = file("${path.module}/policies/inherit-tags.json") parameters = jsonencode({ tagName = { type = "String" metadata 
= { displayName = "Tag Name" description = "Name of the tag to inherit" } } }) } # Assign policy for each tag to inherit resource "azurerm_policy_assignment" "inherit_environment" { name = "inherit-environment-tag" policy_definition_id = azurerm_policy_definition.inherit_tags.id scope = "/subscriptions/${var.subscription_id}" parameters = jsonencode({ tagName = { value = "Environment" } }) } resource "azurerm_policy_assignment" "inherit_owner" { name = "inherit-owner-tag" policy_definition_id = azurerm_policy_definition.inherit_tags.id scope = "/subscriptions/${var.subscription_id}" parameters = jsonencode({ tagName = { value = "Owner" } }) } ``` --- ### Pattern 3: Tag Value Enforcement (Enum) Restrict tag values to allowed list. #### Azure Policy Definition (JSON) ```json { "mode": "Indexed", "policyRule": { "if": { "not": { "field": "tags['Environment']", "in": "[parameters('allowedValues')]" } }, "then": { "effect": "deny" } }, "parameters": { "allowedValues": { "type": "Array", "metadata": { "displayName": "Allowed Environment values", "description": "List of allowed values for Environment tag" }, "defaultValue": ["prod", "staging", "dev", "test"] } } } ``` #### Terraform Deployment ```hcl resource "azurerm_policy_definition" "enforce_environment_values" { name = "enforce-environment-tag-values" policy_type = "Custom" mode = "Indexed" display_name = "Enforce Environment tag values" policy_rule = file("${path.module}/policies/enforce-environment-values.json") parameters = jsonencode({ allowedValues = { type = "Array" metadata = { displayName = "Allowed Environment values" } defaultValue = ["prod", "staging", "dev", "test"] } }) } resource "azurerm_policy_assignment" "enforce_environment" { name = "enforce-environment-values" policy_definition_id = azurerm_policy_definition.enforce_environment_values.id scope = "/subscriptions/${var.subscription_id}" } ``` --- ## GCP Label Enforcement GCP uses Organization Policies to enforce label requirements. 
### Pattern 1: Require Labels (Hard Enforcement) Deny resource creation if required labels are missing. **Note**: GCP Organization Policies enforce at folder/organization level, not per-resource. Use custom constraints for per-resource enforcement. #### Organization Policy (YAML) ```yaml constraint: constraints/gcp.resourceLabels listPolicy: allowedValues: - environment:prod - environment:staging - environment:dev - owner:*@company.com - costcenter:cc-* - project:* deniedValues: [] inheritFromParent: true ``` #### Terraform Deployment ```hcl resource "google_organization_policy" "require_labels" { org_id = var.organization_id constraint = "constraints/gcp.resourceLabels" list_policy { allow { values = [ "environment:prod", "environment:staging", "environment:dev", ] } suggested_value = "environment:dev" inherit_from_parent = true } } ``` --- ### Pattern 2: Custom Constraint (Regex Validation) Enforce label value patterns using custom constraints. #### Custom Constraint (YAML) ```yaml name: organizations/{org_id}/customConstraints/custom.requireOwnerEmail resource_types: - "*" method_types: - CREATE - UPDATE condition: "resource.labels.owner.matches('^[a-z0-9._%+-]+@company\\\\.com$')" action_type: DENY display_name: "Require owner label with company domain email" description: "All resources must have an owner label with @company.com email" ``` #### Terraform Deployment ```hcl resource "google_org_policy_custom_constraint" "owner_email" { parent = "organizations/${var.organization_id}" name = "custom.requireOwnerEmail" action_type = "DENY" condition = "resource.labels.owner.matches('^[a-z0-9._%+-]+@company\\\\.com$')" method_types = ["CREATE", "UPDATE"] resource_types = ["*"] display_name = "Require owner label with company domain email" description = "All resources must have an owner label with @company.com email" } resource "google_org_policy_policy" "owner_email_policy" { name = 
"${google_org_policy_custom_constraint.owner_email.parent}/policies/${google_org_policy_custom_constraint.owner_email.name}" parent = google_org_policy_custom_constraint.owner_email.parent spec { rules { enforce = "TRUE" } } } ``` --- ### Pattern 3: Label Inheritance from Project Apply project-level labels to all resources automatically. ```hcl resource "google_project" "app_project" { name = "ecommerce-prod" project_id = "ecommerce-prod-12345" org_id = var.organization_id labels = { environment = "prod" owner = "platform-team" costcenter = "cc-1234" project = "ecommerce" managedby = "terraform" } } # All resources in this project inherit these labels resource "google_compute_instance" "app_server" { name = "prod-app-server" machine_type = "n1-standard-1" zone = "us-central1-a" project = google_project.app_project.project_id # Additional resource-specific labels labels = { component = "web" backup = "daily" } } ``` --- ## Kubernetes Label Enforcement Kubernetes uses admission controllers (OPA Gatekeeper, Kyverno) to enforce label requirements. ### Pattern 1: OPA Gatekeeper (Require Labels) Block pod creation if required labels are missing. 
#### Constraint Template ```yaml apiVersion: templates.gatekeeper.sh/v1 kind: ConstraintTemplate metadata: name: k8srequiredlabels spec: crd: spec: names: kind: K8sRequiredLabels validation: openAPIV3Schema: type: object properties: labels: type: array items: type: string targets: - target: admission.k8s.gatekeeper.sh rego: | package k8srequiredlabels violation[{"msg": msg, "details": {"missing_labels": missing}}] { provided := {label | input.review.object.metadata.labels[label]} required := {label | label := input.parameters.labels[_]} missing := required - provided count(missing) > 0 msg := sprintf("Resource is missing required labels: %v", [missing]) } ``` #### Constraint ```yaml apiVersion: constraints.gatekeeper.sh/v1beta1 kind: K8sRequiredLabels metadata: name: require-standard-labels spec: match: kinds: - apiGroups: [""] kinds: ["Pod", "Service"] - apiGroups: ["apps"] kinds: ["Deployment", "StatefulSet", "DaemonSet"] parameters: labels: - "environment" - "owner" - "project" - "app" ``` --- ### Pattern 2: Kyverno Auto-Labeling Automatically add labels to resources based on namespace or other metadata. 
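
Kyverno's `+(key)` anchor adds a label only when the resource does not already define it, so labels set by the author are never overwritten. A rough Python equivalent of that add-if-absent merge, useful for reasoning about what a mutation will produce, is sketched below; the function name is hypothetical and this is not how Kyverno is implemented.

```python
def add_if_absent(labels, defaults):
    """Return labels with defaults filled in only for keys that are missing."""
    merged = dict(defaults)
    merged.update(labels)  # labels already on the resource win over defaults
    return merged


if __name__ == "__main__":
    existing = {"environment": "prod"}
    defaults = {"environment": "dev", "managed-by": "kyverno"}
    print(add_if_absent(existing, defaults))
```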
```yaml apiVersion: kyverno.io/v1 kind: ClusterPolicy metadata: name: add-default-labels spec: background: false rules: - name: add-environment-from-namespace match: any: - resources: kinds: - Pod - Deployment - StatefulSet mutate: patchStrategicMerge: metadata: labels: +(environment): "{{request.namespace}}" +(managed-by): "kyverno" +(created-by): "{{request.userInfo.username}}" - name: add-owner-from-namespace match: any: - resources: kinds: - Pod mutate: patchStrategicMerge: metadata: labels: +(owner): "{{request.object.metadata.namespace}}[email protected]" - name: require-cost-center-label match: any: - resources: kinds: - PersistentVolumeClaim - Service validate: message: "CostCenter label is required for billing resources" pattern: metadata: labels: costcenter: "?*" ``` --- ### Pattern 3: Kyverno Label Value Validation Enforce label value patterns (enum or regex). ```yaml apiVersion: kyverno.io/v1 kind: ClusterPolicy metadata: name: validate-label-values spec: validationFailureAction: enforce background: true rules: - name: validate-environment-label match: any: - resources: kinds: - Pod - Deployment validate: message: "Environment label must be one of: prod, staging, dev, test" pattern: metadata: labels: environment: "prod|staging|dev|test" - name: validate-owner-email-format match: any: - resources: kinds: - Deployment validate: message: "Owner label must be a valid email address" deny: conditions: any: - key: "{{request.object.metadata.labels.owner}}" operator: NotEquals value: "*@company.com" ``` --- ## Multi-Cloud Enforcement Enforce consistent tagging across AWS, Azure, and GCP using infrastructure as code. 
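
Each provider accepts a different character set, so a single canonical tag value often has to be normalized per target. GCP labels allow only lowercase letters, digits, underscores, and hyphens (up to 63 characters), and Kubernetes label values allow alphanumerics plus `-`, `_`, and `.`, must start and end with an alphanumeric character, and are also capped at 63 characters. Neither accepts `@`, so an email-style owner needs rewriting. The sketch below shows one possible normalization; the function names and the sample values are illustrative assumptions.

```python
import re

MAX_LABEL_LEN = 63  # length cap shared by GCP labels and Kubernetes label values


def to_gcp_label_value(value):
    """Lowercase the value and replace anything outside [a-z0-9_-] with '-'."""
    return re.sub(r"[^a-z0-9_-]", "-", value.lower())[:MAX_LABEL_LEN]


def to_k8s_label_value(value):
    """Keep [A-Za-z0-9._-] and trim so the value starts and ends alphanumeric."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "-", value)[:MAX_LABEL_LEN]
    return cleaned.strip("-_.")


def render_tags(canonical):
    """Map one canonical tag dict to provider-specific tag or label dicts."""
    return {
        "aws": dict(canonical),
        "azure": dict(canonical),
        "gcp": {k.lower(): to_gcp_label_value(v) for k, v in canonical.items()},
        "kubernetes": {k.lower(): to_k8s_label_value(v) for k, v in canonical.items()},
    }


if __name__ == "__main__":
    # Hypothetical canonical tags for illustration only.
    tags = {"Environment": "prod", "Owner": "team@company.com", "CostCenter": "CC-1234"}
    for provider, rendered in render_tags(tags).items():
        print(provider, rendered)
```

Normalization of this kind is lossy, so keeping the canonical value in AWS or Azure tags (or in a tag dictionary) preserves the original for lookups.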
### Terraform Multi-Cloud Tags Module ```hcl # modules/standard-tags/variables.tf variable "environment" { type = string description = "Environment (prod, staging, dev)" validation { condition = contains(["prod", "staging", "dev", "test"], var.environment) error_message = "Environment must be prod, staging, dev, or test" } } variable "owner" { type = string description = "Team email address" validation { condition = can(regex("^[a-z0-9._%+-]+@company\\.com$", var.owner)) error_message = "Owner must be a valid @company.com email" } } variable "cost_center" { type = string description = "Finance cost center code" validation { condition = can(regex("^CC-[0-9]{4}$", var.cost_center)) error_message = "CostCenter must match format CC-####" } } variable "project" { type = string description = "Project name" } # modules/standard-tags/outputs.tf output "aws_tags" { value = { Environment = var.environment Owner = var.owner CostCenter = var.cost_center Project = var.project ManagedBy = "terraform" } } output "azure_tags" { value = { Environment = var.environment Owner = var.owner CostCenter = var.cost_center Project = var.project ManagedBy = "terraform" } } output "gcp_labels" { # GCP requires lowercase value = { environment = lower(var.environment) owner = lower(replace(var.owner, "@", "-at-")) costcenter = lower(var.cost_center) project = lower(var.project) managedby = "terraform" } } output "kubernetes_labels" { value = { environment = var.environment owner = var.owner costcenter = var.cost_center project = var.project managedby = "terraform" } } ``` #### Usage ```hcl module "tags" { source = "./modules/standard-tags" environment = "prod" owner = "[email protected]" cost_center = "CC-1234" project = "ecommerce" } # AWS resources resource "aws_instance" "app" { ami = var.ami_id instance_type = "t3.medium" tags = merge( module.tags.aws_tags, { Name = "prod-app-server" Component = "web" } ) } # Azure resources resource "azurerm_virtual_machine" "app" { name = 
"prod-app-server" location = "eastus" resource_group_name = azurerm_resource_group.main.name tags = merge( module.tags.azure_tags, { Component = "web" } ) } # GCP resources resource "google_compute_instance" "app" { name = "prod-app-server" machine_type = "n1-standard-1" zone = "us-central1-a" labels = merge( module.tags.gcp_labels, { component = "web" } ) } ``` --- ## Pre-Deployment Validation Validate tags BEFORE deployment using IaC linting tools. ### Checkov (Python) ```bash # Install Checkov pip install checkov # Run tag validation on Terraform checkov -d . --framework terraform --check CKV_AWS_111 # Custom Checkov policy for required tags cat > .checkov.yaml <<EOF --- framework: terraform checks: - id: CKV_CUSTOM_1 name: "Ensure all resources have required tags" guideline: "All resources must have Environment, Owner, CostCenter, Project tags" resource_types: - aws_instance - aws_db_instance - aws_s3_bucket tags_required: - Environment - Owner - CostCenter - Project EOF ``` ### tflint (Go) ```bash # Install tflint brew install tflint # Configure tflint for tag enforcement cat > .tflint.hcl <<EOF rule "aws_resource_missing_tags" { enabled = true tags = ["Environment", "Owner", "CostCenter", "Project", "ManagedBy"] } EOF # Run tflint tflint ``` ### terraform-compliance (Python BDD) ```bash # Install terraform-compliance pip install terraform-compliance # Define tag compliance tests mkdir -p compliance cat > compliance/tags.feature <<EOF Feature: Resource Tagging Scenario: Ensure all EC2 instances have required tags Given I have aws_instance defined Then it must contain tags And it must contain Environment And it must contain Owner And it must contain CostCenter And it must contain Project EOF # Run compliance tests terraform-compliance -f compliance -p terraform.plan.json ``` --- ## Best Practices Summary 1. **Start with soft enforcement** (alerts only), transition to hard enforcement after 30-90 days 2. 
**Use IaC for tags** (Terraform provider default_tags) to reduce manual errors by 95% 3. **Inherit from parents** (resource groups, folders, namespaces) to reduce tagging effort 4. **Validate pre-deployment** (Checkov, tflint) to catch violations before creation 5. **Automate remediation** (AWS Config, Azure Policy modify effect) for missing tags 6. **Audit weekly** (AWS Config, Azure Resource Graph, GCP Asset Inventory) to identify drift 7. **Test enforcement in sandbox** (SCPs/policies can lock out root user - test first) 8. **Document exceptions** (create waiver process for valid exceptions to tag policies) ``` ### references/compliance-auditing.md ```markdown # Tag Compliance Auditing Complete guide to auditing resource tagging compliance across AWS, Azure, GCP, and Kubernetes with automated queries and remediation scripts. ## Table of Contents 1. [Compliance Auditing Overview](#compliance-auditing-overview) 2. [AWS Tag Compliance](#aws-tag-compliance) 3. [Azure Tag Compliance](#azure-tag-compliance) 4. [GCP Label Compliance](#gcp-label-compliance) 5. [Kubernetes Label Compliance](#kubernetes-label-compliance) 6. [Automated Remediation](#automated-remediation) 7. [Compliance Dashboards](#compliance-dashboards) --- ## Compliance Auditing Overview Regular tag compliance audits (weekly recommended) identify untagged resources, prevent tag drift, and maintain cost allocation accuracy. ``` ┌────────────────────────────────────────────────────────┐ │ Tag Compliance Audit Workflow │ ├────────────────────────────────────────────────────────┤ │ │ │ 1. DISCOVERY │ │ ├── Find all resources in scope │ │ └── Filter by resource type (EC2, RDS, S3, etc.) │ │ │ │ 2. VALIDATION │ │ ├── Check required tags present │ │ ├── Validate tag value formats │ │ └── Identify missing or invalid tags │ │ │ │ 3. REPORTING │ │ ├── Generate compliance report │ │ ├── Calculate compliance percentage │ │ └── List non-compliant resources │ │ │ │ 4. 
NOTIFICATION │ │ ├── Alert resource owners │ │ └── Escalate to management if needed │ │ │ │ 5. REMEDIATION │ │ ├── Auto-remediate (add default tags) │ │ ├── Manual remediation (contact owner) │ │ └── Exempt resources (documented waivers) │ │ │ │ 6. TRACKING │ │ └── Monitor compliance trends over time │ │ │ └────────────────────────────────────────────────────────┘ ``` **Audit frequency recommendations**: - **Weekly**: Standard compliance auditing (production environments) - **Daily**: High-compliance environments (financial, healthcare) - **Monthly**: Low-priority environments (development, sandbox) --- ## AWS Tag Compliance ### Method 1: AWS Config Advanced Queries Query untagged resources using AWS Config SQL. ```sql -- Find all resources missing Environment tag SELECT resourceId, resourceType, resourceName, awsRegion, availabilityZone, configuration.tags WHERE resourceType IN ( 'AWS::EC2::Instance', 'AWS::RDS::DBInstance', 'AWS::S3::Bucket', 'AWS::Lambda::Function', 'AWS::DynamoDB::Table', 'AWS::ECS::Service', 'AWS::EKS::Cluster' ) AND ( configuration.tags IS NULL OR NOT EXISTS(SELECT 1 FROM configuration.tags WHERE key = 'Environment') ) ORDER BY resourceType, resourceId ``` ```sql -- Find resources missing ANY required tag SELECT resourceId, resourceType, resourceName, awsRegion, configuration.tags WHERE resourceType IN ( 'AWS::EC2::Instance', 'AWS::RDS::DBInstance', 'AWS::S3::Bucket' ) AND ( configuration.tags IS NULL OR NOT EXISTS(SELECT 1 FROM configuration.tags WHERE key = 'Environment') OR NOT EXISTS(SELECT 1 FROM configuration.tags WHERE key = 'Owner') OR NOT EXISTS(SELECT 1 FROM configuration.tags WHERE key = 'CostCenter') OR NOT EXISTS(SELECT 1 FROM configuration.tags WHERE key = 'Project') OR NOT EXISTS(SELECT 1 FROM configuration.tags WHERE key = 'ManagedBy') ) ORDER BY resourceType ``` ```sql -- Find resources with invalid Environment values SELECT resourceId, resourceType, resourceName, configuration.tags WHERE resourceType IN 
('AWS::EC2::Instance', 'AWS::RDS::DBInstance') AND EXISTS( SELECT 1 FROM configuration.tags WHERE key = 'Environment' AND value NOT IN ('prod', 'staging', 'dev', 'test') ) ``` **Run via AWS Console**: 1. Navigate to AWS Config → Advanced queries 2. Paste SQL query 3. Click "Run query" 4. Export results as CSV **Run via AWS CLI**: ```bash # Save query to file cat > query.sql <<EOF SELECT resourceId, resourceType, configuration.tags WHERE resourceType = 'AWS::EC2::Instance' AND NOT EXISTS(SELECT 1 FROM configuration.tags WHERE key = 'Environment') EOF # Execute query aws configservice select-resource-config \ --expression file://query.sql \ --output json | jq -r '.Results[] | fromjson | {ResourceId: .resourceId, ResourceType: .resourceType}' ``` --- ### Method 2: AWS CLI Resource Groups Tagging API ```bash # Find all untagged EC2 instances aws resourcegroupstaggingapi get-resources \ --resource-type-filters "ec2:instance" \ --query 'ResourceTagMappingList[?Tags==`null` || Tags==`[]`].{ResourceARN:ResourceARN}' \ --output table # Find EC2 instances missing Environment tag aws resourcegroupstaggingapi get-resources \ --resource-type-filters "ec2:instance" \ --query 'ResourceTagMappingList[?!contains(Tags[].Key, `Environment`)].{ResourceARN:ResourceARN, Tags:Tags}' \ --output json # Find resources missing required tags (Environment, Owner, CostCenter) aws resourcegroupstaggingapi get-resources \ --resource-type-filters "ec2:instance" "rds:db" "s3:bucket" \ --query 'ResourceTagMappingList[?!(contains(Tags[].Key, `Environment`) && contains(Tags[].Key, `Owner`) && contains(Tags[].Key, `CostCenter`))]' \ --output json > untagged_resources.json ``` --- ### Method 3: Boto3 Python Script ```python #!/usr/bin/env python3 """ AWS Tag Compliance Audit Script Checks all resources for required tags and generates compliance report. 
""" import boto3 import csv from datetime import datetime # Required tags REQUIRED_TAGS = ['Environment', 'Owner', 'CostCenter', 'Project', 'ManagedBy'] # Resource types to audit RESOURCE_TYPES = [ 'ec2:instance', 'rds:db', 's3:bucket', 'lambda:function', 'dynamodb:table', 'ecs:service', 'eks:cluster' ] def get_untagged_resources(): """Find resources missing required tags.""" client = boto3.client('resourcegroupstaggingapi') untagged_resources = [] for resource_type in RESOURCE_TYPES: paginator = client.get_paginator('get_resources') page_iterator = paginator.paginate( ResourceTypeFilters=[resource_type] ) for page in page_iterator: for resource in page['ResourceTagMappingList']: arn = resource['ResourceARN'] tags = {tag['Key']: tag['Value'] for tag in resource.get('Tags', [])} # Check which required tags are missing missing_tags = [tag for tag in REQUIRED_TAGS if tag not in tags] if missing_tags: untagged_resources.append({ 'ResourceARN': arn, 'ResourceType': resource_type, 'MissingTags': ', '.join(missing_tags), 'ExistingTags': str(tags) }) return untagged_resources def generate_compliance_report(untagged_resources, output_file): """Generate CSV compliance report.""" with open(output_file, 'w', newline='') as csvfile: fieldnames = ['ResourceARN', 'ResourceType', 'MissingTags', 'ExistingTags'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() for resource in untagged_resources: writer.writerow(resource) print(f"Compliance report generated: {output_file}") print(f"Total non-compliant resources: {len(untagged_resources)}") def calculate_compliance_percentage(): """Calculate overall tag compliance percentage.""" client = boto3.client('resourcegroupstaggingapi') total_resources = 0 compliant_resources = 0 for resource_type in RESOURCE_TYPES: paginator = client.get_paginator('get_resources') page_iterator = paginator.paginate( ResourceTypeFilters=[resource_type] ) for page in page_iterator: for resource in page['ResourceTagMappingList']: 
total_resources += 1 tags = {tag['Key'] for tag in resource.get('Tags', [])} # Check if all required tags present if all(required_tag in tags for required_tag in REQUIRED_TAGS): compliant_resources += 1 compliance_pct = (compliant_resources / total_resources * 100) if total_resources > 0 else 0 print(f"\n=== Tag Compliance Summary ===") print(f"Total resources audited: {total_resources}") print(f"Compliant resources: {compliant_resources}") print(f"Non-compliant resources: {total_resources - compliant_resources}") print(f"Compliance percentage: {compliance_pct:.2f}%") return compliance_pct if __name__ == '__main__': print("Starting AWS tag compliance audit...") untagged = get_untagged_resources() report_file = f"aws_tag_compliance_{datetime.now().strftime('%Y%m%d')}.csv" generate_compliance_report(untagged, report_file) calculate_compliance_percentage() ``` **Run script**: ```bash chmod +x audit_aws_tags.py ./audit_aws_tags.py ``` **Output**: ``` Starting AWS tag compliance audit... Compliance report generated: aws_tag_compliance_20251204.csv Total non-compliant resources: 47 === Tag Compliance Summary === Total resources audited: 523 Compliant resources: 476 Non-compliant resources: 47 Compliance percentage: 91.01% ``` --- ## Azure Tag Compliance ### Method 1: Azure Resource Graph Query ```kusto // Find resources missing required tags Resources | where type in~ ( 'microsoft.compute/virtualmachines', 'microsoft.storage/storageaccounts', 'microsoft.web/sites', 'microsoft.sql/servers/databases' ) | where isnull(tags.Environment) or isnull(tags.Owner) or isnull(tags.CostCenter) or isnull(tags.Project) or isnull(tags.ManagedBy) | project name, type, resourceGroup, subscriptionId, location, tags, missingTags = pack_array( iff(isnull(tags.Environment), 'Environment', ''), iff(isnull(tags.Owner), 'Owner', ''), iff(isnull(tags.CostCenter), 'CostCenter', ''), iff(isnull(tags.Project), 'Project', ''), iff(isnull(tags.ManagedBy), 'ManagedBy', '') ) | extend missingTags = 
strcat_array(set_difference(missingTags, dynamic([''])), ', ') | order by name asc ``` ```kusto // Calculate tag compliance percentage Resources | where type in~ ( 'microsoft.compute/virtualmachines', 'microsoft.storage/storageaccounts', 'microsoft.web/sites' ) | extend hasAllTags = ( not(isnull(tags.Environment)) and not(isnull(tags.Owner)) and not(isnull(tags.CostCenter)) and not(isnull(tags.Project)) and not(isnull(tags.ManagedBy)) ) | summarize TotalResources = count(), CompliantResources = countif(hasAllTags), NonCompliantResources = countif(not(hasAllTags)) | extend CompliancePercentage = round(todouble(CompliantResources) / todouble(TotalResources) * 100, 2) ``` **Run via Azure Portal**: 1. Navigate to Azure Resource Graph Explorer 2. Paste KQL query 3. Click "Run query" 4. Export results **Run via Azure CLI**: ```bash # Find untagged resources (az graph requires the resource-graph extension) az graph query -q "Resources | where type =~ 'microsoft.compute/virtualmachines' | where isnull(tags.Environment) | project name, resourceGroup, tags" \ --output table # Export to JSON az graph query -q "Resources | where type in~ ('microsoft.compute/virtualmachines') | where isnull(tags.Environment) or isnull(tags.Owner)" \ --output json > azure_untagged.json ``` --- ### Method 2: Azure CLI Resource List ```bash # List all VMs without Environment tag az vm list --query "[?tags.Environment==null].{Name:name, ResourceGroup:resourceGroup, Tags:tags}" \ --output table # List all storage accounts without required tags az storage account list --query "[?tags.Environment==null || tags.Owner==null].{Name:name, ResourceGroup:resourceGroup, Tags:tags}" \ --output json ``` --- ### Method 3: PowerShell Script ```powershell # Azure Tag Compliance Audit Script $RequiredTags = @('Environment', 'Owner', 'CostCenter', 'Project', 'ManagedBy') $ResourceTypes = @( 'Microsoft.Compute/virtualMachines', 'Microsoft.Storage/storageAccounts', 'Microsoft.Web/sites', 'Microsoft.Sql/servers/databases' ) $UntaggedResources = @() foreach ($ResourceType in $ResourceTypes) { 
$Resources = Get-AzResource -ResourceType $ResourceType foreach ($Resource in $Resources) { $MissingTags = @() foreach ($RequiredTag in $RequiredTags) { # Untagged resources have a null Tags property, so guard before calling ContainsKey if (-not $Resource.Tags -or -not $Resource.Tags.ContainsKey($RequiredTag)) { $MissingTags += $RequiredTag } } if ($MissingTags.Count -gt 0) { $UntaggedResources += [PSCustomObject]@{ ResourceName = $Resource.Name ResourceType = $Resource.ResourceType ResourceGroup = $Resource.ResourceGroupName MissingTags = $MissingTags -join ', ' } } } } # Export to CSV $UntaggedResources | Export-Csv -Path "azure_tag_compliance_$(Get-Date -Format 'yyyyMMdd').csv" -NoTypeInformation # Calculate compliance (Get-AzResource accepts one resource type per call, so count per type) $TotalResources = ($ResourceTypes | ForEach-Object { @(Get-AzResource -ResourceType $_).Count } | Measure-Object -Sum).Sum $NonCompliantResources = $UntaggedResources.Count $CompliantResources = $TotalResources - $NonCompliantResources $CompliancePct = if ($TotalResources -gt 0) { ($CompliantResources / $TotalResources) * 100 } else { 0 } Write-Host "`n=== Azure Tag Compliance Summary ===" Write-Host "Total resources audited: $TotalResources" Write-Host "Compliant resources: $CompliantResources" Write-Host "Non-compliant resources: $NonCompliantResources" Write-Host "Compliance percentage: $([math]::Round($CompliancePct, 2))%" ``` --- ## GCP Label Compliance ### Method 1: Cloud Asset Inventory Query ```bash # Find all Compute instances without required labels gcloud asset search-all-resources \ --scope=organizations/123456789 \ --asset-types=compute.googleapis.com/Instance \ --query="NOT labels:environment OR NOT labels:owner OR NOT labels:costcenter OR NOT labels:project" \ --format="table(name,assetType,labels)" # Export to JSON gcloud asset search-all-resources \ --scope=projects/my-project-id \ --asset-types=compute.googleapis.com/Instance,storage.googleapis.com/Bucket \ --query="NOT labels:environment" \ --format=json > gcp_untagged.json ``` --- ### Method 2: GCP Python Script ```python #!/usr/bin/env python3 """ GCP Label Compliance Audit Script """ from google.cloud import asset_v1 from google.cloud import compute_v1 import csv from 
datetime import datetime REQUIRED_LABELS = ['environment', 'owner', 'costcenter', 'project', 'managedby'] ORGANIZATION_ID = 'organizations/123456789' # or 'projects/my-project-id' def audit_compute_instances(): """Audit Compute Engine instances for required labels.""" client = compute_v1.InstancesClient() project_id = 'my-project-id' unlabeled_instances = [] # List all zones zones_client = compute_v1.ZonesClient() zones = zones_client.list(project=project_id) for zone in zones: instances = client.list(project=project_id, zone=zone.name) for instance in instances: labels = instance.labels or {} missing_labels = [label for label in REQUIRED_LABELS if label not in labels] if missing_labels: unlabeled_instances.append({ 'Name': instance.name, 'Zone': zone.name, 'MissingLabels': ', '.join(missing_labels), 'ExistingLabels': str(labels) }) return unlabeled_instances def generate_compliance_report(unlabeled_resources, output_file): """Generate CSV compliance report.""" with open(output_file, 'w', newline='') as csvfile: fieldnames = ['Name', 'Zone', 'MissingLabels', 'ExistingLabels'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() for resource in unlabeled_resources: writer.writerow(resource) print(f"Compliance report generated: {output_file}") print(f"Total non-compliant resources: {len(unlabeled_resources)}") if __name__ == '__main__': print("Starting GCP label compliance audit...") unlabeled = audit_compute_instances() report_file = f"gcp_label_compliance_{datetime.now().strftime('%Y%m%d')}.csv" generate_compliance_report(unlabeled, report_file) ``` --- ## Kubernetes Label Compliance ### Method 1: kubectl Query ```bash #!/bin/bash # Kubernetes Label Compliance Audit Script REQUIRED_LABELS=("environment" "owner" "project" "app") echo "=== Kubernetes Label Compliance Audit ===" echo "Checking for missing required labels: ${REQUIRED_LABELS[@]}" echo "" for KIND in pod deployment statefulset service daemonset; do echo "--- Auditing: $KIND ---" 
kubectl get $KIND --all-namespaces -o json | jq -r ' .items[] | select( .metadata.labels.environment == null or .metadata.labels.owner == null or .metadata.labels.project == null or .metadata.labels.app == null ) | "\(.metadata.namespace)/\(.metadata.name): missing labels \( [ (if .metadata.labels.environment == null then "environment" else empty end), (if .metadata.labels.owner == null then "owner" else empty end), (if .metadata.labels.project == null then "project" else empty end), (if .metadata.labels.app == null then "app" else empty end) ] | join(", ") )" ' echo "" done ``` --- ### Method 2: Python Kubernetes Client ```python #!/usr/bin/env python3 """ Kubernetes Label Compliance Audit Script """ from kubernetes import client, config import csv from datetime import datetime REQUIRED_LABELS = ['environment', 'owner', 'project', 'app'] def audit_kubernetes_resources(): """Audit Kubernetes resources for required labels.""" config.load_kube_config() v1 = client.CoreV1Api() apps_v1 = client.AppsV1Api() unlabeled_resources = [] # Audit Pods pods = v1.list_pod_for_all_namespaces() for pod in pods.items: missing_labels = [label for label in REQUIRED_LABELS if label not in (pod.metadata.labels or {})] if missing_labels: unlabeled_resources.append({ 'Kind': 'Pod', 'Namespace': pod.metadata.namespace, 'Name': pod.metadata.name, 'MissingLabels': ', '.join(missing_labels) }) # Audit Deployments deployments = apps_v1.list_deployment_for_all_namespaces() for deployment in deployments.items: missing_labels = [label for label in REQUIRED_LABELS if label not in (deployment.metadata.labels or {})] if missing_labels: unlabeled_resources.append({ 'Kind': 'Deployment', 'Namespace': deployment.metadata.namespace, 'Name': deployment.metadata.name, 'MissingLabels': ', '.join(missing_labels) }) return unlabeled_resources def generate_k8s_compliance_report(unlabeled_resources, output_file): """Generate CSV compliance report.""" with open(output_file, 'w', newline='') as csvfile: 
fieldnames = ['Kind', 'Namespace', 'Name', 'MissingLabels'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() for resource in unlabeled_resources: writer.writerow(resource) print(f"Kubernetes compliance report generated: {output_file}") print(f"Total non-compliant resources: {len(unlabeled_resources)}") if __name__ == '__main__': print("Starting Kubernetes label compliance audit...") unlabeled = audit_kubernetes_resources() report_file = f"k8s_label_compliance_{datetime.now().strftime('%Y%m%d')}.csv" generate_k8s_compliance_report(unlabeled, report_file) ``` --- ## Automated Remediation ### AWS Auto-Remediation (Systems Manager) ```hcl # SSM Automation document to add missing tags resource "aws_ssm_document" "add_default_tags" { name = "AddDefaultTags" document_type = "Automation" content = jsonencode({ schemaVersion = "0.3" description = "Add default tags to untagged EC2 instances" parameters = { InstanceId = { type = "String" description = "EC2 instance ID to tag" } Environment = { type = "String" default = "unknown" description = "Environment tag value" } } mainSteps = [{ name = "addTags" action = "aws:createTags" inputs = { ResourceType = "EC2" ResourceIds = ["{{ InstanceId }}"] # aws:createTags expects Tags as a list of Key/Value maps Tags = [ { Key = "Environment", Value = "{{ Environment }}" }, { Key = "Owner", Value = "[email protected]" }, { Key = "ManagedBy", Value = "auto-remediation" } ] } }] }) } # Lambda function to auto-tag new resources resource "aws_lambda_function" "auto_tag" { filename = "auto_tag_lambda.zip" function_name = "auto-tag-resources" role = aws_iam_role.lambda_auto_tag.arn handler = "index.handler" runtime = "python3.11" environment { variables = { DEFAULT_TAGS = jsonencode({ ManagedBy = "auto-tagging" Owner = "[email protected]" }) } } } # EventBridge rule to trigger Lambda when an EC2 instance enters the running state resource "aws_cloudwatch_event_rule" "new_ec2_instance" { name = "auto-tag-new-ec2-instances" description = "Trigger auto-tagging when EC2 instance created" event_pattern = jsonencode({ source = ["aws.ec2"] detail-type = ["EC2 
Instance State-change Notification"] detail = { state = ["running"] } }) } resource "aws_cloudwatch_event_target" "lambda" { rule = aws_cloudwatch_event_rule.new_ec2_instance.name target_id = "AutoTagLambda" arn = aws_lambda_function.auto_tag.arn } # Allow EventBridge to invoke the Lambda function resource "aws_lambda_permission" "allow_eventbridge" { statement_id = "AllowExecutionFromEventBridge" action = "lambda:InvokeFunction" function_name = aws_lambda_function.auto_tag.function_name principal = "events.amazonaws.com" source_arn = aws_cloudwatch_event_rule.new_ec2_instance.arn } ``` --- ### Azure Auto-Remediation (Policy) ```hcl # Azure Policy with automatic remediation resource "azurerm_policy_definition" "add_missing_tags" { name = "add-missing-tags-auto-remediate" policy_type = "Custom" mode = "Indexed" display_name = "Add missing tags with automatic remediation" policy_rule = jsonencode({ if = { field = "tags['Environment']" exists = "false" } then = { effect = "modify" details = { roleDefinitionIds = [ "/providers/microsoft.authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c" ] operations = [{ operation = "add" field = "tags['Environment']" value = "unknown" }] } } }) } resource "azurerm_policy_assignment" "auto_remediate" { name = "auto-add-missing-tags" policy_definition_id = azurerm_policy_definition.add_missing_tags.id scope = "/subscriptions/${var.subscription_id}" identity { type = "SystemAssigned" } } # Remediation task to fix existing resources resource "azurerm_policy_remediation" "fix_existing" { name = "fix-existing-untagged-resources" scope = "/subscriptions/${var.subscription_id}" policy_assignment_id = azurerm_policy_assignment.auto_remediate.id } ``` --- ## Compliance Dashboards ### CloudWatch Dashboard (AWS) ```hcl resource "aws_cloudwatch_dashboard" "tag_compliance" { dashboard_name = "tag-compliance-dashboard" dashboard_body = jsonencode({ widgets = [ { type = "metric" properties = { metrics = [ ["AWS/Config", "ComplianceScore", { stat = "Average" }] ] period = 300 stat = "Average" region = "us-east-1" title = "Tag Compliance Score" } } ] }) } ``` --- ### Grafana Dashboard (Multi-Cloud) ```json { "dashboard": { "title": "Multi-Cloud Tag Compliance", "panels": [ { "title": "Overall Compliance %", "targets": [ { "expr": "(count(aws_resources_compliant) / 
count(aws_resources_total)) * 100" } ] }, { "title": "Non-Compliant Resources by Cloud", "targets": [ { "expr": "sum by (cloud_provider) (resources_noncompliant)" } ] } ] } } ``` --- ## Best Practices Summary 1. **Audit weekly** (minimum) to catch drift early 2. **Automate remediation** where safe (add default tags, not delete resources) 3. **Track compliance trends** over time (dashboard, monthly reports) 4. **Notify resource owners** before escalating to management 5. **Document exceptions** (create waiver process for valid reasons) 6. **Integrate with CI/CD** (pre-deployment tag validation via Checkov/tflint) 7. **Use cloud-native tools** (AWS Config, Azure Resource Graph, GCP Asset Inventory) 8. **Export reports to data warehouse** for long-term trend analysis 9. **Set compliance targets** (e.g., 95% compliance by end of quarter) 10. **Celebrate improvements** (recognize teams that improve compliance) ``` ### scripts/audit_tags.py ```python #!/usr/bin/env python3 """ Multi-Cloud Tag Compliance Audit Script Audits resource tagging compliance across AWS, Azure, and GCP. 
Dependencies: pip install boto3 azure-identity azure-mgmt-resource google-cloud-asset tabulate Usage: # Audit AWS only python audit_tags.py --cloud aws # Audit all clouds python audit_tags.py --cloud all # Export to CSV python audit_tags.py --cloud aws --output aws_audit.csv # Specify custom required tags python audit_tags.py --cloud aws --tags Environment Owner CostCenter """ import argparse import csv import sys from datetime import datetime from typing import List, Dict, Optional from tabulate import tabulate # Required tags (default) DEFAULT_REQUIRED_TAGS = ['Environment', 'Owner', 'CostCenter', 'Project', 'ManagedBy'] # Cloud-specific imports (optional - install as needed) try: import boto3 AWS_AVAILABLE = True except ImportError: AWS_AVAILABLE = False try: from azure.identity import DefaultAzureCredential from azure.mgmt.resource import ResourceManagementClient AZURE_AVAILABLE = True except ImportError: AZURE_AVAILABLE = False try: from google.cloud import asset_v1 GCP_AVAILABLE = True except ImportError: GCP_AVAILABLE = False class TagAuditor: """Base class for tag auditing.""" def __init__(self, required_tags: List[str]): self.required_tags = required_tags self.non_compliant_resources = [] def audit(self) -> List[Dict]: """Run audit and return non-compliant resources.""" raise NotImplementedError def calculate_compliance(self, total: int, compliant: int) -> float: """Calculate compliance percentage.""" if total == 0: return 100.0 return (compliant / total) * 100 class AWSTagAuditor(TagAuditor): """AWS tag compliance auditor.""" RESOURCE_TYPES = [ 'ec2:instance', 'rds:db', 's3:bucket', 'lambda:function', 'dynamodb:table', ] def audit(self) -> List[Dict]: """Audit AWS resources for tag compliance.""" if not AWS_AVAILABLE: print("ERROR: boto3 not installed. 
Run: pip install boto3") return [] client = boto3.client('resourcegroupstaggingapi') non_compliant = [] for resource_type in self.RESOURCE_TYPES: try: paginator = client.get_paginator('get_resources') page_iterator = paginator.paginate( ResourceTypeFilters=[resource_type] ) for page in page_iterator: for resource in page['ResourceTagMappingList']: arn = resource['ResourceARN'] tags = {tag['Key']: tag['Value'] for tag in resource.get('Tags', [])} # Check which required tags are missing missing_tags = [tag for tag in self.required_tags if tag not in tags] if missing_tags: non_compliant.append({ 'Cloud': 'AWS', 'ResourceARN': arn, 'ResourceType': resource_type, 'MissingTags': ', '.join(missing_tags), 'ExistingTags': str(tags) }) except Exception as e: print(f"Warning: Error auditing {resource_type}: {e}") self.non_compliant_resources = non_compliant return non_compliant class AzureTagAuditor(TagAuditor): """Azure tag compliance auditor.""" def __init__(self, required_tags: List[str], subscription_id: str): super().__init__(required_tags) self.subscription_id = subscription_id def audit(self) -> List[Dict]: """Audit Azure resources for tag compliance.""" if not AZURE_AVAILABLE: print("ERROR: azure-mgmt-resource not installed. 
Run: pip install azure-mgmt-resource azure-identity") return [] credential = DefaultAzureCredential() client = ResourceManagementClient(credential, self.subscription_id) non_compliant = [] try: resources = client.resources.list() for resource in resources: tags = resource.tags or {} # Azure tag names are case-insensitive, so compare lowercased keys existing_keys = {key.lower() for key in tags} missing_tags = [tag for tag in self.required_tags if tag.lower() not in existing_keys] if missing_tags: non_compliant.append({ 'Cloud': 'Azure', 'ResourceARN': resource.id, 'ResourceType': resource.type, 'MissingTags': ', '.join(missing_tags), 'ExistingTags': str(tags) }) except Exception as e: print(f"ERROR: Failed to audit Azure resources: {e}") self.non_compliant_resources = non_compliant return non_compliant class GCPLabelAuditor(TagAuditor): """GCP label compliance auditor.""" def __init__(self, required_tags: List[str], organization_id: Optional[str] = None, project_id: Optional[str] = None): # GCP labels are lowercase super().__init__([tag.lower() for tag in required_tags]) self.organization_id = organization_id self.project_id = project_id def audit(self) -> List[Dict]: """Audit GCP resources for label compliance.""" if not GCP_AVAILABLE: print("ERROR: google-cloud-asset not installed. 
Run: pip install google-cloud-asset") return [] client = asset_v1.AssetServiceClient() non_compliant = [] scope = f"organizations/{self.organization_id}" if self.organization_id else f"projects/{self.project_id}" try: # Search for resources request = asset_v1.SearchAllResourcesRequest( scope=scope, asset_types=[ "compute.googleapis.com/Instance", "storage.googleapis.com/Bucket", ] ) page_result = client.search_all_resources(request=request) for resource in page_result: labels = resource.labels or {} # Check which required labels are missing missing_labels = [label for label in self.required_tags if label not in labels] if missing_labels: non_compliant.append({ 'Cloud': 'GCP', 'ResourceARN': resource.name, 'ResourceType': resource.asset_type, 'MissingTags': ', '.join(missing_labels), 'ExistingTags': str(labels) }) except Exception as e: print(f"ERROR: Failed to audit GCP resources: {e}") self.non_compliant_resources = non_compliant return non_compliant def generate_csv_report(resources: List[Dict], output_file: str): """Generate CSV compliance report.""" with open(output_file, 'w', newline='') as csvfile: fieldnames = ['Cloud', 'ResourceARN', 'ResourceType', 'MissingTags', 'ExistingTags'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() for resource in resources: writer.writerow(resource) print(f"\nCompliance report generated: {output_file}") def print_summary(auditors: List[TagAuditor]): """Print audit summary.""" print("\n" + "="*70) print(" TAG COMPLIANCE AUDIT SUMMARY") print("="*70) for auditor in auditors: cloud = auditor.__class__.__name__.replace('TagAuditor', '').replace('LabelAuditor', '') non_compliant_count = len(auditor.non_compliant_resources) print(f"\n{cloud}:") print(f" Non-compliant resources: {non_compliant_count}") if non_compliant_count > 0: # Print a sample of up to 10 non-compliant resources print(f"\n Sample non-compliant resources:") table_data = [] for resource in 
auditor.non_compliant_resources[:10]: table_data.append([ resource['ResourceType'], resource['ResourceARN'][:60] + '...' if len(resource['ResourceARN']) > 60 else resource['ResourceARN'], resource['MissingTags'] ]) print(tabulate(table_data, headers=['Type', 'Resource', 'Missing Tags'], tablefmt='simple')) print("\n" + "="*70) def main(): parser = argparse.ArgumentParser( description='Multi-cloud tag compliance auditor', formatter_class=argparse.RawDescriptionHelpFormatter, epilog=""" Examples: # Audit AWS only python audit_tags.py --cloud aws # Audit all clouds python audit_tags.py --cloud all # Export to CSV python audit_tags.py --cloud aws --output aws_audit.csv # Custom required tags python audit_tags.py --cloud aws --tags Environment Owner Project """ ) parser.add_argument('--cloud', choices=['aws', 'azure', 'gcp', 'all'], default='all', help='Cloud provider to audit (default: all)') parser.add_argument('--tags', nargs='+', default=DEFAULT_REQUIRED_TAGS, help=f'Required tags to check (default: {" ".join(DEFAULT_REQUIRED_TAGS)})') parser.add_argument('--output', type=str, help='Output CSV file path') parser.add_argument('--azure-subscription', type=str, help='Azure subscription ID (required for Azure audit)') parser.add_argument('--gcp-org', type=str, help='GCP organization ID (format: 123456789)') parser.add_argument('--gcp-project', type=str, help='GCP project ID (alternative to --gcp-org)') args = parser.parse_args() print("="*70) print(" MULTI-CLOUD TAG COMPLIANCE AUDIT") print("="*70) print(f"Required tags: {', '.join(args.tags)}") print(f"Clouds: {args.cloud}") print() auditors = [] all_non_compliant = [] # AWS audit if args.cloud in ['aws', 'all']: if not AWS_AVAILABLE: print("WARNING: boto3 not installed. 
Skipping AWS audit.") else: print("Auditing AWS resources...") aws_auditor = AWSTagAuditor(args.tags) non_compliant = aws_auditor.audit() auditors.append(aws_auditor) all_non_compliant.extend(non_compliant) print(f" Found {len(non_compliant)} non-compliant AWS resources") # Azure audit if args.cloud in ['azure', 'all']: if not AZURE_AVAILABLE: print("WARNING: azure-mgmt-resource not installed. Skipping Azure audit.") elif not args.azure_subscription: print("WARNING: --azure-subscription required for Azure audit. Skipping.") else: print("Auditing Azure resources...") azure_auditor = AzureTagAuditor(args.tags, args.azure_subscription) non_compliant = azure_auditor.audit() auditors.append(azure_auditor) all_non_compliant.extend(non_compliant) print(f" Found {len(non_compliant)} non-compliant Azure resources") # GCP audit if args.cloud in ['gcp', 'all']: if not GCP_AVAILABLE: print("WARNING: google-cloud-asset not installed. Skipping GCP audit.") elif not args.gcp_org and not args.gcp_project: print("WARNING: --gcp-org or --gcp-project required for GCP audit. Skipping.") else: print("Auditing GCP resources...") gcp_auditor = GCPLabelAuditor(args.tags, args.gcp_org, args.gcp_project) non_compliant = gcp_auditor.audit() auditors.append(gcp_auditor) all_non_compliant.extend(non_compliant) print(f" Found {len(non_compliant)} non-compliant GCP resources") # Generate report if args.output and all_non_compliant: generate_csv_report(all_non_compliant, args.output) # Print summary if auditors: print_summary(auditors) else: print("\nERROR: No cloud providers were audited. 
Check dependencies and arguments.") sys.exit(1) # Exit code based on compliance if all_non_compliant: sys.exit(1) # Non-zero exit for CI/CD pipelines else: print("\n✓ All resources are compliant!") sys.exit(0) if __name__ == '__main__': main() ``` ### references/cost-allocation.md ```markdown # Cost Allocation with Resource Tags Complete guide to enabling cost allocation, showback/chargeback, and budget management using cloud resource tags. ## Table of Contents 1. [Cost Allocation Overview](#cost-allocation-overview) 2. [AWS Cost Allocation](#aws-cost-allocation) 3. [Azure Cost Management](#azure-cost-management) 4. [GCP Cloud Billing](#gcp-cloud-billing) 5. [Kubernetes Cost Allocation](#kubernetes-cost-allocation) 6. [Multi-Cloud Cost Visibility](#multi-cloud-cost-visibility) 7. [Showback and Chargeback](#showback-and-chargeback) --- ## Cost Allocation Overview Resource tagging enables precise cost allocation, reducing unallocated cloud spend from 35% to <5%. ``` ┌────────────────────────────────────────────────────────┐ │ Cloud Cost Allocation Hierarchy │ ├────────────────────────────────────────────────────────┤ │ │ │ WITHOUT TAGS: │ │ └── Total Cloud Spend: $500,000/month │ │ ├── Allocated: $325,000 (65%) │ │ └── Unallocated: $175,000 (35%) ← Lost visibility│ │ │ │ WITH COMPREHENSIVE TAGGING: │ │ └── Total Cloud Spend: $500,000/month │ │ ├── By Project: │ │ │ ├── ecommerce: $200,000 (40%) │ │ │ ├── mobile-app: $150,000 (30%) │ │ │ └── analytics: $125,000 (25%) │ │ ├── By Environment: │ │ │ ├── prod: $350,000 (70%) │ │ │ ├── staging: $75,000 (15%) │ │ │ └── dev: $50,000 (10%) │ │ ├── By Team: │ │ │ ├── platform-team: $250,000 (50%) │ │ │ ├── data-team: $150,000 (30%) │ │ │ └── mobile-team: $75,000 (15%) │ │ └── Unallocated: $25,000 (5%) ← Minimal waste │ │ │ └────────────────────────────────────────────────────────┘ ``` **Key cost allocation tags**: | Tag | Purpose | Example | |-----|---------|---------| | **Project** | Track costs by business initiative | 
`ecommerce-platform` | | **Environment** | Separate prod vs. dev/test costs | `prod`, `staging`, `dev` | | **Owner** | Assign costs to responsible team | `[email protected]` | | **CostCenter** | Allocate to finance cost center | `CC-1234` | | **Application** | Multi-app cost breakdown | `payment-api`, `user-service` | | **Component** | Tier-level costs (web, db, cache) | `database`, `web`, `api` | --- ## AWS Cost Allocation AWS Cost Explorer and Cost Allocation Tags provide detailed cost breakdowns. ### Step 1: Enable Cost Allocation Tags **Cost allocation tags must be activated** (takes up to 24 hours for activation). #### Via AWS Console 1. Navigate to AWS Billing → Cost Allocation Tags 2. Select tags to activate (Environment, Owner, CostCenter, Project) 3. Click "Activate" 4. Wait 24 hours for data to appear in Cost Explorer #### Via Terraform ```hcl # Enable cost allocation tags resource "aws_ce_cost_allocation_tag" "environment" { tag_key = "Environment" status = "Active" } resource "aws_ce_cost_allocation_tag" "owner" { tag_key = "Owner" status = "Active" } resource "aws_ce_cost_allocation_tag" "costcenter" { tag_key = "CostCenter" status = "Active" } resource "aws_ce_cost_allocation_tag" "project" { tag_key = "Project" status = "Active" } resource "aws_ce_cost_allocation_tag" "application" { tag_key = "Application" status = "Active" } resource "aws_ce_cost_allocation_tag" "component" { tag_key = "Component" status = "Active" } ``` **Cost**: No additional charge for cost allocation tags --- ### Step 2: Create Cost Allocation Reports Query costs by tags in AWS Cost Explorer. 
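The grouped results that Cost Explorer returns can be turned into a per-tag summary with a few lines of post-processing. The sketch below is a minimal, offline illustration rather than a definitive implementation: it assumes a response shaped like the JSON from `aws ce get-cost-and-usage` grouped by a single tag, where each group key has the form `TagKey$TagValue` and an empty value marks untagged spend. The function names `summarize_costs_by_tag` and `unallocated_share` are hypothetical, and the sample figures reuse the illustrative numbers from the overview diagram.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_costs_by_tag(response, metric="UnblendedCost"):
    """Sum cost per tag value across all time periods in the response."""
    totals = defaultdict(Decimal)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            # Group keys look like "Project$ecommerce"; "Project$" means no tag value.
            _, _, tag_value = group["Keys"][0].partition("$")
            amount = Decimal(group["Metrics"][metric]["Amount"])
            totals[tag_value or "untagged"] += amount
    return dict(totals)


def unallocated_share(totals):
    """Return the percentage of total spend that carries no tag value."""
    total = sum(totals.values(), Decimal(0))
    if total == 0:
        return Decimal(0)
    return (totals.get("untagged", Decimal(0)) / total * 100).quantize(Decimal("0.01"))


def _group(key, amount):
    """Build one group entry in the assumed response shape (illustration only)."""
    return {"Keys": [key], "Metrics": {"UnblendedCost": {"Amount": amount, "Unit": "USD"}}}


# Hypothetical sample data, not real billing output.
sample_response = {
    "ResultsByTime": [
        {
            "Groups": [
                _group("Project$ecommerce", "200000"),
                _group("Project$mobile-app", "150000"),
                _group("Project$analytics", "125000"),
                _group("Project$", "25000"),
            ]
        }
    ]
}

totals = summarize_costs_by_tag(sample_response)
for tag_value, cost in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag_value}: ${cost:,.2f}")
print(f"Unallocated share: {unallocated_share(totals)}%")
```

Tracking the unallocated share over time gives a single number for how well the tagging strategy is working; `Decimal` is used so that currency amounts are not subject to floating-point rounding.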
#### AWS CLI: Cost by Project Tag

```bash
# Get costs grouped by Project tag for November 2025 (the End date is exclusive)
aws ce get-cost-and-usage \
  --time-period Start=2025-11-01,End=2025-12-01 \
  --granularity MONTHLY \
  --metrics "UnblendedCost" \
  --group-by Type=TAG,Key=Project \
  --output json | jq '.ResultsByTime[].Groups[] | {Project: .Keys[0], Cost: .Metrics.UnblendedCost.Amount}'
```

#### Terraform: Automated Cost Report

```hcl
# S3 bucket for cost reports
resource "aws_s3_bucket" "cost_reports" {
  bucket = "company-cost-allocation-reports"
}

resource "aws_s3_bucket_versioning" "cost_reports" {
  bucket = aws_s3_bucket.cost_reports.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Cost and Usage Report with tag breakdowns
# (activated cost allocation tags appear as resource_tags_user_* columns)
resource "aws_cur_report_definition" "cost_allocation" {
  report_name                = "cost-allocation-report"
  time_unit                  = "DAILY"
  format                     = "Parquet"
  compression                = "Parquet"
  additional_schema_elements = ["RESOURCES"]
  s3_bucket                  = aws_s3_bucket.cost_reports.id
  s3_region                  = "us-east-1"
  s3_prefix                  = "cost-reports"
  additional_artifacts       = ["ATHENA"]

  # The ATHENA artifact requires Parquet output and OVERWRITE_REPORT versioning
  report_versioning = "OVERWRITE_REPORT"
}
```

**Athena Query Example** (query Parquet cost reports):

```sql
-- Cost by Project tag for the December 2025 partition
SELECT
  line_item_usage_account_id,
  resource_tags_user_project AS project,
  resource_tags_user_environment AS environment,
  SUM(line_item_unblended_cost) AS total_cost
FROM cost_and_usage_report
WHERE year = '2025'
  AND month = '12'
  AND resource_tags_user_project IS NOT NULL
GROUP BY
  line_item_usage_account_id,
  resource_tags_user_project,
  resource_tags_user_environment
ORDER BY total_cost DESC
```

---

### Step 3: Cost Anomaly Detection by Tag

Detect unusual spending patterns for specific tags.
```hcl
# Cost anomaly monitor for Project tag
# Tag-scoped monitors use monitor_type = "CUSTOM" with a monitor_specification;
# DIMENSIONAL monitors only accept monitor_dimension = "SERVICE".
resource "aws_ce_anomaly_monitor" "project_monitor" {
  name         = "project-cost-monitor"
  monitor_type = "CUSTOM"

  monitor_specification = jsonencode({
    And            = null
    CostCategories = null
    Dimensions     = null
    Not            = null
    Or             = null
    Tags = {
      Key          = "Project"
      MatchOptions = null
      Values       = ["ecommerce", "mobile-app", "analytics"]
    }
  })
}

# Alert when an anomaly is detected (total impact >= $100)
resource "aws_ce_anomaly_subscription" "project_alerts" {
  name      = "project-cost-alerts"
  frequency = "DAILY"

  monitor_arn_list = [
    aws_ce_anomaly_monitor.project_monitor.arn
  ]

  subscriber {
    type    = "EMAIL"
    address = "[email protected]"
  }

  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      values        = ["100"]
      match_options = ["GREATER_THAN_OR_EQUAL"]
    }
  }
}

# Monitor by Environment tag
resource "aws_ce_anomaly_monitor" "environment_monitor" {
  name         = "environment-cost-monitor"
  monitor_type = "CUSTOM"

  monitor_specification = jsonencode({
    And            = null
    CostCategories = null
    Dimensions     = null
    Not            = null
    Or             = null
    Tags = {
      Key          = "Environment"
      MatchOptions = null
      Values       = ["prod", "staging", "dev"]
    }
  })
}
```

**Cost**: No additional charge for Cost Anomaly Detection itself; Amazon SNS notifications, if used, are billed separately.

---

### Step 4: Budget Alerts by Tag

Create budgets for specific tag values (e.g., per-project budgets).
```hcl # Budget for ecommerce project resource "aws_budgets_budget" "ecommerce_budget" { name = "ecommerce-monthly-budget" budget_type = "COST" limit_amount = "50000" limit_unit = "USD" time_period_start = "2025-01-01_00:00" time_unit = "MONTHLY" # User-defined tags are referenced as user:<key>$<value> cost_filter { name = "TagKeyValue" values = ["user:Project$ecommerce"] } notification { comparison_operator = "GREATER_THAN" threshold = 80 threshold_type = "PERCENTAGE" notification_type = "ACTUAL" subscriber_email_addresses = ["[email protected]"] } notification { comparison_operator = "GREATER_THAN" threshold = 100 threshold_type = "PERCENTAGE" notification_type = "FORECASTED" subscriber_email_addresses = ["[email protected]", "[email protected]"] } } # Budget per environment resource "aws_budgets_budget" "dev_budget" { name = "dev-environment-budget" budget_type = "COST" limit_amount = "5000" limit_unit = "USD" time_period_start = "2025-01-01_00:00" time_unit = "MONTHLY" cost_filter { name = "TagKeyValue" values = ["user:Environment$dev"] } notification { comparison_operator = "GREATER_THAN" threshold = 90 threshold_type = "PERCENTAGE" notification_type = "ACTUAL" subscriber_email_addresses = ["[email protected]"] } } ``` --- ## Azure Cost Management Azure Cost Management provides cost analysis and budget management by tags. ### Step 1: Cost Analysis by Tags #### Azure Portal 1. Navigate to Cost Management + Billing → Cost Analysis 2. Click "Group by" → Select "Tag: Environment" (or other tag) 3. 
View cost breakdown by tag value #### Azure CLI ```bash # Export costs grouped by tags az consumption usage list \ --start-date 2025-11-01 \ --end-date 2025-12-01 \ --query "[].{Date:usageStart, Cost:pretaxCost, Project:tags.Project, Environment:tags.Environment, Owner:tags.Owner}" \ --output table # Cost summary by Project tag az consumption usage list \ --start-date 2025-11-01 \ --end-date 2025-12-01 \ --query "group_by([], tags.Project, sum(pretaxCost))" \ --output json ``` --- ### Step 2: Cost Management Query (REST API) ```bash # Cost breakdown by tags via REST API POST https://management.azure.com/subscriptions/{subscription-id}/providers/Microsoft.CostManagement/query?api-version=2021-10-01 { "type": "ActualCost", "timeframe": "MonthToDate", "dataset": { "granularity": "Daily", "aggregation": { "totalCost": { "name": "Cost", "function": "Sum" } }, "grouping": [ { "type": "TagKey", "name": "Project" }, { "type": "TagKey", "name": "Environment" } ] } } ``` #### Terraform: Cost Management Export ```hcl # Storage account for cost exports resource "azurerm_storage_account" "cost_exports" { name = "companycostexports" resource_group_name = azurerm_resource_group.main.name location = "eastus" account_tier = "Standard" account_replication_type = "LRS" } resource "azurerm_storage_container" "cost_data" { name = "cost-data" storage_account_name = azurerm_storage_account.cost_exports.name container_access_type = "private" } # Cost export with tag columns resource "azurerm_cost_management_export_resource_group" "monthly_export" { name = "monthly-cost-export" resource_group_id = azurerm_resource_group.main.id recurrence_type = "Monthly" recurrence_period_start = "2025-01-01T00:00:00Z" recurrence_period_end = "2026-12-31T23:59:59Z" delivery_info { storage_account_id = azurerm_storage_account.cost_exports.id container_name = azurerm_storage_container.cost_data.name root_folder_path = "cost-exports" } query { type = "ActualCost" time_frame = "MonthToDate" dataset { 
granularity = "Daily" grouping { type = "TagKey" name = "Project" } grouping { type = "TagKey" name = "Environment" } } } } ``` --- ### Step 3: Budget Alerts by Tags ```hcl # Budget for specific project tag resource "azurerm_consumption_budget_resource_group" "ecommerce_budget" { name = "ecommerce-project-budget" resource_group_id = azurerm_resource_group.main.id amount = 50000 time_grain = "Monthly" time_period { start_date = "2025-01-01T00:00:00Z" end_date = "2026-12-31T23:59:59Z" } filter { tag { name = "Project" values = ["ecommerce"] } } notification { enabled = true threshold = 80.0 operator = "GreaterThan" contact_emails = [ "[email protected]", ] } notification { enabled = true threshold = 100.0 operator = "GreaterThan" contact_emails = [ "[email protected]", "[email protected]", ] } } ``` --- ## GCP Cloud Billing GCP exports billing data to BigQuery with label breakdowns. ### Step 1: Enable Billing Export to BigQuery #### GCP Console 1. Navigate to Billing → Billing export 2. Enable "BigQuery export" 3. 
Select dataset for export (creates daily billing tables) #### Terraform ```hcl # BigQuery dataset for billing export resource "google_bigquery_dataset" "billing_export" { dataset_id = "billing_export" location = "US" description = "GCP billing data export" labels = { environment = "prod" managedby = "terraform" } } # Billing export (configured via gcloud, not Terraform) # gcloud beta billing accounts export billing-data \ # --billing-account=BILLING_ACCOUNT_ID \ # --dataset-id=PROJECT_ID:billing_export ``` --- ### Step 2: Query Costs by Labels ```sql -- Cost breakdown by label (last 30 days) SELECT labels.key AS label_key, labels.value AS label_value, SUM(cost) AS total_cost, SUM(usage.amount) AS total_usage, usage.unit FROM `project.billing_export.gcp_billing_export_v1_XXXXX` CROSS JOIN UNNEST(labels) AS labels WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND labels.key IN ('environment', 'project', 'costcenter', 'owner') GROUP BY label_key, label_value, usage.unit ORDER BY total_cost DESC ``` ```sql -- Cost by project label (monthly trend) SELECT EXTRACT(MONTH FROM usage_start_time) AS month, labels.value AS project, SUM(cost) AS monthly_cost FROM `project.billing_export.gcp_billing_export_v1_XXXXX` CROSS JOIN UNNEST(labels) AS labels WHERE labels.key = 'project' AND usage_start_time >= TIMESTAMP('2025-01-01') GROUP BY month, project ORDER BY month, monthly_cost DESC ``` ```sql -- Untagged resources (missing required labels) SELECT service.description AS service, sku.description AS resource_type, SUM(cost) AS unallocated_cost FROM `project.billing_export.gcp_billing_export_v1_XXXXX` WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND ( ARRAY_LENGTH(labels) = 0 -- No labels at all OR NOT EXISTS ( SELECT 1 FROM UNNEST(labels) WHERE key IN ('environment', 'project', 'owner') ) ) GROUP BY service, resource_type ORDER BY unallocated_cost DESC LIMIT 20 ``` --- ### Step 3: Budget Alerts by Labels ```hcl # Budget 
for specific project label resource "google_billing_budget" "ecommerce_budget" { billing_account = var.billing_account_id display_name = "Ecommerce Project Budget" budget_filter { projects = ["projects/${var.project_id}"] labels = { project = "ecommerce" } } amount { specified_amount { currency_code = "USD" units = "50000" } } threshold_rules { threshold_percent = 0.8 } threshold_rules { threshold_percent = 1.0 } threshold_rules { threshold_percent = 1.2 spend_basis = "FORECASTED_SPEND" } all_updates_rule { monitoring_notification_channels = [ google_monitoring_notification_channel.email.name ] } } resource "google_monitoring_notification_channel" "email" { display_name = "FinOps Team Email" type = "email" labels = { email_address = "[email protected]" } } ``` --- ## Kubernetes Cost Allocation Track Kubernetes costs by namespace labels using Kubecost or OpenCost. ### Kubecost Label-Based Allocation **Install Kubecost**: ```bash helm repo add kubecost https://kubecost.github.io/cost-analyzer/ helm install kubecost kubecost/cost-analyzer \ --namespace kubecost --create-namespace \ --set kubecostToken="your-token" ``` **Query costs by label via API**: ```bash # Cost by environment label (last 7 days) curl "http://kubecost.company.com/model/allocation \ ?window=7d \ &aggregate=label:environment \ &accumulate=true" # Cost by project label curl "http://kubecost.company.com/model/allocation \ ?window=month \ &aggregate=label:project" ``` **Kubecost allocation configuration**: ```yaml # values.yaml for Kubecost Helm chart kubecostModel: allocationLabels: - environment - project - owner - costcenter - app ``` --- ### OpenCost (Open Source Alternative) ```bash # Install OpenCost kubectl apply -f https://raw.githubusercontent.com/opencost/opencost/develop/kubernetes/opencost.yaml # Port-forward to access UI kubectl port-forward -n opencost service/opencost 9003:9003 # Query costs by label curl "http://localhost:9003/allocation \ ?window=7d \ &aggregate=label:environment" ``` 
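The allocation queries above can also be issued from a script, for example when feeding a showback report. The following is a minimal Python sketch, not the Kubecost or OpenCost client: `allocation_url` and `total_cost_by_name` are hypothetical helper names, and the response shape (a list of mappings from aggregate name to an object with a `totalCost` field) is an assumption that should be checked against the API version you run.

```python
from urllib.parse import urlencode


def allocation_url(base_url, window, label, accumulate=False):
    """Build an allocation query URL aggregated by a single label."""
    params = {"window": window, "aggregate": f"label:{label}"}
    if accumulate:
        params["accumulate"] = "true"
    # safe=":" keeps "label:environment" readable instead of percent-encoding the colon
    return f"{base_url.rstrip('/')}/allocation?{urlencode(params, safe=':')}"


def total_cost_by_name(allocation_sets):
    """Sum totalCost per aggregate name.

    ASSUMPTION: allocation_sets is the "data" list of the JSON response, where
    each element maps an aggregate name to an object containing "totalCost".
    """
    totals = {}
    for allocation_set in allocation_sets:
        for name, allocation in (allocation_set or {}).items():
            totals[name] = totals.get(name, 0.0) + float(allocation.get("totalCost", 0.0))
    return totals


if __name__ == "__main__":
    # Kubecost serves the endpoint under /model; OpenCost serves it at the root
    print(allocation_url("http://kubecost.company.com/model", "7d", "environment", accumulate=True))
    print(allocation_url("http://localhost:9003", "7d", "environment"))

    # Made-up sample data standing in for a parsed response["data"]
    sample = [
        {"prod": {"totalCost": 12.5}},
        {"prod": {"totalCost": 2.5}, "dev": {"totalCost": 1.0}},
    ]
    print(total_cost_by_name(sample))
```

Building the query string with `urlencode` avoids the stray whitespace that creeps in when a quoted URL is split across shell lines, and summing over every returned set keeps the totals correct whether or not `accumulate=true` is used.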
---

## Multi-Cloud Cost Visibility

Aggregate costs across AWS, Azure, and GCP using multi-cloud cost management tools.

### CloudHealth by VMware

**Features**:
- Multi-cloud cost aggregation (AWS, Azure, GCP)
- Tag-based showback/chargeback
- Cost anomaly detection
- Budget management

**Integration**: Connect AWS, Azure, GCP billing accounts via IAM roles/service principals

---

### Apptio Cloudability

**Features**:
- Unified cost dashboard (AWS, Azure, GCP, Kubernetes)
- Tag normalization (standardize tags across clouds)
- Showback reports by tag
- Commitment optimization (RIs, Savings Plans)

---

### Custom Multi-Cloud Cost Dashboard

Aggregate billing exports from all clouds into a single data warehouse.

```sql
-- Unified cost view (AWS + Azure + GCP)
-- Column names and map/array syntax vary by warehouse; adjust to your export schemas
CREATE VIEW unified_cloud_costs AS
SELECT
  'AWS' AS cloud_provider,
  line_item_usage_account_id AS account_id,
  resource_tags_user_project AS project,
  resource_tags_user_environment AS environment,
  line_item_unblended_cost AS cost,
  line_item_usage_start_date AS usage_date
FROM aws_cost_and_usage_report
WHERE resource_tags_user_project IS NOT NULL

UNION ALL

SELECT
  'Azure' AS cloud_provider,
  subscription_id AS account_id,
  tags['Project'] AS project,
  tags['Environment'] AS environment,
  cost AS cost,
  date AS usage_date
FROM azure_cost_export
WHERE tags['Project'] IS NOT NULL

UNION ALL

-- Distinct aliases keep the two UNNESTs from shadowing the labels column
SELECT
  'GCP' AS cloud_provider,
  project.id AS account_id,
  project_label.value AS project,
  env_label.value AS environment,
  cost AS cost,
  usage_start_time AS usage_date
FROM gcp_billing_export
CROSS JOIN UNNEST(labels) AS project_label
CROSS JOIN UNNEST(labels) AS env_label
WHERE project_label.key = 'project'
  AND env_label.key = 'environment'
```

---

## Showback and Chargeback

### Showback (Informational Cost Reporting)

**Purpose**: Show teams their cloud costs without billing them

**Use case**: Dev/test environments, internal transparency

**Monthly showback report example**:

| Team | Project | Environment | Monthly Cost | YoY Change |
|------|---------|-------------|--------------|------------|
| Platform Team | ecommerce | prod | $25,000 | +15% |
| Platform Team | ecommerce | staging | $3,000 | +5% |
| Data Team | analytics | prod | $18,000 | +30% |
| Mobile Team | mobile-app | prod | $12,000 | -10% |

**Implementation**: Automated email report generated from cost allocation queries

---

### Chargeback (Actual Cost Billing)

**Purpose**: Bill internal teams for their actual cloud usage

**Use case**: Multi-tenant SaaS, shared services billing

**Chargeback implementation**:

```python
# chargeback_report.py
import csv
from collections import defaultdict
from datetime import datetime, timedelta

import boto3


def tag_value(group_key):
    """Extract the value from a 'TagKey$value' group key; empty values mean untagged spend."""
    return group_key.partition('$')[2] or 'untagged'


def generate_chargeback_report(start_date, end_date):
    """Sum unblended cost per CostCenter and Project from start_date (inclusive) to end_date (exclusive)."""
    ce_client = boto3.client('ce')
    request = {
        'TimePeriod': {'Start': start_date, 'End': end_date},
        'Granularity': 'MONTHLY',
        'Metrics': ['UnblendedCost'],
        'GroupBy': [
            {'Type': 'TAG', 'Key': 'CostCenter'},
            {'Type': 'TAG', 'Key': 'Project'},
        ],
    }

    chargeback = defaultdict(lambda: defaultdict(float))
    while True:
        response = ce_client.get_cost_and_usage(**request)
        for result in response['ResultsByTime']:
            for group in result['Groups']:
                cost_center = tag_value(group['Keys'][0])
                project = tag_value(group['Keys'][1])
                cost = float(group['Metrics']['UnblendedCost']['Amount'])
                # Accumulate so multi-month ranges are summed rather than overwritten
                chargeback[cost_center][project] += cost

        # Follow pagination until Cost Explorer stops returning a token
        token = response.get('NextPageToken')
        if not token:
            break
        request['NextPageToken'] = token

    return chargeback


# Export to finance system (CSV)
def export_chargeback_csv(chargeback_data, output_file):
    with open(output_file, 'w', newline='') as f:
        writer = csv.writer(f)  # handles quoting if a tag value contains a comma
        writer.writerow(['CostCenter', 'Project', 'Amount'])
        for cost_center, projects in chargeback_data.items():
            for project, amount in projects.items():
                writer.writerow([cost_center, project, f'{amount:.2f}'])


if __name__ == '__main__':
    # Previous calendar month: first day of last month up to (not including) the first of this month
    first_of_this_month = datetime.now().replace(day=1)
    last_month_start = (first_of_this_month - timedelta(days=1)).replace(day=1).strftime('%Y-%m-%d')
    last_month_end = first_of_this_month.strftime('%Y-%m-%d')

    chargeback = generate_chargeback_report(last_month_start, last_month_end)
    export_chargeback_csv(chargeback, f'chargeback_{last_month_start}.csv')
```

---

## Best Practices Summary

1. **Activate cost allocation tags** in billing console (AWS, Azure, GCP)
2. **Wait 24 hours** for cost allocation data to populate (AWS)
3. **Create budgets by tag** to prevent cost overruns per project/team
4. **Set up anomaly detection** for unusual spending patterns
5. **Export billing data** to a data warehouse for custom analysis
6. **Automate showback reports** (monthly email to teams with their costs)
7. **Track unallocated spend** (resources without required tags cannot be attributed to any team)
8. **Use Kubecost/OpenCost** for Kubernetes cost allocation by namespace/label
9. **Normalize tags** across clouds for unified multi-cloud cost reporting
10. **Integrate with finance systems** for automated chargeback billing

```