Back to skills
SkillHub ClubDesign ProductFull StackDesignerSecurity

architecting-security

Design comprehensive security architectures using defense-in-depth, zero trust principles, threat modeling (STRIDE, PASTA), and control frameworks (NIST CSF, CIS Controls, ISO 27001). Use when designing security for new systems, auditing existing architectures, or establishing security governance programs.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars
318
Hot score
99
Updated
March 20, 2026
Overall rating
C4.1
Composite score
4.1
Best-practice grade
B70.0

Install command

npx @skill-hub/cli install ancoleman-ai-design-components-architecting-security

Repository

ancoleman/ai-design-components

Skill path: skills/architecting-security

Design comprehensive security architectures using defense-in-depth, zero trust principles, threat modeling (STRIDE, PASTA), and control frameworks (NIST CSF, CIS Controls, ISO 27001). Use when designing security for new systems, auditing existing architectures, or establishing security governance programs.

Open repository

Best for

Primary workflow: Design Product.

Technical facets: Full Stack, Designer, Security.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: ancoleman.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install architecting-security into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/ancoleman/ai-design-components before adding architecting-security to shared team environments
  • Use architecting-security for development workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: architecting-security
description: Design comprehensive security architectures using defense-in-depth, zero trust principles, threat modeling (STRIDE, PASTA), and control frameworks (NIST CSF, CIS Controls, ISO 27001). Use when designing security for new systems, auditing existing architectures, or establishing security governance programs.
---

# Security Architecture

Design and implement comprehensive security architectures that protect systems, data, and users through layered defense strategies, zero trust principles, and risk-based security controls.

## Purpose

Security architecture provides the strategic foundation for building resilient, compliant, and trustworthy systems. This skill guides the design of defense-in-depth layers, zero trust implementations, threat modeling methodologies, and mapping to control frameworks (NIST CSF, CIS Controls, ISO 27001).

Unlike tactical security skills (configuring firewalls, implementing authentication, scanning vulnerabilities), security architecture focuses on strategic planning, comprehensive defense strategies, and governance frameworks.

## When to Use This Skill

Use security architecture when:

- Designing security for greenfield systems (new applications, cloud migrations)
- Conducting security audits or risk assessments of existing systems
- Implementing zero trust architecture across enterprise environments
- Establishing security governance programs and compliance frameworks
- Threat modeling applications, APIs, or microservices architectures
- Selecting and mapping security controls to regulatory requirements (SOC 2, HIPAA, PCI DSS)
- Designing cloud security architectures (AWS, GCP, Azure multi-account strategies)
- Addressing supply chain security (SLSA framework, SBOM implementation)

## Core Security Architecture Principles

### 1. Defense in Depth

Implement multiple independent layers of security controls so that if one layer fails, others continue to protect critical assets.

**9 Defense Layers (2025 Model):**

1. **Physical Security:** Data center access, environmental controls, hardware security modules (HSMs)
2. **Network Perimeter:** Next-gen firewalls (NGFW), DDoS protection, web application firewalls (WAF)
3. **Network Segmentation:** VLANs, VPCs, security groups, micro-segmentation
4. **Endpoint Protection:** EDR, antivirus, device encryption, patch management
5. **Application Layer:** Secure coding, WAF, API security, SAST/DAST scanning
6. **Data Layer:** Encryption (at-rest, in-transit, in-use), DLP, backup/recovery
7. **Identity & Access Management:** MFA, SSO, RBAC/ABAC, privileged access management (PAM)
8. **Behavioral Analytics:** UEBA, ML-based anomaly detection, threat intelligence
9. **Security Operations:** SIEM, SOAR, incident response, continuous monitoring

**Key Principle:** Each layer provides independent protection. Failure of one layer does not compromise the entire system.

For detailed layer-by-layer implementation patterns, see `references/defense-in-depth.md`.

### 2. Zero Trust Architecture

Implement "never trust, always verify" principles where every access request is authenticated, authorized, and continuously validated.

**Core Zero Trust Principles:**

1. **Continuous Verification:** Authenticate and authorize every access request (no implicit trust)
2. **Least Privilege Access:** Grant minimal permissions required, use just-in-time (JIT) access
3. **Assume Breach:** Design systems expecting compromise, limit blast radius
4. **Explicit Verification:** Verify user identity (MFA), device health, application integrity, context (location, time, behavior)
5. **Micro-Segmentation:** Divide networks into small isolated zones, control east-west traffic

**Zero Trust Architecture Components:**

- **Policy Engine:** Centralized authorization decision point (allow/deny)
- **Identity Provider (IdP):** User/machine identity verification (Azure AD, Okta)
- **Device Posture Service:** Device health checks (MDM, EDR integration)
- **Context/Risk Engine:** Behavioral analytics, location, time, threat intelligence
- **Policy Enforcement Points:** Gateways enforcing decisions (ZTNA, API gateways)

For zero trust implementation roadmap and reference architecture, see `references/zero-trust-architecture.md`.

### 3. Threat Modeling

Systematically identify, prioritize, and mitigate security threats through structured methodologies.

**Primary Methodologies:**

| Methodology | Purpose | Complexity | Best For |
|-------------|---------|------------|----------|
| **STRIDE** | Threat identification | Low | Development teams, quick threat analysis |
| **PASTA** | Risk-centric analysis | High | Enterprise risk management |
| **DREAD** | Risk scoring | Low | Prioritizing existing threats |
| **Attack Trees** | Visual threat analysis | Medium | Security architecture reviews |

**STRIDE Threat Categories:**

- **S**poofing: Attacker impersonates another user/system (Mitigation: MFA, certificate validation)
- **T**ampering: Unauthorized data modification (Mitigation: Encryption, digital signatures)
- **R**epudiation: User denies action without proof (Mitigation: Audit logs, non-repudiation)
- **I**nformation Disclosure: Confidential data exposure (Mitigation: Encryption, access controls, DLP)
- **D**enial of Service: System unavailability (Mitigation: Rate limiting, DDoS protection, redundancy)
- **E**levation of Privilege: Gaining higher privileges (Mitigation: Least privilege, input validation, patching)

**STRIDE Application Process:**

1. Model the system using data flow diagrams (DFDs)
2. Identify threats by applying STRIDE to each component/data flow
3. Document threats with STRIDE categories
4. Prioritize threats using DREAD scoring or business impact
5. Design mitigation controls

For detailed threat modeling methodologies, PASTA process, DREAD scoring, and attack trees, see `references/threat-modeling.md`. For threat modeling examples, see `examples/threat-models/`.

## Security Control Frameworks

Map security controls to industry frameworks to ensure comprehensive coverage and compliance.

### NIST Cybersecurity Framework (CSF) 2.0

**6 Core Functions:**

1. **GOVERN (GV):** Risk management strategy, policies, supply chain risk management
2. **IDENTIFY (ID):** Asset inventory, risk assessment, continuous improvement
3. **PROTECT (PR):** Access control, data security, platform security, infrastructure resilience
4. **DETECT (DE):** Continuous monitoring, anomaly detection, security event analysis
5. **RESPOND (RS):** Incident management, analysis, communication, mitigation
6. **RECOVER (RC):** Recovery planning, execution, post-incident improvement

**Usage:** Map security controls to NIST CSF categories to ensure coverage of all security functions. Provides risk-based, flexible framework for security programs.

For detailed NIST CSF category mapping and subcategories, see `references/nist-csf-mapping.md`.

### CIS Critical Security Controls v8

**18 Controls organized in 3 Implementation Groups:**

- **IG1 (Basic):** 56 safeguards for small organizations (asset inventory, access control, logging, backups)
- **IG2 (Intermediate):** +74 safeguards for mid-sized organizations with IT security staff
- **IG3 (Advanced):** +23 safeguards for large enterprises with dedicated security teams

**Top Priority Controls (IG1):**
1. Inventory and Control of Enterprise Assets
2. Inventory and Control of Software Assets
3. Data Protection
4. Secure Configuration of Enterprise Assets
5. Account Management
6. Access Control Management
7. Continuous Vulnerability Management
8. Audit Log Management

**Usage:** CIS Controls provide prescriptive, measurable security baseline. Start with IG1, progress to IG2/IG3 as security maturity increases.

For detailed CIS Controls implementation guidance, see `references/cis-controls.md`.

### OWASP Top 10 Risk Mitigation

Map OWASP Top 10 application security risks to architectural controls:

| OWASP Risk | Primary Control | Framework Mapping |
|------------|-----------------|-------------------|
| **Injection** | Parameterized queries, input validation | NIST PR.DS, CIS 16 |
| **Broken Authentication** | MFA, secure session management | NIST PR.AC, CIS 5, 6 |
| **Sensitive Data Exposure** | Encryption, key management | NIST PR.DS, CIS 3 |
| **XXE** | Disable external entities, use JSON | NIST PR.DS, CIS 16 |
| **Broken Access Control** | Authorization checks, RBAC | NIST PR.AC, CIS 6 |
| **Security Misconfiguration** | Hardening, minimal configs | NIST PR.IP, CIS 4 |
| **XSS** | Output encoding, CSP | NIST PR.DS, CIS 16 |
| **Insecure Deserialization** | Validate objects, safe formats | NIST PR.DS, CIS 16 |
| **Known Vulnerabilities** | Patch management, SBOM | NIST ID.RA, CIS 7 |
| **Logging & Monitoring** | SIEM, centralized logging | NIST DE.CM, CIS 8 |

For detailed OWASP Top 10 mitigation strategies and code examples, see `references/owasp-top10-mitigation.md`.

## Architecture Selection Decision Framework

Select appropriate security architecture approach based on system characteristics:

**Greenfield (New System):**
- Implement Zero Trust from Day 1
- Identity-first architecture (MFA, SSO, RBAC/ABAC)
- Micro-segmentation by default
- Assume breach mentality (limit blast radius)
- Continuous verification and monitoring

**Brownfield (Existing System):**
- Hybrid: Maintain Defense in Depth + Zero Trust overlay
- Keep existing perimeter controls (firewalls, VPN)
- Layer Zero Trust controls progressively
- Segment critical assets first (data, admin access)
- Modernize identity and access management

**Compliance-Driven:**
- Map to control frameworks based on requirements:
  - **General Security:** NIST CSF for risk-based approach
  - **Baseline Hardening:** CIS Controls for prescriptive guidance
  - **Comprehensive ISMS:** ISO 27001 for certification
  - **Industry-Specific:** PCI DSS (payments), HIPAA Security Rule (healthcare), FedRAMP (government)

**Cloud-Native:**
- Use cloud provider reference architectures:
  - **AWS:** Well-Architected Framework (Security Pillar)
  - **GCP:** Security Best Practices, Security Command Center
  - **Azure:** Security Benchmark, Defender for Cloud
- Implement cloud-native security services (CSPM, CWPP)

**Hybrid/Multi-Cloud:**
- Cloud Security Posture Management (CSPM) for unified policy enforcement
- Cross-cloud visibility and monitoring
- Cloud-agnostic IAM (Okta, Azure AD)

For detailed architecture selection decision trees, see `references/defense-in-depth.md` and `references/zero-trust-architecture.md`.

## Supply Chain Security

Protect software supply chain from tampering, backdoors, and compromised dependencies.

### SLSA Framework

**Supply-chain Levels for Software Artifacts (4 levels):**

1. **SLSA Level 1 - Provenance:** Build process generates provenance metadata (not tamper-proof)
2. **SLSA Level 2 - Hosted Build:** Build on trusted platform (GitHub Actions, Cloud Build)
3. **SLSA Level 3 - Hardened Build:** Build platform prevents tampering, audit logs
4. **SLSA Level 4 - Hermetic, Reproducible:** Fully hermetic builds, reproducible, two-party review

**Implementation:** Start with Level 1 provenance generation, progress to Level 2 (GitHub Actions), then Level 3 (hardened CI/CD with audit logs).

### SBOM (Software Bill of Materials)

Generate and maintain inventory of software components and dependencies.

**SBOM Standards:**
- **CycloneDX:** OWASP standard (JSON/XML format)
- **SPDX:** Linux Foundation standard
- **SWID:** ISO/IEC 19770-2 standard

**SBOM Use Cases:**
- Vulnerability Management: Quickly identify affected components during CVE disclosures
- License Compliance: Track open-source licenses for legal compliance
- Supply Chain Risk: Visibility into third-party code and dependencies
- Incident Response: Rapid assessment of Log4Shell-type incidents

**Dependency Management Best Practices:**
1. Generate SBOM automatically in CI/CD pipeline
2. Continuous scanning with tools (Dependabot, Snyk, Trivy, Grype)
3. Automated security patch updates
4. License compliance tracking and approval workflows
5. Pin dependency versions using lock files
6. Minimize dependencies to reduce attack surface

For SLSA implementation guide, SBOM generation examples, and dependency scanning automation, see `references/supply-chain-security.md`.

## Cloud Security Architecture Patterns

### AWS Security Architecture

**Well-Architected Framework - Security Pillar Principles:**

1. **Strong identity foundation:** Centralize IAM, least privilege, IAM Identity Center (SSO)
2. **Enable traceability:** CloudTrail, GuardDuty, Security Hub for comprehensive logging
3. **Apply security at all layers:** Defense in depth across VPC, instances, applications, data
4. **Automate security best practices:** Infrastructure as Code (Terraform, CloudFormation)
5. **Protect data in transit and at rest:** TLS 1.3, AWS KMS, encryption everywhere

**Key AWS Security Services:**

- **IAM:** AWS IAM, IAM Identity Center (SSO), Cognito (customer identity)
- **Detection:** GuardDuty (threat detection), Security Hub (centralized findings), Detective (investigation)
- **Network:** AWS WAF, Shield (DDoS), Network Firewall
- **Data:** KMS (key management), Secrets Manager, Macie (data classification)
- **Compute:** Systems Manager (patch management), Inspector (vulnerability scanning)

**Multi-Account Strategy:** Use AWS Organizations with Security OU (Security Account, Logging Account, Audit Account) and Workload OUs (Production, Non-Production). Apply Service Control Policies (SCPs) for guardrails.

For AWS reference architectures and multi-account security setup, see `references/aws-security-architecture.md` and `examples/architectures/aws-multi-account-security.md`.

### GCP Security Architecture

**Key GCP Security Services:**

- **IAM:** Cloud IAM, Identity Platform (customer identity), Cloud Identity (workforce)
- **Detection:** Security Command Center (unified dashboard), Chronicle (SIEM), Event Threat Detection
- **Network:** Cloud Armor (DDoS/WAF), VPC Service Controls (data exfiltration prevention), Cloud Firewall
- **Data:** Cloud KMS, Secret Manager, Cloud DLP (data loss prevention)
- **Compute:** Binary Authorization (image signing), Confidential Computing (encryption in use)

**Organization Hierarchy:** Structure with Organization → Folders (Production, Non-Production, Security) → Projects. Apply IAM policies at folder level for inheritance.

For GCP security architecture patterns and organization setup, see `references/gcp-security-architecture.md` and `examples/architectures/gcp-security-hierarchy.md`.

### Azure Security Architecture

**Key Azure Security Services:**

- **IAM:** Azure AD (Entra ID), Privileged Identity Management (JIT access), Conditional Access
- **Detection:** Microsoft Defender for Cloud (CSPM/CWPP), Sentinel (SIEM/SOAR), Azure Monitor
- **Network:** Azure Firewall, Front Door + WAF, DDoS Protection
- **Data:** Key Vault (secrets, keys, certificates), Information Protection (DLP), Storage encryption
- **Compute:** Just-in-Time VM Access, Azure Policy (compliance enforcement)

**Hub-Spoke Landing Zone:** Implement hub VNet (shared services: firewall, VPN, Azure Bastion) with spoke VNets (workloads). Use Management Groups for policy hierarchy.

For Azure security architecture and hub-spoke design, see `references/azure-security-architecture.md` and `examples/architectures/azure-landing-zone.md`.

## Identity & Access Management Patterns

### Authentication Controls

**Multi-Factor Authentication (MFA):**
- **Types:** TOTP (time-based one-time passwords), push notifications, biometrics, hardware tokens (YubiKey, FIDO2)
- **Enforcement:** Require MFA for all users (workforce and customers), especially privileged accounts
- **Passwordless:** Transition to WebAuthn, FIDO2, passkeys to eliminate password-based attacks

**Single Sign-On (SSO):**
- **Protocols:** SAML 2.0, OAuth 2.0, OpenID Connect (OIDC)
- **Benefits:** Centralized authentication, reduced password fatigue, improved security posture
- **Implementation:** Azure AD, Okta, Auth0, Ping Identity

### Authorization Controls

**Role-Based Access Control (RBAC):**
- Users assigned to roles, roles have permissions
- Coarse-grained, simple to implement
- Best for: Organizations with stable role structures

**Attribute-Based Access Control (ABAC):**
- Fine-grained access based on attributes (user department, resource classification, time, location)
- More flexible than RBAC
- Best for: Complex, dynamic access requirements

**Policy-Based Access Control (PBAC):**
- Centralized policy engines (Open Policy Agent - OPA, AWS Cedar)
- Policies defined declaratively and versioned
- Best for: Microservices, API gateways, cloud-native architectures

### Privileged Access Management (PAM)

**Just-in-Time (JIT) Access:**
- Temporary elevated privileges for specific tasks
- Time-bound access grants (e.g., 4 hours)
- Reduces standing privileged access

**Credential Vaulting:**
- Centralized storage of privileged credentials (CyberArk, HashiCorp Vault, Azure Key Vault)
- Automatic password rotation
- Session recording and auditing

For detailed IAM implementation patterns, MFA configuration, and PAM setup, see `references/iam-patterns.md`.

## Security Monitoring & Operations

### SIEM (Security Information & Event Management)

Centralize log aggregation, correlation, and alerting for security events.

**Leading SIEM Platforms:**
- Splunk, Elastic Security, Microsoft Sentinel, Chronicle

**SIEM Architecture:**
1. **Log Collection:** Ingest logs from all layers (network, endpoints, applications, cloud)
2. **Normalization:** Standardize log formats for correlation
3. **Correlation:** Apply rules to detect patterns (failed logins → brute force attack)
4. **Alerting:** Notify SOC team of high-priority events
5. **Investigation:** Provide search and visualization for incident analysis

### SOAR (Security Orchestration, Automation & Response)

Automate incident response workflows to reduce mean time to respond (MTTR).

**SOAR Capabilities:**
- **Playbooks:** Automated response workflows (block IP, quarantine endpoint, revoke credentials)
- **Orchestration:** Integrate with security tools (SIEM, EDR, firewall, IAM)
- **Case Management:** Track incidents, assign to analysts, document resolution

**Leading SOAR Platforms:**
- Splunk SOAR, Palo Alto Cortex XSOAR, IBM Resilient

### Detection Strategies

**UEBA (User & Entity Behavior Analytics):**
- Machine learning-based anomaly detection
- Detects: Account compromise, insider threats, data exfiltration
- Baseline normal behavior, alert on deviations

**Threat Intelligence:**
- Integrate threat feeds (MISP, ThreatConnect, ISACs)
- Enrich alerts with threat context (known malicious IPs, IOCs)
- Proactive threat hunting using TTPs (MITRE ATT&CK framework)

For SIEM architecture, SOAR playbook examples, and detection strategies, see `references/security-operations.md`.

## Quick Reference: Control Framework Mapping

Use this table to map risks to appropriate control frameworks:

| Risk/Requirement | Framework | Key Controls |
|------------------|-----------|--------------|
| General security program | NIST CSF 2.0 | All 6 functions (GV, ID, PR, DE, RS, RC) |
| Compliance baseline | CIS Controls v8 | IG1: Controls 1-18 (56 safeguards) |
| ISO certification | ISO 27001/27002 | 114 controls across 14 domains |
| Application security | OWASP ASVS | 286 security requirements (3 levels) |
| Cloud security (AWS) | AWS Well-Architected | Security Pillar: 10 design principles |
| Cloud security (GCP) | GCP Security Best Practices | Security Command Center architecture |
| Cloud security (Azure) | Azure Security Benchmark | Defender for Cloud controls |
| Supply chain security | SLSA + SBOM | Level 2+ SLSA, CycloneDX SBOM |
| Zero trust architecture | NIST SP 800-207 | ZTA tenets, deployment models |
| Privacy/GDPR | NIST Privacy Framework | Privacy engineering objectives |

## Integration with Related Skills

Security architecture provides the strategic foundation for tactical security implementations:

- **`infrastructure-as-code`:** Implement security architecture as code (secure defaults, hardening)
- **`kubernetes-operations`:** Apply K8s security architecture (RBAC, Pod Security, Network Policies)
- **`secret-management`:** Architect secrets management (KMS, Vault, rotation strategies)
- **`building-ci-pipelines`:** Secure CI/CD architecture (SAST/DAST integration, artifact signing)
- **`configuring-firewalls`:** Implement network perimeter layer of defense-in-depth
- **`vulnerability-management`:** Integrate vulnerability scanning into security architecture
- **`auth-security`:** Implement IAM layer details (MFA, RBAC/ABAC, session management)
- **`siem-logging`:** Implement security monitoring architecture (SIEM, log aggregation)
- **`compliance-frameworks`:** Map security architecture to compliance requirements

## Common Security Architecture Patterns

### Pattern 1: Zero Trust Network Access (ZTNA)

Replace VPN with identity-based access to applications.

**Architecture:**
1. User authenticates to identity provider (Azure AD, Okta)
2. Device posture check validates device health
3. Policy engine evaluates access request (user, device, context)
4. Access granted through secure connector (no network access)

**Benefits:** Eliminates lateral movement, reduces attack surface, improves user experience

### Pattern 2: Defense in Depth for Web Applications

Layer multiple security controls for web application protection.

**Layers:**
1. DDoS Protection (Cloudflare, AWS Shield)
2. WAF (application firewall, OWASP Top 10 rules)
3. API Gateway (authentication, rate limiting)
4. Application Security (SAST/DAST, secure coding)
5. Database Security (encryption, least privilege)
6. Logging & Monitoring (SIEM, anomaly detection)

### Pattern 3: Cloud Security Posture Management (CSPM)

Continuously monitor and enforce security configurations across cloud environments.

**Architecture:**
1. Asset Discovery: Inventory all cloud resources
2. Configuration Assessment: Compare against security baselines (CIS Benchmarks)
3. Compliance Monitoring: Track regulatory compliance (SOC 2, ISO 27001)
4. Remediation: Automated fixes or guided workflows
5. Drift Detection: Alert on configuration changes

**Leading CSPM Tools:** Wiz, Orca Security, Prisma Cloud, Microsoft Defender for Cloud

## Resources and References

**Defense in Depth:**
- `references/defense-in-depth.md` - 9-layer defense model, implementation patterns, failure impact analysis

**Zero Trust Architecture:**
- `references/zero-trust-architecture.md` - ZTA principles, reference architecture, implementation roadmap

**Threat Modeling:**
- `references/threat-modeling.md` - STRIDE, PASTA, DREAD, Attack Trees methodologies
- `examples/threat-models/web-app-stride.md` - Web application STRIDE analysis example
- `examples/threat-models/api-threat-model.md` - REST API threat model example
- `examples/threat-models/microservices-threat-model.md` - Microservices threat model example

**Control Frameworks:**
- `references/nist-csf-mapping.md` - NIST CSF 2.0 functions, categories, subcategories
- `references/cis-controls.md` - CIS Controls v8, implementation groups, safeguards
- `references/owasp-top10-mitigation.md` - OWASP Top 10 risks and mitigation strategies

**Supply Chain Security:**
- `references/supply-chain-security.md` - SLSA framework, SBOM generation, dependency scanning

**Cloud Security:**
- `references/aws-security-architecture.md` - AWS Well-Architected Security Pillar, services, patterns
- `references/gcp-security-architecture.md` - GCP Security Best Practices, services, organization design
- `references/azure-security-architecture.md` - Azure Security Benchmark, Defender for Cloud, landing zones

**IAM & Operations:**
- `references/iam-patterns.md` - Authentication, authorization, MFA, RBAC/ABAC, PAM
- `references/security-operations.md` - SIEM, SOAR, UEBA, threat intelligence, incident response

**Architecture Examples:**
- `examples/architectures/aws-multi-account-security.md` - AWS Organizations security setup
- `examples/architectures/gcp-security-hierarchy.md` - GCP folder/project security hierarchy
- `examples/architectures/azure-landing-zone.md` - Azure hub-spoke landing zone
- `examples/architectures/zero-trust-network.md` - Zero trust network design

**Scripts:**
- `scripts/threat-model-template.py` - Generate STRIDE threat model templates
- `scripts/control-gap-analysis.sh` - Compare current controls against frameworks
- `scripts/sbom-generate.sh` - Generate SBOM in CycloneDX format
- `scripts/security-checklist.sh` - Automated security architecture checklist

## Summary

Security architecture requires strategic planning across multiple layers, from physical security to security operations. Implement defense-in-depth for comprehensive protection, adopt zero trust principles for modern cloud environments, use threat modeling to identify risks proactively, and map controls to frameworks for compliance and completeness.

Start with risk assessment to understand threats, select appropriate architecture approach (zero trust for greenfield, hybrid for brownfield), implement layered controls, and continuously monitor and improve security posture.


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/defense-in-depth.md

```markdown
# Defense in Depth Reference

## Table of Contents

1. [Overview](#overview)
2. [9-Layer Defense Model](#9-layer-defense-model)
3. [Implementation Patterns](#implementation-patterns)
4. [Layer Integration Strategies](#layer-integration-strategies)
5. [Failure Impact Analysis](#failure-impact-analysis)
6. [Architecture Selection Decision Tree](#architecture-selection-decision-tree)

## Overview

Defense in Depth (DiD) is the foundational security architecture principle of implementing multiple independent layers of security controls. If one layer fails or is breached, other layers continue to protect critical assets.

**Core Principle:** Redundancy and diversity of defense mechanisms across all layers from physical to operational.

**Modern Context (2025):** Defense in Depth now incorporates behavioral analytics, workload security, and identity threat detection to address cloud-native architectures and sophisticated attack vectors.

## 9-Layer Defense Model

### Layer 1: Physical Security

**Purpose:** Protect physical access to hardware and facilities.

**Controls:**
- Physical access control systems (badge readers, biometrics, mantrap entries)
- Surveillance systems (CCTV, motion detection, security guards)
- Environmental controls (HVAC, fire suppression, power redundancy, UPS)
- Hardware security modules (HSM) for cryptographic key storage
- Secure disposal of hardware (degaussing magnetic media, physical destruction of drives)

**Cloud Considerations:**
- Cloud providers (AWS, GCP, Azure) handle physical security of data centers
- SOC 2 Type II and ISO 27001 certified facilities
- Shared responsibility: Provider secures data centers, customer secures workloads

**Failure Impact:** Limited (cloud providers provide strong physical security)

**Monitoring:** Access logs, video surveillance, environmental sensors

---

### Layer 2: Network Perimeter

**Purpose:** Control and inspect traffic entering and leaving the network.

**Controls:**
- **Next-Generation Firewalls (NGFW):** Application-aware firewalls (Palo Alto, Fortinet, Cisco)
- **Web Application Firewall (WAF):** OWASP Top 10 protection (Cloudflare, Imperva, F5, AWS WAF)
- **DDoS Protection:** Volumetric attack mitigation (Cloudflare, AWS Shield, Azure DDoS)
- **Intrusion Prevention System (IPS/IDS):** Signature and anomaly-based detection
- **Deep Packet Inspection (DPI):** Traffic analysis and content filtering
- **SSL/TLS Inspection:** Decrypt and inspect encrypted traffic for threats

**Architecture Pattern:**
```
Internet → DDoS Protection → WAF/CDN → NGFW → Internal Network
```

**Failure Impact:** High (direct exposure to internet threats if perimeter fails)

**Monitoring:** Firewall logs, IDS/IPS alerts, traffic flow analysis, DDoS metrics

---

### Layer 3: Network Segmentation

**Purpose:** Divide network into isolated zones to limit lateral movement.

**Controls:**
- **VLANs:** Virtual LAN segmentation (Layer 2)
- **VPCs/Subnets:** Cloud virtual networks (AWS VPC, GCP VPC, Azure VNet)
- **Security Groups:** Stateful firewall rules for cloud instances
- **Network Access Control Lists (NACLs):** Stateless firewall rules for subnets
- **Micro-Segmentation:** Fine-grained segmentation at workload level (service mesh, zero trust)
- **Internal Firewalls:** East-west traffic control between zones

**Segmentation Zones:**
- **DMZ (Demilitarized Zone):** Public-facing services (web servers, mail servers)
- **Web Tier:** Application front-end
- **Application Tier:** Business logic, APIs
- **Data Tier:** Databases, data stores (most restricted)
- **Management Zone:** Administrative access, jump boxes

**Architecture Pattern:**
```
┌─────────────┐
│   Internet  │
└──────┬──────┘
       │
┌──────▼──────────────────────────┐
│  DMZ Zone (Public Web Servers)  │
└──────┬──────────────────────────┘
       │
┌──────▼──────────────────────────┐
│  Web Tier (App Front-End)       │
└──────┬──────────────────────────┘
       │
┌──────▼──────────────────────────┐
│  App Tier (Business Logic)      │
└──────┬──────────────────────────┘
       │
┌──────▼──────────────────────────┐
│  Data Tier (Databases)           │  ← Most Restricted
└──────────────────────────────────┘
```

**Failure Impact:** Medium (limits lateral movement even if one zone is compromised)

**Monitoring:** VPC flow logs, network traffic analysis, connection attempts between zones

---

### Layer 4: Endpoint Protection

**Purpose:** Secure individual devices (workstations, servers, mobile devices).

**Controls:**
- **Antivirus/Anti-malware:** Signature-based and heuristic detection (Windows Defender, CrowdStrike, SentinelOne)
- **Endpoint Detection & Response (EDR):** Real-time monitoring, threat hunting, automated response
- **Mobile Device Management (MDM):** BYOD policies, remote wipe, app control (Intune, Jamf, VMware Workspace ONE)
- **Patch Management:** Automated OS and application patching (WSUS, SCCM, Systems Manager)
- **Device Encryption:** Full-disk encryption (BitLocker, FileVault, LUKS)
- **Device Posture Validation:** Health checks before network access (compliance checks, OS version, encryption status)
- **Host-based Firewall:** Windows Firewall, iptables/nftables (Linux)

**Zero Trust Integration:**
- Continuous device posture assessment
- Device compliance verification before access grant
- Adaptive access based on device trust score

**Failure Impact:** High (endpoint compromise is a common attack vector)

**Monitoring:** EDR alerts, patch compliance dashboards, device health status, antivirus detections

---

### Layer 5: Application Security

**Purpose:** Protect applications, APIs, and software from exploitation.

**Controls:**
- **Secure SDLC:** Security requirements, threat modeling, security testing in development
- **Web Application Firewall (WAF):** OWASP Top 10 protection at application layer
- **API Security:** Authentication (OAuth 2.0, API keys), authorization, rate limiting, input validation
- **Code Analysis:**
  - **SAST (Static Application Security Testing):** Scan source code (SonarQube, Checkmarx)
  - **DAST (Dynamic Application Security Testing):** Scan running applications (OWASP ZAP, Burp Suite)
  - **IAST (Interactive Application Security Testing):** Runtime analysis (Contrast Security)
  - **SCA (Software Composition Analysis):** Dependency scanning (Snyk, Dependabot, Trivy)
- **Runtime Protection:** RASP (Runtime Application Self-Protection)
- **Dependency Management:** SBOM generation, vulnerability scanning, automated updates
- **Input Validation:** Sanitize and validate all user inputs
- **Output Encoding:** Prevent XSS attacks through proper encoding

**OWASP Top 10 Controls:**
1. Injection → Parameterized queries, ORM frameworks
2. Broken Authentication → MFA, secure session management
3. Sensitive Data Exposure → Encryption, key management
4. XXE → Disable XML external entities, prefer JSON
5. Broken Access Control → Authorization checks at every endpoint
6. Security Misconfiguration → Hardening guides, remove defaults
7. XSS → Output encoding, Content Security Policy (CSP)
8. Insecure Deserialization → Validate serialized objects
9. Known Vulnerabilities → Patch management, SBOM
10. Insufficient Logging → Centralized logging, SIEM

**Failure Impact:** High (application vulnerabilities are easily exploitable)

**Monitoring:** WAF logs, application logs, DAST findings, dependency vulnerability alerts

---

### Layer 6: Data Security

**Purpose:** Protect data confidentiality, integrity, and availability throughout its lifecycle.

**Controls:**
- **Encryption at Rest:** AES-256, Transparent Database Encryption (TDE)
- **Encryption in Transit:** TLS 1.3, mutual TLS (mTLS), VPN tunnels
- **Encryption in Use:** Confidential computing (Intel SGX, AMD SEV, ARM TrustZone)
- **Key Management:** HSM (hardware security modules), cloud KMS (AWS KMS, GCP KMS, Azure Key Vault)
- **Data Classification:** Public, Internal, Confidential, Restricted
- **Data Loss Prevention (DLP):** Content inspection, policy enforcement, alerting (Microsoft Purview, Symantec DLP)
- **Database Security:** Least privilege access, audit logging, query monitoring
- **Backup & Recovery:** 3-2-1 rule (3 copies, 2 media types, 1 offsite), immutable backups
- **Data Masking:** Anonymization, pseudonymization for non-production environments

**Data Lifecycle Security:**
```
Create → Store → Process → Share → Archive → Destroy
  │       │        │         │        │         │
  ▼       ▼        ▼         ▼        ▼         ▼
Classify Encrypt  Access   Encrypt  Retention Secure
         +Key Mgmt Control  +Audit   Policy   Deletion
```

**Failure Impact:** Critical (data is the ultimate target of attacks)

**Monitoring:** Data access logs, DLP alerts, encryption status, backup verification, unauthorized access attempts

---

### Layer 7: Identity & Access Management (IAM)

**Purpose:** Control who and what can access resources.

**Authentication Controls:**
- **Multi-Factor Authentication (MFA):** TOTP, push notifications, biometrics, hardware tokens (YubiKey, FIDO2)
- **Passwordless:** WebAuthn, FIDO2, passkeys
- **Single Sign-On (SSO):** SAML 2.0, OAuth 2.0, OpenID Connect (Azure AD, Okta, Auth0)

**Authorization Controls:**
- **Role-Based Access Control (RBAC):** Users → Roles → Permissions
- **Attribute-Based Access Control (ABAC):** Fine-grained based on attributes (department, location, time)
- **Policy-Based Access Control (PBAC):** Centralized policy engines (Open Policy Agent, AWS Cedar)

**Privileged Access Management (PAM):**
- **Just-in-Time (JIT) Access:** Temporary elevated privileges
- **Session Recording:** Audit all privileged sessions
- **Credential Vaulting:** CyberArk, HashiCorp Vault, AWS Secrets Manager

**Identity Governance:**
- User lifecycle management (joiner/mover/leaver)
- Access certification and recertification
- Segregation of Duties (SoD) enforcement

**Identity-First Zero Trust:**
- Identity is the control plane
- Every access request authenticates identity
- Risk-based adaptive authentication (device, location, behavior)

**Failure Impact:** Critical (identity compromise grants full access)

**Monitoring:** Authentication logs, failed login attempts, privileged access, MFA enrollment, risky sign-ins

---

### Layer 8: Behavioral Analytics

**Purpose:** Detect anomalies and threats through machine learning and behavioral analysis.

**Controls:**
- **User & Entity Behavior Analytics (UEBA):** ML-based anomaly detection (Microsoft Sentinel, Splunk)
- **Anomaly Detection:** Detect deviations from normal behavior patterns
- **Threat Intelligence:** Integrate feeds from ISACs, threat intel platforms (MISP, ThreatConnect)
- **Risk Scoring:** Assign risk scores to users, entities, activities
- **Contextual Analysis:** Combine multiple signals (time, location, device, resource accessed)

**Detection Use Cases:**
- Account compromise (unusual login location, time, device)
- Insider threats (unusual data access, exfiltration patterns)
- Lateral movement (unusual network connections)
- Privilege escalation (unusual administrative actions)
- Data exfiltration (large data transfers, unusual file access)

**Failure Impact:** Medium (detection layer, not prevention; delays in detection increase breach impact)

**Monitoring:** UEBA alerts, risk score changes, anomaly detection dashboards

---

### Layer 9: Security Operations

**Purpose:** Continuous monitoring, detection, response, and improvement.

**Controls:**
- **Security Information & Event Management (SIEM):** Centralized log aggregation, correlation, alerting (Splunk, Elastic, Sentinel, Chronicle)
- **Security Orchestration, Automation & Response (SOAR):** Playbook automation, incident response (Splunk SOAR, Cortex XSOAR)
- **Extended Detection & Response (XDR):** Unified visibility across endpoints, network, cloud (Palo Alto Cortex XDR, Trend Micro Vision One)
- **Vulnerability Management:** Continuous scanning, risk-based prioritization (Tenable, Qualys, Rapid7)
- **Penetration Testing:** Red team exercises, bug bounty programs
- **Incident Response:** Documented playbooks, tabletop exercises, post-incident reviews
- **Threat Hunting:** Proactive search for threats using MITRE ATT&CK TTPs

**SIEM Architecture:**
```
Data Sources → Log Collection → Normalization → Correlation → Alerting → Investigation
(Firewalls,    (Agents, APIs)   (Common format) (Rules, ML)   (SOC Team) (Search, Viz)
 Endpoints,
 Apps, Cloud)
```

**Failure Impact:** High (inability to detect or respond to breaches extends dwell time)

**Monitoring:** SIEM alert metrics, MTTD (mean time to detect), MTTR (mean time to respond), incident counts

---

## Implementation Patterns

### Pattern 1: Cloud-Native Defense in Depth (AWS Example)

**Layer Mapping:**

1. **Physical:** AWS data center security (managed by AWS)
2. **Network Perimeter:** AWS WAF + CloudFront + Shield (DDoS)
3. **Network Segmentation:** VPCs, subnets, security groups, NACLs
4. **Endpoint:** Systems Manager (patch), Inspector (vuln scan), GuardDuty (threat detection)
5. **Application:** API Gateway (rate limit, auth), WAF rules, CodeGuru (code analysis)
6. **Data:** S3 encryption, RDS encryption, KMS (key management), Macie (data discovery)
7. **IAM:** IAM Identity Center (SSO), MFA, IAM policies (least privilege), IAM Access Analyzer
8. **Behavioral Analytics:** GuardDuty (anomaly detection), Detective (investigation)
9. **Security Operations:** Security Hub (centralized findings), CloudWatch (monitoring), CloudTrail (audit logs)

**Implementation Steps:**
1. Design VPC architecture with public/private subnets
2. Enable GuardDuty, Security Hub, CloudTrail organization-wide
3. Implement IAM Identity Center for SSO and MFA
4. Deploy WAF on CloudFront and API Gateway
5. Enable encryption for all data stores (S3, RDS, EBS)
6. Configure Security Hub automated response (Lambda)
7. Centralize logs in dedicated Logging Account

---

### Pattern 2: Zero Trust Overlay on Defense in Depth

**Strategy:** Maintain existing Defense in Depth perimeter controls, layer Zero Trust controls progressively.

**Phase 1: Identity Foundation**
- Implement SSO and MFA for all users
- Deploy identity provider (Azure AD, Okta)
- Enable device posture checks

**Phase 2: Least Privilege Access**
- Migrate to RBAC/ABAC
- Implement JIT access for privileged accounts
- Remove standing admin permissions

**Phase 3: Micro-Segmentation**
- Segment critical applications
- Deploy ZTNA (Zero Trust Network Access) for remote access
- Implement service mesh for east-west traffic control

**Phase 4: Continuous Verification**
- Deploy UEBA for anomaly detection
- Implement risk-based conditional access
- Enable continuous compliance monitoring

**Result:** Hybrid architecture with perimeter defense + identity-first zero trust controls

---

### Pattern 3: Defense in Depth for Web Applications

**Layered Controls:**

```
Layer 9: SIEM + SOAR (Splunk, Sentinel)
         ↓
Layer 8: UEBA (Anomaly detection)
         ↓
Layer 7: OAuth 2.0 + MFA + RBAC
         ↓
Layer 6: Database encryption + DLP
         ↓
Layer 5: WAF + SAST/DAST + Secure coding
         ↓
Layer 4: Container security (runtime protection)
         ↓
Layer 3: Kubernetes Network Policies + Service mesh
         ↓
Layer 2: Cloud WAF + DDoS protection
         ↓
Layer 1: Cloud provider data center security
```

**Example Tech Stack:**
- **Layer 2:** Cloudflare (WAF + DDoS)
- **Layer 3:** Kubernetes Network Policies + Istio service mesh
- **Layer 4:** Falco (runtime security) + Trivy (image scanning)
- **Layer 5:** OWASP ZAP (DAST) + SonarQube (SAST) + Snyk (SCA)
- **Layer 6:** PostgreSQL with TDE + AWS KMS
- **Layer 7:** Auth0 (OAuth 2.0 + MFA) + OPA (policy-based authz)
- **Layer 8:** Elastic Security (UEBA)
- **Layer 9:** Elastic SIEM + TheHive (SOAR)

---

## Layer Integration Strategies

### Strategy 1: Centralized Logging

Aggregate logs from all layers into SIEM for unified visibility.

**Log Sources:**
- Layer 2: Firewall logs, IDS/IPS alerts
- Layer 3: VPC flow logs, network connection logs
- Layer 4: Endpoint logs, EDR alerts
- Layer 5: Application logs, WAF logs
- Layer 6: Database audit logs, data access logs
- Layer 7: Authentication logs, IAM events
- Layer 8: UEBA alerts, anomaly scores
- Layer 9: Incident response actions

**SIEM Correlation Rules:**
- Multiple failed logins (Layer 7) + Endpoint compromise (Layer 4) = Account takeover
- Unusual data access (Layer 6) + Large data transfer (Layer 3) = Data exfiltration
- Malware detection (Layer 4) + Lateral movement (Layer 3) = Active breach

---

### Strategy 2: Policy Enforcement Across Layers

Define security policies centrally and enforce at multiple layers.

**Example Policy: "Only encrypted data in transit"**
- Layer 2: WAF blocks HTTP requests (enforce HTTPS)
- Layer 3: Security group rules block port 80 (HTTP)
- Layer 5: Application redirects HTTP to HTTPS
- Layer 6: Database rejects non-TLS connections
- Layer 9: SIEM alerts on any HTTP traffic detected

**Enforcement Points:**
- Network layer (firewalls, security groups)
- Application layer (WAF, API gateway)
- Data layer (database configuration)
- Monitoring layer (SIEM alerts)

---

### Strategy 3: Defense Redundancy

Implement multiple defenses for critical assets.

**Example: Database Protection**
- Layer 2: Firewall blocks external access to database ports
- Layer 3: Database in private subnet, no internet gateway
- Layer 4: Database server hardened (minimal services, patched)
- Layer 5: Application uses parameterized queries (prevent SQL injection)
- Layer 6: Database encryption at rest + TLS in transit
- Layer 7: Database access requires authentication + least privilege
- Layer 8: UEBA detects unusual database queries
- Layer 9: Database audit logs sent to SIEM

**Result:** Even if one layer fails (e.g., SQL injection bypasses Layer 5), other layers still protect the database.

---

## Failure Impact Analysis

### Critical Failures (Immediate Risk)

**Layer 7 (IAM) Failure:**
- **Impact:** Compromised credentials grant full access
- **Mitigation:** MFA prevents credential-only compromise, PAM limits privilege duration
- **Detection:** Monitor failed logins, unusual access patterns

**Layer 6 (Data) Failure:**
- **Impact:** Data breach, regulatory violations, reputational damage
- **Mitigation:** Encryption limits exposure (encrypted data less valuable), DLP prevents exfiltration
- **Detection:** Monitor data access patterns, large transfers

---

### High Failures (Significant Risk)

**Layer 2 (Perimeter) Failure:**
- **Impact:** Direct exposure to internet threats
- **Mitigation:** Layer 3 segmentation limits lateral movement, Layer 7 IAM prevents unauthorized access
- **Detection:** IDS/IPS alerts, unusual inbound traffic

**Layer 4 (Endpoint) Failure:**
- **Impact:** Malware execution, lateral movement platform
- **Mitigation:** Layer 3 micro-segmentation limits spread, Layer 7 prevents privilege escalation
- **Detection:** EDR alerts, anomalous process execution

---

### Medium Failures (Moderate Risk)

**Layer 3 (Segmentation) Failure:**
- **Impact:** Lateral movement possible
- **Mitigation:** Layer 7 IAM limits access even within same network, Layer 8 detects lateral movement
- **Detection:** Flow logs, connection attempts to unusual destinations

**Layer 8 (Behavioral Analytics) Failure:**
- **Impact:** Delayed threat detection
- **Mitigation:** Layer 9 SIEM still provides rule-based detection, Layer 4/5/6 still prevent many attacks
- **Detection:** Increased undetected dwell time

---

## Architecture Selection Decision Tree

```
START: Designing security architecture
  │
  ├─► Is this a greenfield (new) system?
  │     YES ──► Zero Trust from Day 1
  │               ├─► Identity-first architecture (Layer 7 primary)
  │               ├─► Micro-segmentation by default (Layer 3)
  │               ├─► Assume breach mentality
  │               └─► Continuous verification (Layer 8)
  │
  │     NO ──► Brownfield (existing) system?
  │               └─► Hybrid: Defense in Depth + Zero Trust overlay
  │                     ├─► Maintain perimeter controls (Layers 1-3)
  │                     ├─► Strengthen IAM (Layer 7: SSO, MFA, RBAC)
  │                     ├─► Add behavioral analytics (Layer 8)
  │                     └─► Segment critical assets first
  │
  ├─► What is the deployment environment?
  │     CLOUD-NATIVE ──► Cloud provider reference architectures
  │                       ├─► AWS: Well-Architected Security Pillar
  │                       ├─► GCP: Security Best Practices
  │                       └─► Azure: Security Benchmark
  │
  │     HYBRID CLOUD ──► Multi-cloud security posture management
  │                       ├─► Unified policy enforcement (CSPM)
  │                       ├─► Cross-cloud visibility
  │                       └─► Cloud-agnostic IAM (Okta, Azure AD)
  │
  │     ON-PREMISES ──► Traditional Defense in Depth
  │                       ├─► Strong perimeter (Layer 2)
  │                       ├─► Network segmentation (Layer 3)
  │                       └─► Progressive modernization to Zero Trust
  │
  ├─► What are compliance requirements?
  │     SOC 2 ──► NIST CSF + CIS Controls baseline
  │     HIPAA ──► NIST CSF + HIPAA Security Rule mappings
  │     PCI DSS ──► PCI DSS requirements + network segmentation
  │     GDPR ──► NIST Privacy Framework + data protection controls
  │     ISO 27001 ──► ISO 27001/27002 control framework
  │
  └─► What is the risk tolerance?
        HIGH RISK (Financial, Healthcare, Government)
          ├─► Maximum controls across all 9 layers
          ├─► Zero Trust + Defense in Depth
          ├─► 24/7 SOC monitoring
          └─► Penetration testing, red team exercises

        MEDIUM RISK (Enterprise SaaS, E-commerce)
          ├─► Balanced security and usability
          ├─► Cloud-native security services
          └─► Managed SIEM/SOC

        LOW RISK (Internal tools, non-sensitive data)
          ├─► Essential controls (Layers 2, 5, 7, 9)
          ├─► Cloud-native security defaults
          └─► Automated monitoring
```

---

## Summary

Defense in Depth provides comprehensive security through layered, independent controls. Implement all 9 layers for maximum protection, with critical focus on:

1. **Layer 7 (IAM):** Identity is the new perimeter - strongest authentication and authorization
2. **Layer 6 (Data):** Data is the ultimate target - encrypt and protect at all stages
3. **Layer 9 (Security Operations):** Detection and response capabilities determine breach impact

For new systems, design Zero Trust architecture on top of Defense in Depth foundation. For existing systems, progressively add Zero Trust controls while maintaining perimeter defenses.

```

### references/zero-trust-architecture.md

```markdown
# Zero Trust Architecture Reference

## Table of Contents

1. [Overview](#overview)
2. [Core Principles](#core-principles)
3. [Reference Architecture](#reference-architecture)
4. [Implementation Roadmap](#implementation-roadmap)
5. [Technology Components](#technology-components)
6. [Common Patterns](#common-patterns)

## Overview

Zero Trust Architecture (ZTA) implements the principle "never trust, always verify" where every access request is authenticated, authorized, and continuously validated regardless of network location.

**Key Shift:** Traditional security assumes trust inside the network perimeter. Zero Trust assumes breach and verifies explicitly at every access request.

**Primary Standard:** NIST Special Publication 800-207 - Zero Trust Architecture

## Core Principles

### 1. Never Trust, Always Verify

**Traditional Model:**
- Trust based on network location (inside corporate network = trusted)
- VPN grants broad network access

**Zero Trust Model:**
- No implicit trust based on location
- Every access request authenticated and authorized
- Continuous verification throughout session

**Implementation:**
- Authenticate users with MFA at every access request
- Validate device posture before granting access
- Re-authenticate periodically during session
- Monitor session for anomalies

---

### 2. Assume Breach

**Design Philosophy:**
- Assume attackers are already inside the network
- Limit blast radius of any compromise
- Detect and contain breaches quickly

**Implementation:**
- Micro-segmentation: Isolate workloads to prevent lateral movement
- Least privilege: Minimize permissions granted
- Continuous monitoring: Detect anomalous behavior
- Automated response: Contain threats immediately

---

### 3. Explicit Verification

Verify multiple signals before granting access:

**User Identity:**
- Multi-factor authentication (MFA)
- Biometric verification
- Risk-based authentication

**Device Health:**
- Operating system version and patch level
- Endpoint protection status (EDR running, up-to-date)
- Encryption status (disk encryption enabled)
- Compliance with security policies

**Application Integrity:**
- Code signing verification
- Runtime integrity checks
- Vulnerability status

**Context:**
- Location (geolocation, IP address)
- Time of access (business hours vs. unusual times)
- Behavior (normal vs. anomalous patterns)
- Threat intelligence (known malicious IPs, IOCs)

---

### 4. Least Privilege Access

Grant minimum permissions required to complete tasks.

**Just-in-Time (JIT) Access:**
- Elevate privileges only when needed
- Time-bound access (e.g., 4 hours)
- Automated de-provisioning after time expires

**Just-Enough-Access (JEA):**
- Minimal permissions for specific tasks
- No standing admin privileges
- Role-based or attribute-based access control

**Access Recertification:**
- Periodic review of user permissions
- Automated removal of unused permissions
- Manager attestation for critical access

---

### 5. Micro-Segmentation

Divide network into small, isolated zones with granular access controls.

**Traditional Segmentation:**
- Coarse-grained: DMZ, internal network, database tier
- Network-based: VLANs, subnets

**Micro-Segmentation:**
- Fine-grained: Per-application, per-workload isolation
- Identity-based: Access based on user/service identity, not network location
- Dynamic: Policies follow workloads (cloud-native, containers)

**Technologies:**
- Zero Trust Network Access (ZTNA)
- Service mesh (Istio, Linkerd)
- Cloud security groups with identity-based rules
- Software-defined perimeter (SDP)

---

## Reference Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                    POLICY DECISION POINT                         │
│                  (Policy Engine + Policy Administrator)          │
│                                                                  │
│  Input: User ID, Device Health, Resource, Context               │
│  Output: ALLOW / DENY access decision                           │
└──────────────────────┬───────────────────────────────────────────┘
                       │ Policy Decision
                       │
         ┌─────────────┼─────────────┐
         │             │             │
    ┌────▼────┐   ┌────▼────┐   ┌───▼────┐
    │ Identity│   │ Device  │   │Context │
    │ Provider│   │ Posture │   │ & Risk │
    │ (IdP)   │   │ Service │   │ Engine │
    │         │   │         │   │        │
    │ - Users │   │ - OS    │   │ - Geo  │
    │ - MFA   │   │ - EDR   │   │ - Time │
    │ - SSO   │   │ - Patch │   │ - UEBA │
    └─────────┘   └─────────┘   └────────┘
         │             │             │
         └─────────────┼─────────────┘
                       │ Trust Signals
         ┌─────────────▼─────────────┐
         │  POLICY ENFORCEMENT POINT │
         │    (PEP - Gateways)       │
         │                           │
         │  - ZTNA Gateway           │
         │  - API Gateway            │
         │  - Reverse Proxy          │
         └─────────────┬─────────────┘
                       │ Enforced Access
    ┌──────────────────┼──────────────────┐
    │                  │                  │
┌───▼───┐         ┌────▼────┐        ┌───▼────┐
│ User  │         │  App    │        │  Data  │
│Access │         │ Access  │        │ Access │
│(SaaS, │         │ (APIs,  │        │ (DBs,  │
│ Apps) │         │ Services)│        │ S3)    │
└───┬───┘         └────┬────┘        └───┬────┘
    │                  │                 │
┌───▼────────┐    ┌────▼─────────┐  ┌───▼─────────┐
│  End Users │    │ Applications │  │  Data Stores│
│  (Humans)  │    │ (Workloads)  │  │  (Storage)  │
└────────────┘    └──────────────┘  └─────────────┘
```

### Architecture Components

**1. Policy Decision Point (PDP):**
- **Policy Engine:** Makes access decisions (allow/deny) based on policies and trust signals
- **Policy Administrator:** Establishes/shuts down communication paths between subjects and resources

**2. Trust Signal Sources:**
- **Identity Provider (IdP):** Azure AD, Okta, Auth0, Ping Identity
  - User/service identity verification
  - MFA enforcement
  - SSO integration
- **Device Posture Service:** Microsoft Intune, Jamf, VMware Workspace ONE
  - Device inventory and health checks
  - Compliance verification
  - Integration with MDM/UEM platforms
- **Context & Risk Engine:** Microsoft Sentinel, Splunk, UEBA platforms
  - Behavioral analytics
  - Geolocation and time-based risk
  - Threat intelligence integration
  - Adaptive risk scoring

**3. Policy Enforcement Point (PEP):**
- **ZTNA Gateways:** Zscaler Private Access, Palo Alto Prisma Access, Cloudflare Access
  - Enforce policy decisions
  - Terminate connections if trust changes
  - No direct network access granted
- **API Gateways:** Kong, Apigee, AWS API Gateway
  - Enforce API-level policies
  - Rate limiting and throttling
  - JWT validation
- **Reverse Proxies:** NGINX, Envoy, Traefik
  - Application-level access control
  - TLS termination
  - Request filtering

---

## Implementation Roadmap

### Phase 1: Foundation (Months 1-3)

**Objective:** Establish identity and visibility foundation

**Tasks:**
1. **Asset Inventory:**
   - Inventory all users (employees, contractors, partners)
   - Inventory all devices (workstations, mobile, servers, IoT)
   - Inventory all applications (SaaS, on-premises, cloud)
   - Inventory all data stores and classification

2. **Identity Provider Deployment:**
   - Deploy centralized IdP (Azure AD, Okta)
   - Implement SSO for all applications
   - Enforce MFA for all users (prioritize privileged accounts)
   - Integrate on-premises AD with cloud IdP (hybrid identity)

3. **Device Management:**
   - Deploy MDM/UEM platform (Intune, Jamf, Workspace ONE)
   - Enroll all devices
   - Establish device compliance policies (encryption, patch level, EDR)
   - Enable device posture checks

4. **Visibility:**
   - Deploy SIEM for centralized logging
   - Enable cloud audit logs (CloudTrail, Azure Monitor, GCP Cloud Logging)
   - Establish baseline network traffic patterns
   - Map data flows between applications

**Success Metrics:**
- 100% user MFA enrollment
- 95%+ device enrollment in MDM
- Centralized logging operational
- Asset inventory complete

---

### Phase 2: Access Controls (Months 4-6)

**Objective:** Implement least privilege access and strong authentication

**Tasks:**
1. **Least Privilege Access:**
   - Review and document all user roles and permissions
   - Implement RBAC (role-based access control)
   - Remove excessive permissions (principle of least privilege)
   - Document and approve all privileged access

2. **Privileged Access Management (PAM):**
   - Deploy PAM solution (CyberArk, BeyondTrust, HashiCorp Vault)
   - Implement JIT (just-in-time) access for admins
   - Remove standing privileged credentials
   - Enable session recording for all privileged access

3. **Conditional Access Policies:**
   - Implement risk-based authentication
   - Require MFA for high-risk scenarios (new device, unusual location)
   - Block access from known malicious IPs
   - Enforce device compliance before granting access

4. **Identity Governance:**
   - Establish user lifecycle processes (joiner/mover/leaver)
   - Implement access certification (quarterly reviews)
   - Automate access revocation on termination
   - Segregation of duties (SoD) enforcement

**Success Metrics:**
- 0 standing admin credentials
- 100% privileged access via JIT
- Conditional access policies active
- Access certification process operational

---

### Phase 3: Micro-Segmentation (Months 7-9)

**Objective:** Limit lateral movement through network segmentation

**Tasks:**
1. **Application Dependency Mapping:**
   - Map all application dependencies and data flows
   - Document north-south traffic (user → application)
   - Document east-west traffic (application → application)
   - Identify critical assets requiring isolation

2. **Design Segmentation Zones:**
   - Define micro-segmentation zones (per-application, per-tier)
   - Create security policies for each zone
   - Plan migration sequence (critical apps first)

3. **ZTNA Deployment:**
   - Deploy ZTNA solution for remote access (replace VPN)
   - Configure application connectors/gateways
   - Migrate users from VPN to ZTNA (phased rollout)
   - Decommission VPN infrastructure

4. **Service Mesh (Cloud-Native):**
   - Deploy service mesh (Istio, Linkerd) for Kubernetes
   - Implement mutual TLS (mTLS) between services
   - Define service-to-service authorization policies
   - Monitor east-west traffic

5. **Network Policy Enforcement:**
   - Implement network policies (Kubernetes Network Policies, security groups)
   - Default deny all traffic, allow explicitly
   - Log all blocked traffic for tuning

**Success Metrics:**
- ZTNA replaces VPN for remote access
- Critical applications micro-segmented
- East-west traffic controlled by policies
- Lateral movement significantly reduced

---

### Phase 4: Monitoring & Automation (Months 10-12)

**Objective:** Continuous verification and automated response

**Tasks:**
1. **Behavioral Analytics (UEBA):**
   - Deploy UEBA platform (Microsoft Sentinel, Splunk)
   - Establish baseline behavior for users and entities
   - Configure anomaly detection rules
   - Integrate with SIEM for correlation

2. **Continuous Compliance Monitoring:**
   - Deploy CSPM for cloud security posture (Wiz, Orca, Prisma Cloud)
   - Monitor configuration drift from security baselines
   - Automate remediation of common misconfigurations
   - Track compliance against frameworks (CIS, NIST)

3. **Automated Incident Response (SOAR):**
   - Deploy SOAR platform (Splunk SOAR, Cortex XSOAR)
   - Create playbooks for common incidents:
     - Compromised credential → Revoke tokens, force re-auth
     - Malware detection → Isolate endpoint, block IOCs
     - Unusual data access → Alert SOC, increase monitoring
   - Test and refine playbooks

4. **Continuous Verification:**
   - Implement continuous device posture checks
   - Re-authenticate users periodically during long sessions
   - Adjust access based on real-time risk scores
   - Terminate sessions on trust degradation

**Success Metrics:**
- UEBA operational with baseline established
- 80%+ incidents automated response
- Mean time to respond (MTTR) reduced by 50%
- Continuous compliance monitoring active

---

## Technology Components

### Identity Providers (IdP)

| Provider | Strengths | Use Case |
|----------|-----------|----------|
| **Azure AD (Entra ID)** | Microsoft ecosystem integration, Conditional Access | Microsoft-centric organizations |
| **Okta** | Broad SaaS integration, strong MFA | Multi-cloud, SaaS-heavy |
| **Auth0** | Developer-friendly, CIAM focus | Customer identity (B2C) |
| **Ping Identity** | Enterprise scale, legacy integration | Large enterprises, hybrid |

**Key Features:**
- SSO (SAML, OAuth 2.0, OpenID Connect)
- MFA (TOTP, push, biometric, FIDO2)
- Conditional access policies
- User lifecycle management
- API access management

---

### Zero Trust Network Access (ZTNA)

| Provider | Approach | Strengths |
|----------|----------|-----------|
| **Zscaler Private Access** | Cloud-native proxy | Global PoPs, scalability |
| **Palo Alto Prisma Access** | SASE (converged ZTNA + CASB + FWaaS) | Comprehensive security |
| **Cloudflare Access** | Cloudflare network integration | Performance, global reach |
| **Perimeter 81** | Simplified deployment | SMB-friendly, easy setup |

**ZTNA vs. VPN:**

| Aspect | VPN | ZTNA |
|--------|-----|------|
| **Access Model** | Network-level (broad access) | Application-level (granular) |
| **Trust Model** | Implicit (inside = trusted) | Explicit (verify every request) |
| **Lateral Movement** | Easy (full network access) | Difficult (app-specific access) |
| **Device Posture** | Rarely checked | Continuously verified |
| **User Experience** | VPN client, latency | Transparent, faster |

---

### Micro-Segmentation Tools

**Cloud-Native:**
- **AWS Security Groups:** Stateful firewall rules for EC2 instances
- **GCP Firewall Rules:** VPC-level network policies
- **Azure Network Security Groups:** Subnet and NIC-level firewalls
- **Kubernetes Network Policies:** Pod-to-pod communication control

**Service Mesh:**
- **Istio:** Full-featured, complex, sidecar-based
- **Linkerd:** Lightweight, simple, sidecar-based
- **Consul Connect:** HashiCorp ecosystem, service registry integration

**Software-Defined Perimeter (SDP):**
- **Appgate SDP:** Enterprise SDP solution
- **Cyxtera AppGate:** Cloud and on-premises SDP
- **Google BeyondCorp:** Google's zero trust implementation

---

### UEBA & Risk Engines

| Platform | Strengths | Integration |
|----------|-----------|-------------|
| **Microsoft Sentinel** | Azure ecosystem, AI-driven | Azure AD, Microsoft 365 |
| **Splunk UEBA** | Advanced ML, customizable | Splunk SIEM |
| **Exabeam** | Automated threat detection | Multi-SIEM integration |
| **Securonix** | Big data analytics | Large-scale environments |

**UEBA Use Cases:**
- Account compromise detection (unusual login patterns)
- Insider threat detection (data exfiltration, privilege abuse)
- Lateral movement detection (unusual network connections)
- Risk scoring for adaptive authentication

---

## Common Patterns

### Pattern 1: ZTNA for Remote Workforce

**Problem:** VPN provides broad network access, enabling lateral movement if compromised.

**Solution:** Replace VPN with ZTNA for application-specific access.

**Implementation:**
1. Deploy ZTNA gateway (Zscaler, Palo Alto, Cloudflare)
2. Configure application connectors for each internal application
3. Define access policies (user roles → specific applications)
4. Enforce device posture checks before access
5. Migrate users from VPN to ZTNA (pilot → full rollout)
6. Decommission VPN

**Benefits:**
- No network-level access granted
- Per-application access control
- Device posture verification
- Improved user experience (no VPN client)

---

### Pattern 2: Zero Trust for Cloud Workloads

**Problem:** Cloud workloads communicate over network, enabling lateral movement.

**Solution:** Implement service mesh with mutual TLS and policy-based authorization.

**Implementation (Kubernetes + Istio):**
1. Deploy Istio service mesh to Kubernetes cluster
2. Enable automatic sidecar injection for all pods
3. Configure mTLS for all service-to-service communication
4. Define authorization policies using Istio AuthorizationPolicy:
   ```yaml
   apiVersion: security.istio.io/v1beta1
   kind: AuthorizationPolicy
   metadata:
     name: frontend-to-backend
   spec:
     selector:
       matchLabels:
         app: backend
     action: ALLOW
     rules:
     - from:
       - source:
           principals: ["cluster.local/ns/default/sa/frontend"]
       to:
       - operation:
           methods: ["GET", "POST"]
   ```
5. Monitor service mesh traffic with Kiali or Grafana

**Benefits:**
- Encrypted service-to-service communication
- Identity-based authorization (service accounts)
- Zero trust between microservices
- Visibility into east-west traffic

---

### Pattern 3: Adaptive Authentication Based on Risk

**Problem:** Static MFA requirements frustrate users in low-risk scenarios, but weak authentication enables breaches.

**Solution:** Adaptive authentication with risk-based MFA requirements.

**Implementation (Azure AD Conditional Access):**
1. Define risk signals:
   - High risk: New device, unusual location, known malicious IP
   - Medium risk: After-hours access, risky sign-in
   - Low risk: Known device, typical location, business hours

2. Configure Conditional Access policies:
   - **High risk:** Require MFA + compliant device + block if very high risk
   - **Medium risk:** Require MFA
   - **Low risk:** Allow access (SSO only)

3. Integrate UEBA for behavioral risk scoring

4. Continuously adjust risk scores based on session behavior

**Benefits:**
- Strong authentication when needed
- Minimal friction for low-risk access
- Dynamic security posture
- Reduced successful attacks

---

### Pattern 4: Just-in-Time (JIT) Privileged Access

**Problem:** Standing admin credentials are high-value targets and increase breach risk.

**Solution:** JIT access with time-bound privilege elevation.

**Implementation (Azure AD Privileged Identity Management):**
1. Remove all standing admin role assignments
2. Configure eligible roles (users can activate when needed)
3. Define activation requirements:
   - MFA required
   - Justification required (ticket number)
   - Approval required for critical roles
   - Time-bound (e.g., 4 hours)

4. Enable session recording for all privileged sessions

5. Alert on all privilege activations

6. Review and audit activation logs regularly

**Benefits:**
- Reduced attack surface (no standing admin creds)
- Audit trail of all privileged access
- Time-limited exposure
- Justification for compliance

---

## Benefits of Zero Trust Architecture

**Security Benefits:**
- **Reduced Attack Surface:** No broad network access, application-specific only
- **Limited Lateral Movement:** Micro-segmentation prevents attackers from spreading
- **Breach Detection:** Continuous monitoring detects anomalies quickly
- **Compliance:** Strong access controls and audit trails for regulatory requirements

**Cost Benefits (IBM 2024 Cost of Data Breach Report):**
- Average savings: $1.76M per breach for organizations with mature Zero Trust
- Reduced breach detection time (27% faster detection)
- Reduced breach containment time (33% faster containment)

**Operational Benefits:**
- **Improved User Experience:** ZTNA eliminates VPN latency and client issues
- **Cloud-Native:** Aligns with cloud and container architectures
- **Automation:** Policy-based access reduces manual administration
- **Visibility:** Comprehensive logging and monitoring across all access

---

## Challenges and Mitigations

### Challenge 1: Complexity of Implementation

**Issue:** Zero Trust requires integrating multiple technologies (IdP, ZTNA, UEBA, SIEM, PAM).

**Mitigation:**
- Phased approach (12-month roadmap, not "big bang")
- Start with identity foundation (Phase 1)
- Use cloud-native solutions where possible (reduce on-premises complexity)
- Consider SASE platforms for converged security (Zscaler, Palo Alto Prisma)

---

### Challenge 2: Legacy System Integration

**Issue:** Legacy applications may not support modern authentication (SAML, OAuth).

**Mitigation:**
- Use reverse proxies with authentication injection (NGINX, Envoy)
- Deploy privileged access gateways for legacy protocols (RDP, SSH)
- Plan modernization or replacement of unsupportable legacy systems
- Segment legacy systems with strict network policies

---

### Challenge 3: User Experience Impact

**Issue:** Frequent authentication and access checks can frustrate users.

**Mitigation:**
- Implement adaptive authentication (MFA only when risk warrants)
- Use SSO to minimize authentication prompts
- Deploy passwordless authentication (FIDO2, biometrics)
- Transparent ZTNA (no VPN client, seamless access)

---

### Challenge 4: Cultural Resistance

**Issue:** Users and IT staff may resist change from traditional VPN/perimeter model.

**Mitigation:**
- Executive sponsorship and communication of security benefits
- Pilot programs with early adopters
- Training and documentation for IT staff and users
- Demonstrate improved user experience (faster access, no VPN)

---

## Summary

Zero Trust Architecture shifts from perimeter-based security to identity-based continuous verification. Implement in phases over 12 months: Foundation (identity, visibility) → Access Controls (least privilege, PAM) → Micro-Segmentation (ZTNA, service mesh) → Monitoring & Automation (UEBA, SOAR).

Key technologies: IdP (Azure AD, Okta), ZTNA (Zscaler, Palo Alto, Cloudflare), Service Mesh (Istio, Linkerd), UEBA (Microsoft Sentinel, Splunk), PAM (CyberArk, HashiCorp Vault).

Primary benefit: Reduced breach impact through limited lateral movement, continuous verification, and rapid detection.

```

### references/threat-modeling.md

```markdown
# Threat Modeling Reference

## Table of Contents

1. [Overview](#overview)
2. [STRIDE Methodology](#stride-methodology)
3. [PASTA Methodology](#pasta-methodology)
4. [DREAD Risk Scoring](#dread-risk-scoring)
5. [Attack Trees](#attack-trees)
6. [Methodology Selection Guide](#methodology-selection-guide)

## Overview

Threat modeling systematically identifies, analyzes, and prioritizes security threats to design appropriate mitigations proactively.

**When to Threat Model:**
- Designing new applications or systems
- Making significant architecture changes
- Entering new threat environments (cloud migration, IoT deployment)
- Regulatory compliance requirements (PCI DSS, HIPAA)
- After security incidents (lessons learned)

**Threat Modeling Process:**
1. Model the system (data flow diagrams, architecture diagrams)
2. Identify threats (apply methodology - STRIDE, PASTA)
3. Prioritize threats (risk scoring - DREAD, business impact)
4. Design mitigations (security controls)
5. Validate mitigations (testing, review)
6. Document and maintain (living document, update regularly)

---

## STRIDE Methodology

**Developed by:** Microsoft (Loren Kohnfelder and Praerit Garg, 1999)

**Purpose:** Systematic threat identification using 6 threat categories

**Complexity:** Low (accessible to development teams)

### STRIDE Threat Categories

#### S - Spoofing Identity

**Definition:** Attacker pretends to be someone else (user, service, system)

**Examples:**
- Phishing emails impersonating legitimate senders
- Session hijacking (stealing session tokens)
- Credential theft and replay
- Man-in-the-middle attacks
- IP address spoofing

**Mitigations:**
- Multi-factor authentication (MFA)
- Certificate validation (mutual TLS)
- Anti-phishing controls (DMARC, SPF, DKIM)
- Session token security (HttpOnly, Secure flags, short expiry)
- Strong authentication protocols (OAuth 2.0, SAML 2.0)

---

#### T - Tampering with Data

**Definition:** Unauthorized modification of data in storage or transit

**Examples:**
- SQL injection modifying database records
- Man-in-the-middle modifying network traffic
- File system tampering
- Log tampering to hide malicious activity
- Message interception and alteration

**Mitigations:**
- Encryption in transit (TLS 1.3)
- Encryption at rest (AES-256)
- Digital signatures and hashing (SHA-256, HMAC)
- Integrity checks (checksums, cryptographic hashes)
- Input validation and parameterized queries
- Immutable logs (append-only, centralized SIEM)

---

#### R - Repudiation

**Definition:** User denies performing an action, and no proof exists

**Examples:**
- "I didn't make that purchase"
- "I didn't delete that file"
- "I didn't send that email"
- "I didn't access that data"

**Mitigations:**
- Comprehensive audit logging
- Digital signatures (non-repudiation)
- Timestamping (trusted time source)
- Centralized log aggregation (SIEM)
- Tamper-proof logs (immutable storage)
- Video/session recording for critical actions

---

#### I - Information Disclosure

**Definition:** Exposure of confidential information to unauthorized parties

**Examples:**
- SQL injection revealing database contents
- Directory traversal exposing file system
- API responses leaking sensitive data
- Error messages revealing system details
- Unencrypted data transmission
- Misconfigured cloud storage (public S3 buckets)

**Mitigations:**
- Encryption (at rest, in transit, in use)
- Access control (RBAC, ABAC, least privilege)
- Data classification and DLP (Data Loss Prevention)
- Minimize data exposure (return only necessary data)
- Secure error handling (generic error messages)
- Regular security scanning (DAST, penetration testing)

---

#### D - Denial of Service

**Definition:** Making system unavailable to legitimate users

**Examples:**
- DDoS attacks (volumetric, protocol, application layer)
- Resource exhaustion (memory leaks, CPU spikes)
- Application crash exploits (buffer overflow)
- Database locking attacks
- API rate limit bypass

**Mitigations:**
- DDoS protection services (Cloudflare, AWS Shield)
- Rate limiting and throttling (API gateways)
- Resource quotas and limits (CPU, memory, connections)
- Auto-scaling (horizontal scaling under load)
- Circuit breakers and graceful degradation
- Input validation (prevent malformed requests)
- Redundancy and load balancing

---

#### E - Elevation of Privilege

**Definition:** Gaining higher privileges than authorized

**Examples:**
- Privilege escalation exploits (kernel exploits, sudo misconfigurations)
- Buffer overflow attacks
- SQL injection leading to admin access
- Insecure direct object references (IDOR)
- Misconfigured permissions (excessive IAM policies)

**Mitigations:**
- Principle of least privilege
- Input validation and sanitization
- Regular security patching
- Secure coding practices (OWASP guidelines)
- SAST/DAST scanning in CI/CD
- Role-based access control (RBAC)
- Privilege separation (run services as non-root)
- Security testing (penetration testing, fuzzing)

---

### STRIDE Application Process

**Step 1: Model the System**

Create Data Flow Diagrams (DFDs) showing:
- **External Entities:** Users, external systems, APIs
- **Processes:** Application components, services, functions
- **Data Stores:** Databases, file systems, caches
- **Data Flows:** Communication paths between components
- **Trust Boundaries:** Network perimeters, authentication boundaries

**Example DFD Elements:**
```
[User] --(HTTPS)--> [Web Server] --(SQL)--> [Database]
         │                           │
    External Entity            Process      Data Store
         │                           │
    ─────┼───────────────────────────┼────────  Trust Boundary
```

**Step 2: Identify Threats**

Apply STRIDE to each element:
- **External Entities:** Spoofing
- **Processes:** Tampering, Repudiation, Denial of Service, Elevation of Privilege
- **Data Stores:** Tampering, Information Disclosure, Denial of Service
- **Data Flows:** Tampering, Information Disclosure, Denial of Service

**Step 3: Document Threats**

Create threat list with:
- Threat ID
- STRIDE category
- Affected component
- Threat description
- Potential impact
- Proposed mitigation

**Step 4: Prioritize Threats**

Use DREAD scoring or business impact to prioritize.

**Step 5: Mitigate Threats**

Design and implement security controls.

---

### STRIDE Example: Web Application Login

| Component | Threat Type | Threat | Mitigation |
|-----------|-------------|--------|------------|
| Login page | Spoofing | Credential phishing | MFA, anti-phishing (FIDO2), user education |
| Login form | Tampering | Form field manipulation | Server-side validation, CSRF tokens |
| Authentication flow | Repudiation | User denies login | Audit logs with IP, timestamp, device info |
| Database | Info Disclosure | SQL injection exposing passwords | Parameterized queries, password hashing (bcrypt, Argon2) |
| Login endpoint | Denial of Service | Brute force attacks | Rate limiting, account lockout, CAPTCHA |
| Session management | Elevation | Session hijacking → admin access | Secure session tokens, HttpOnly/Secure flags, short expiry |

---

## PASTA Methodology

**Process for Attack Simulation and Threat Analysis**

**Developed by:** VerSprite (Tony UcedaVélez and Marco Morana)

**Purpose:** Risk-centric threat analysis aligned with business objectives

**Complexity:** High (enterprise-level, comprehensive)

### 7 Stages of PASTA

#### Stage 1: Define Business Objectives

Identify business goals, compliance requirements, and acceptable risk levels.

**Activities:**
- Document business objectives (revenue, customer trust, compliance)
- Define security objectives aligned with business (data protection, availability, integrity)
- Identify compliance requirements (GDPR, SOC 2, HIPAA, PCI DSS)
- Determine risk tolerance (risk appetite, risk thresholds)

**Output:** Business context and security objectives

---

#### Stage 2: Define Technical Scope

Inventory assets and document technical architecture.

**Activities:**
- Asset inventory (applications, databases, servers, network devices, cloud resources)
- Network architecture documentation (network diagrams, data flows)
- Identify trust boundaries (internet-facing, internal networks, DMZ)
- Document dependencies (third-party services, APIs, libraries)

**Output:** Technical scope and asset inventory

---

#### Stage 3: Application Decomposition

Break down application into components and analyze each.

**Activities:**
- Identify application components (front-end, API, database, authentication)
- Map data flows between components
- Document authentication and authorization mechanisms
- Identify entry points (user inputs, API endpoints, file uploads)
- Analyze session management

**Output:** Detailed application architecture and data flow diagrams

---

#### Stage 4: Threat Analysis

Identify threat actors and attack vectors.

**Activities:**
- Identify threat actors (cybercriminals, nation-states, insiders, competitors)
- Determine threat actor motivations (financial gain, espionage, disruption)
- Map to MITRE ATT&CK framework (tactics, techniques, procedures)
- Analyze past attack patterns (threat intelligence, incident history)
- Identify attack surfaces (internet-facing assets, supply chain)

**Output:** Threat actor profiles and attack scenarios

---

#### Stage 5: Vulnerability & Weakness Analysis

Identify vulnerabilities in the system.

**Activities:**
- Code review (SAST findings, manual review)
- DAST/penetration testing findings
- Configuration review (misconfigurations, default credentials)
- Map to CWE (Common Weakness Enumeration)
- Dependency vulnerabilities (SCA findings, SBOM analysis)

**Output:** Vulnerability inventory and weaknesses

---

#### Stage 6: Attack Modeling

Simulate attack scenarios and analyze feasibility.

**Activities:**
- Create attack trees for identified threats
- Simulate attack scenarios (walkthrough attack paths)
- Analyze attack feasibility (required skills, resources, time)
- Determine likelihood of success
- Estimate attack impact (data loss, downtime, financial)

**Output:** Attack scenarios and feasibility analysis

---

#### Stage 7: Risk & Impact Analysis

Quantify business impact and prioritize remediation.

**Activities:**
- Quantify financial impact (data breach costs, downtime costs, regulatory fines)
- Assess reputational impact (customer trust, brand damage)
- Calculate risk scores (likelihood × impact)
- Prioritize risks by business impact
- Recommend risk treatments (mitigate, accept, transfer, avoid)

**Output:** Prioritized risk list with business impact and remediation recommendations

---

### PASTA vs STRIDE Comparison

| Aspect | PASTA | STRIDE |
|--------|-------|--------|
| **Focus** | Risk-centric, business-aligned | Threat identification |
| **Complexity** | High (7 stages, comprehensive) | Low (6 categories, straightforward) |
| **Time Required** | Weeks to months | Days to weeks |
| **Output** | Prioritized risks, attack scenarios, business impact | Threat list with mitigations |
| **Best For** | Enterprise risk management, C-level reporting | Development teams, agile environments |
| **Business Alignment** | Strong (starts with business objectives) | Weak (technical focus) |

**When to Use PASTA:**
- Enterprise risk assessments
- Compliance-driven threat modeling
- C-level security reporting
- High-risk systems (financial, healthcare, critical infrastructure)

**When to Use STRIDE:**
- Development team threat modeling
- Agile/DevSecOps integration
- Quick threat identification
- Low to medium risk systems

---

## DREAD Risk Scoring

**Developed by:** Microsoft (now deprecated internally but still widely used)

**Purpose:** Quantify risk with numeric scores for prioritization

### DREAD Factors (1-10 scale)

#### D - Damage Potential

How much damage can the attack cause?

- **10:** Complete system compromise, data destruction, total business disruption
- **7-9:** Significant data loss, major service disruption, regulatory violations
- **4-6:** Information disclosure, partial denial of service, limited data loss
- **1-3:** Minor inconvenience, limited impact

**Example:**
- SQL injection exposing customer database: **9** (massive data breach)
- XSS on low-traffic page: **4** (limited user impact)

---

#### R - Reproducibility

How easily can the attack be reproduced?

- **10:** Attack works every time with no special conditions
- **7-9:** Attack works most of the time with minimal setup
- **4-6:** Attack requires specific timing, conditions, or configuration
- **1-3:** Attack is extremely difficult to reproduce, requires rare conditions

**Example:**
- SQL injection with automated tool: **10** (always works)
- Race condition exploit: **4** (requires precise timing)

---

#### E - Exploitability

How easy is it to launch the attack?

- **10:** No authentication required, automated exploit available, script kiddie level
- **7-9:** Requires authentication, manual exploit, moderate skill
- **4-6:** Requires deep technical knowledge, custom exploit development
- **1-3:** Requires expert-level skills, significant resources, insider access

**Example:**
- Public RCE exploit for known CVE: **10** (Metasploit module available)
- Zero-day kernel exploit: **3** (requires advanced skills)

---

#### A - Affected Users

How many users are affected?

- **10:** All users affected (entire user base)
- **7-9:** Large subset of users (most users, all customers)
- **4-6:** Some users affected (specific user segment)
- **1-3:** Few users affected (admin only, single user)

**Example:**
- Authentication bypass on public website: **10** (all users)
- Privilege escalation requiring admin role: **2** (admin only)

---

#### D - Discoverability

How easy is it to discover the vulnerability?

- **10:** Vulnerability is obvious, already public, scanners detect it
- **7-9:** Vulnerability easily found with standard tools
- **4-6:** Requires some effort, security testing to discover
- **1-3:** Nearly impossible to discover, requires source code access

**Example:**
- Unpatched public CVE: **10** (scanners detect, exploits available)
- Logic flaw in business workflow: **4** (requires code review)

---

### Risk Score Calculation

**Formula:**
```
Risk Score = (Damage + Reproducibility + Exploitability + Affected Users + Discoverability) / 5
```

**Risk Levels:**
- **Critical:** 8.0 - 10.0 (immediate action required)
- **High:** 6.0 - 7.9 (urgent remediation)
- **Medium:** 4.0 - 5.9 (plan remediation)
- **Low:** 1.0 - 3.9 (monitor, low priority)

---

### DREAD Scoring Examples

**Example 1: SQL Injection in Login Form**

- **Damage:** 9 (Database compromise, all user data exposed)
- **Reproducibility:** 10 (Works every time)
- **Exploitability:** 8 (Automated tools available, moderate skill)
- **Affected Users:** 10 (All users' data at risk)
- **Discoverability:** 9 (Common vulnerability, scanners detect)

**Risk Score:** (9 + 10 + 8 + 10 + 9) / 5 = **9.2 (Critical)**

---

**Example 2: Stored XSS on Admin Panel**

- **Damage:** 6 (Admin session hijacking, limited to admin accounts)
- **Reproducibility:** 10 (Works every time)
- **Exploitability:** 7 (Requires authentication, manual exploit)
- **Affected Users:** 2 (Admin users only)
- **Discoverability:** 6 (Requires security testing to find)

**Risk Score:** (6 + 10 + 7 + 2 + 6) / 5 = **6.2 (High)**

---

**Example 3: Information Disclosure in Error Messages**

- **Damage:** 4 (Reveals internal paths, versions; aids reconnaissance)
- **Reproducibility:** 10 (Consistent error messages)
- **Exploitability:** 10 (No authentication required, trivial to trigger)
- **Affected Users:** 10 (All users can trigger)
- **Discoverability:** 8 (Easy to find with basic testing)

**Risk Score:** (4 + 10 + 10 + 10 + 8) / 5 = **8.4 (Critical)**

*Note: Despite low damage, high exploitability and discoverability make this critical for remediation.*

---

## Attack Trees

**Visual threat modeling technique showing hierarchical attack paths**

### Attack Tree Structure

- **Goal (Root Node):** Attacker's objective (e.g., "Compromise Web Application")
- **Attack Paths (Branches):** Different ways to achieve goal
- **Attack Steps (Leaf Nodes):** Atomic actions required

**Gates:**
- **OR Gate:** Any child node success achieves parent goal
- **AND Gate:** All child nodes must succeed for parent goal

---

### Attack Tree Example: Compromise Web Application

```
                    Compromise Web Application (GOAL)
                              │
        ┌─────────────────────┼─────────────────────┐
        │ [OR]                │ [OR]                │ [OR]
   Exploit SQLi        Steal Credentials      Exploit Vuln Lib
        │                     │                     │
    ┌───┴───┐           ┌─────┴─────┐         ┌────┴────┐
    │ [AND] │           │ [OR]      │ [OR]    │ [AND]   │
Find Input Bypass   Phishing  Credential  Find Vuln Exploit
Validation WAF      Email     Stuffing    in SBOM   CVE
    │       │           │           │         │         │
    ▼       ▼           ▼           ▼         ▼         ▼
 [TEST]  [TEST]      [SEND]      [RUN]    [SCAN]    [RUN]
                                 [SCRIPT]            [EXPLOIT]

Leaf Nodes (Actions):
- TEST: Automated scanning (sqlmap, Burp Suite)
- SEND: Phishing campaign
- RUN SCRIPT: Credential stuffing attack
- SCAN: SBOM analysis, vulnerability scanning
- RUN EXPLOIT: Execute public exploit (Metasploit)
```

---

### Attack Tree Analysis

**Assign Values to Leaf Nodes:**

- **Cost:** Time, resources, skills required
- **Likelihood:** Probability of success
- **Detection Risk:** Probability of detection

**Example:**

| Attack Step | Cost | Likelihood | Detection Risk |
|-------------|------|------------|----------------|
| Find Input Validation (SQLi) | Low (automated) | High (common) | Medium (WAF logs) |
| Bypass WAF | Medium (manual) | Medium (depends on WAF) | High (alerts) |
| Phishing Email | Low (templates) | Medium (user training) | High (email filters) |
| Credential Stuffing | Low (automated) | Medium (depends on passwords) | Medium (rate limiting) |
| Find Vuln in SBOM | Low (scanners) | High (if outdated libs) | Low (passive) |
| Exploit CVE | Low (public exploits) | High (if unpatched) | High (IDS/IPS) |

**Risk Calculation:**

Most likely path: **Find Vuln in SBOM → Exploit CVE**
- Low cost, high likelihood, low detection risk (until exploit runs)

**Mitigation Priority:**
1. Dependency scanning and patching (blocks "Find Vuln in SBOM")
2. WAF with virtual patching (blocks "Exploit CVE")
3. Input validation and parameterized queries (blocks "Find Input Validation")

---

## Methodology Selection Guide

### Decision Matrix

| Criterion | STRIDE | PASTA | DREAD | Attack Trees |
|-----------|--------|-------|-------|--------------|
| **Ease of Use** | High | Low | High | Medium |
| **Time Required** | Low (days) | High (weeks) | Low (hours) | Medium (days) |
| **Business Alignment** | Low | High | Medium | Low |
| **Comprehensive Coverage** | High | Very High | N/A (scoring only) | Medium |
| **Quantitative Risk** | No | Yes | Yes | Yes (if values assigned) |
| **Best For** | Dev teams | Enterprise | Prioritization | Visualization |

---

### Recommendation by Use Case

**Use Case: Agile Development Team Threat Modeling**
- **Primary:** STRIDE (quick, comprehensive threat identification)
- **Secondary:** DREAD (prioritize threats for sprint planning)
- **Cadence:** Every major feature, architecture change

**Use Case: Enterprise Risk Assessment**
- **Primary:** PASTA (business-aligned, comprehensive)
- **Secondary:** Attack Trees (visualize complex attack scenarios)
- **Cadence:** Annually, or for critical systems

**Use Case: Prioritizing Vulnerability Remediation**
- **Primary:** DREAD (quantitative risk scoring)
- **Secondary:** CVSS scores (industry standard for CVEs)
- **Cadence:** Continuous (as vulnerabilities discovered)

**Use Case: Security Architecture Review**
- **Primary:** Attack Trees (visualize attack paths)
- **Secondary:** STRIDE (comprehensive threat coverage)
- **Cadence:** During architecture design, major changes

---

## Summary

**Threat Modeling Methodologies:**
- **STRIDE:** Systematic threat identification using 6 categories (Spoofing, Tampering, Repudiation, Info Disclosure, DoS, Elevation of Privilege)
- **PASTA:** 7-stage risk-centric analysis aligned with business objectives
- **DREAD:** Numeric risk scoring (Damage, Reproducibility, Exploitability, Affected Users, Discoverability)
- **Attack Trees:** Visual representation of attack paths and scenarios

**Recommended Approach:**
1. Use STRIDE for comprehensive threat identification
2. Use DREAD to prioritize threats by risk
3. Use Attack Trees to visualize complex attack scenarios
4. Use PASTA for enterprise-level risk assessments with business impact analysis

Integrate threat modeling into SDLC: Design phase (architecture threat modeling), Development (code-level STRIDE), Pre-deployment (comprehensive PASTA for critical systems).

```

### references/nist-csf-mapping.md

```markdown
# NIST Cybersecurity Framework (CSF) 2.0 Reference


## Table of Contents

- [Overview](#overview)
- [Framework Structure](#framework-structure)
- [6 Core Functions](#6-core-functions)
  - [GOVERN (GV) - NEW in CSF 2.0](#govern-gv-new-in-csf-20)
  - [IDENTIFY (ID)](#identify-id)
  - [PROTECT (PR)](#protect-pr)
  - [DETECT (DE)](#detect-de)
  - [RESPOND (RS)](#respond-rs)
  - [RECOVER (RC)](#recover-rc)
- [NIST CSF Implementation Tiers](#nist-csf-implementation-tiers)
- [NIST CSF Profiles](#nist-csf-profiles)
- [Control Mapping Examples](#control-mapping-examples)
  - [Mapping OWASP Top 10 to NIST CSF](#mapping-owasp-top-10-to-nist-csf)
  - [Mapping CIS Controls to NIST CSF](#mapping-cis-controls-to-nist-csf)
  - [Mapping Cloud Security to NIST CSF (AWS Example)](#mapping-cloud-security-to-nist-csf-aws-example)
- [Implementation Roadmap](#implementation-roadmap)
  - [Phase 1: Assess Current State (Weeks 1-4)](#phase-1-assess-current-state-weeks-1-4)
  - [Phase 2: Define Target State (Weeks 5-6)](#phase-2-define-target-state-weeks-5-6)
  - [Phase 3: Gap Analysis (Weeks 7-8)](#phase-3-gap-analysis-weeks-7-8)
  - [Phase 4: Implement Controls (Months 3-12)](#phase-4-implement-controls-months-3-12)
  - [Phase 5: Continuous Improvement (Ongoing)](#phase-5-continuous-improvement-ongoing)
- [NIST CSF vs Other Frameworks](#nist-csf-vs-other-frameworks)
- [Summary](#summary)

## Overview

The NIST Cybersecurity Framework provides a risk-based approach to managing cybersecurity risks. Version 2.0 (released 2024) introduces the GOVERN function and expands scope to all organizations.

**Official Source:** NIST CSF 2.0 (https://www.nist.gov/cyberframework)

## Framework Structure

**Hierarchy:**
- 6 Functions (high-level categories)
- 23 Categories (specific outcomes)
- 106 Subcategories (detailed controls)

## 6 Core Functions

### GOVERN (GV) - NEW in CSF 2.0

**Purpose:** Establish and monitor cybersecurity governance, risk management strategy, and policies.

**Categories:**
- **GV.OC:** Organizational Context
- **GV.RM:** Risk Management Strategy
- **GV.RR:** Roles, Responsibilities, and Authorities
- **GV.PO:** Policy
- **GV.OV:** Oversight
- **GV.SC:** Cybersecurity Supply Chain Risk Management

**Key Controls:**
- Establish cybersecurity governance structure
- Define risk tolerance and risk appetite
- Assign cybersecurity roles and responsibilities
- Develop security policies and procedures
- Supply chain risk management program
- Third-party risk assessments

---

### IDENTIFY (ID)

**Purpose:** Develop organizational understanding to manage cybersecurity risk to systems, people, assets, data, and capabilities.

**Categories:**
- **ID.AM:** Asset Management
- **ID.RA:** Risk Assessment
- **ID.IM:** Improvement

**Key Controls:**

**ID.AM (Asset Management):**
- Hardware asset inventory (servers, workstations, network devices, IoT)
- Software asset inventory (applications, operating systems, firmware)
- Data asset classification (public, internal, confidential, restricted)
- Personnel inventory (employees, contractors, privileged users)
- Network architecture documentation (network diagrams, data flows)

**ID.RA (Risk Assessment):**
- Identify and document cybersecurity risks
- Threat intelligence integration
- Vulnerability assessments (continuous scanning)
- Risk prioritization (likelihood × impact)
- Critical asset identification

**ID.IM (Improvement):**
- Lessons learned from incidents
- Continuous improvement processes
- Security metrics and KPIs

---

### PROTECT (PR)

**Purpose:** Develop and implement appropriate safeguards to ensure delivery of critical services.

**Categories:**
- **PR.AA:** Identity Management, Authentication and Access Control
- **PR.AT:** Awareness and Training
- **PR.DS:** Data Security
- **PR.IP:** Platform Security
- **PR.MA:** Maintenance
- **PR.PS:** Technology Infrastructure Resilience

**Key Controls:**

**PR.AA (Access Control):**
- Identity and credential management
- Multi-factor authentication (MFA)
- Role-based access control (RBAC)
- Privileged access management (PAM)
- Remote access management (ZTNA, VPN)

**PR.DS (Data Security):**
- Encryption at rest (AES-256)
- Encryption in transit (TLS 1.3)
- Data loss prevention (DLP)
- Key management (HSM, KMS)
- Secure data disposal

**PR.IP (Platform Security):**
- Configuration management and hardening
- Secure software development lifecycle (SDLC)
- Security testing (SAST, DAST, SCA)
- Change control processes
- Baseline security configurations (CIS Benchmarks)

**PR.MA (Maintenance):**
- Patch management (automated, risk-based)
- Remote maintenance security
- Asset maintenance logs

**PR.PS (Technology Infrastructure Resilience):**
- Backup and recovery (3-2-1 rule)
- Redundancy and failover
- Capacity planning
- Business continuity planning

---

### DETECT (DE)

**Purpose:** Develop and implement appropriate activities to identify occurrence of cybersecurity events.

**Categories:**
- **DE.AE:** Adverse Event Analysis
- **DE.CM:** Continuous Security Monitoring

**Key Controls:**

**DE.AE (Adverse Event Analysis):**
- Baseline network and system behavior
- Anomaly detection (UEBA, ML-based)
- Security event correlation (SIEM)
- Threat intelligence integration
- Alert prioritization and triage

**DE.CM (Continuous Monitoring):**
- Network monitoring (IDS/IPS, flow logs)
- Endpoint monitoring (EDR)
- Application monitoring (WAF logs, application logs)
- Cloud security monitoring (GuardDuty, Security Command Center)
- Vulnerability scanning (continuous, risk-based)
- Physical access monitoring

---

### RESPOND (RS)

**Purpose:** Develop and implement appropriate activities to take action regarding detected cybersecurity incidents.

**Categories:**
- **RS.MA:** Incident Management
- **RS.AN:** Incident Analysis
- **RS.RP:** Incident Response Reporting and Communication
- **RS.MI:** Incident Mitigation

**Key Controls:**

**RS.MA (Incident Management):**
- Incident response plan (documented, tested)
- Incident response team and roles (CSIRT)
- Incident detection and reporting mechanisms
- Incident categorization and prioritization

**RS.AN (Incident Analysis):**
- Forensic analysis capabilities
- Root cause analysis
- Impact assessment
- Threat intelligence enrichment

**RS.MI (Incident Mitigation):**
- Containment strategies (isolate, quarantine)
- Eradication procedures (remove malware, close vulnerabilities)
- Recovery procedures (restore systems, validate integrity)
- Lessons learned and post-incident review

---

### RECOVER (RC)

**Purpose:** Develop and implement appropriate activities to maintain resilience and restore capabilities or services impaired by cybersecurity incidents.

**Categories:**
- **RC.RP:** Recovery Planning
- **RC.CO:** Recovery Communications

**Key Controls:**

**RC.RP (Recovery Planning):**
- Recovery plan development and maintenance
- Recovery testing (tabletop exercises, full simulations)
- Backup restoration procedures
- Business continuity and disaster recovery (BC/DR)
- Recovery time objectives (RTO) and recovery point objectives (RPO)

**RC.CO (Recovery Communications):**
- Internal communication plans (employees, management)
- External communication plans (customers, regulators, media)
- Stakeholder coordination
- Public relations and reputation management

---

## NIST CSF Implementation Tiers

**Tier 1: Partial**
- Ad-hoc, reactive security
- Limited awareness of cybersecurity risk
- Cybersecurity risk management not formalized

**Tier 2: Risk Informed**
- Risk management practices approved by management but not organization-wide
- Some awareness of cybersecurity risk
- Informal processes

**Tier 3: Repeatable**
- Formal cybersecurity policies and procedures
- Organization-wide risk management program
- Regular risk assessments
- Consistent implementation

**Tier 4: Adaptive**
- Continuous improvement culture
- Advanced threat intelligence integration
- Real-time risk assessment and response
- Predictive indicators and adaptive processes

---

## NIST CSF Profiles

**Current Profile:** Current state of cybersecurity posture (as-is)

**Target Profile:** Desired state of cybersecurity posture (to-be)

**Gap Analysis:** Difference between Current and Target profiles

**Roadmap:** Plan to close gaps and achieve Target profile

---

## Control Mapping Examples

### Mapping OWASP Top 10 to NIST CSF

| OWASP Risk | NIST CSF Function | Category | Example Control |
|------------|-------------------|----------|-----------------|
| **A01: Broken Access Control** | PROTECT | PR.AA | Implement RBAC, least privilege |
| **A02: Cryptographic Failures** | PROTECT | PR.DS | Encryption at rest/transit, key management |
| **A03: Injection** | PROTECT | PR.IP | Input validation, parameterized queries |
| **A04: Insecure Design** | IDENTIFY | ID.RA | Threat modeling, security by design |
| **A05: Security Misconfiguration** | PROTECT | PR.IP | Configuration management, hardening |
| **A06: Vulnerable Components** | IDENTIFY | ID.RA | SCA scanning, dependency management |
| **A07: Authentication Failures** | PROTECT | PR.AA | MFA, secure session management |
| **A08: Software/Data Integrity** | PROTECT | PR.DS | Code signing, integrity checks |
| **A09: Logging/Monitoring Failures** | DETECT | DE.CM | SIEM, centralized logging |
| **A10: Server-Side Request Forgery** | PROTECT | PR.IP | Input validation, network segmentation |

---

### Mapping CIS Controls to NIST CSF

| CIS Control | NIST CSF Function | Category |
|-------------|-------------------|----------|
| **CIS 1: Asset Inventory** | IDENTIFY | ID.AM |
| **CIS 2: Software Inventory** | IDENTIFY | ID.AM |
| **CIS 3: Data Protection** | PROTECT | PR.DS |
| **CIS 4: Secure Configuration** | PROTECT | PR.IP |
| **CIS 5: Account Management** | PROTECT | PR.AA |
| **CIS 6: Access Control** | PROTECT | PR.AA |
| **CIS 7: Vulnerability Management** | IDENTIFY | ID.RA |
| **CIS 8: Audit Log Management** | DETECT | DE.CM |
| **CIS 9: Email/Web Protection** | PROTECT | PR.IP |
| **CIS 10: Malware Defenses** | PROTECT | PR.IP |
| **CIS 11: Data Recovery** | RECOVER | RC.RP |
| **CIS 12: Network Infrastructure** | PROTECT | PR.PS |
| **CIS 13: Network Monitoring** | DETECT | DE.CM |
| **CIS 14: Security Awareness** | PROTECT | PR.AT |
| **CIS 15: Service Provider Mgmt** | GOVERN | GV.SC |
| **CIS 16: Application Security** | PROTECT | PR.IP |
| **CIS 17: Incident Response** | RESPOND | RS.MA |
| **CIS 18: Penetration Testing** | IDENTIFY | ID.RA |

---

### Mapping Cloud Security to NIST CSF (AWS Example)

| AWS Service | NIST CSF Function | Category | Purpose |
|-------------|-------------------|----------|---------|
| **IAM, IAM Identity Center** | PROTECT | PR.AA | Identity and access management |
| **GuardDuty** | DETECT | DE.CM | Threat detection |
| **Security Hub** | DETECT | DE.AE | Centralized security findings |
| **KMS** | PROTECT | PR.DS | Key management |
| **WAF** | PROTECT | PR.IP | Web application firewall |
| **Shield** | PROTECT | PR.PS | DDoS protection |
| **CloudTrail** | DETECT | DE.CM | Audit logging |
| **Config** | PROTECT | PR.IP | Configuration management |
| **Inspector** | IDENTIFY | ID.RA | Vulnerability assessment |
| **Macie** | PROTECT | PR.DS | Data discovery and classification |
| **Systems Manager** | PROTECT | PR.MA | Patch management |
| **Backup** | RECOVER | RC.RP | Backup and recovery |

---

## Implementation Roadmap

### Phase 1: Assess Current State (Weeks 1-4)

1. Conduct cybersecurity risk assessment
2. Document current security controls (Current Profile)
3. Map existing controls to NIST CSF categories
4. Identify gaps and weaknesses

### Phase 2: Define Target State (Weeks 5-6)

1. Define security objectives based on business goals
2. Determine acceptable risk levels (risk appetite)
3. Define Target Profile (desired security posture)
4. Select appropriate Implementation Tier

### Phase 3: Gap Analysis (Weeks 7-8)

1. Compare Current Profile vs. Target Profile
2. Prioritize gaps by risk and business impact
3. Estimate resources and budget for remediation
4. Develop implementation roadmap

### Phase 4: Implement Controls (Months 3-12)

1. Implement high-priority controls first
2. Track progress against roadmap
3. Update Current Profile as controls implemented
4. Regular management reporting

### Phase 5: Continuous Improvement (Ongoing)

1. Monitor security metrics and KPIs
2. Conduct periodic reassessments (annually)
3. Update Target Profile as business changes
4. Lessons learned from incidents

---

## NIST CSF vs Other Frameworks

| Aspect | NIST CSF | CIS Controls | ISO 27001 |
|--------|----------|--------------|-----------|
| **Approach** | Risk-based, flexible | Prescriptive, prioritized | Comprehensive ISMS |
| **Complexity** | Medium | Low (clear priorities) | High (formal certification) |
| **Industry Recognition** | Very high (US focus) | High | Very high (international) |
| **Certification** | No | No | Yes |
| **Cost** | Free | Free | Certification cost |
| **Best For** | Risk management, governance | Baseline security, tactical | Formal ISMS, certification |

---

## Summary

NIST CSF 2.0 provides comprehensive, risk-based framework with 6 functions: GOVERN (new), IDENTIFY, PROTECT, DETECT, RESPOND, RECOVER. Use for security program governance, compliance mapping, and continuous improvement. Implement through phased approach: Assess → Define Target → Gap Analysis → Implement → Improve.

Map NIST CSF to tactical frameworks (CIS Controls for implementation, OWASP for app security, cloud provider frameworks for cloud security). Track progress with Current Profile → Target Profile comparison and Implementation Tiers (1-4).

```

### references/cis-controls.md

```markdown
# CIS Critical Security Controls v8 Reference


## Table of Contents

- [Overview](#overview)
- [Implementation Groups](#implementation-groups)
- [18 CIS Controls](#18-cis-controls)
  - [CIS 1: Inventory and Control of Enterprise Assets](#cis-1-inventory-and-control-of-enterprise-assets)
  - [CIS 2: Inventory and Control of Software Assets](#cis-2-inventory-and-control-of-software-assets)
  - [CIS 3: Data Protection](#cis-3-data-protection)
  - [CIS 4: Secure Configuration of Enterprise Assets and Software](#cis-4-secure-configuration-of-enterprise-assets-and-software)
  - [CIS 5: Account Management](#cis-5-account-management)
  - [CIS 6: Access Control Management](#cis-6-access-control-management)
  - [CIS 7: Continuous Vulnerability Management](#cis-7-continuous-vulnerability-management)
  - [CIS 8: Audit Log Management](#cis-8-audit-log-management)
  - [CIS 13: Network Monitoring and Defense](#cis-13-network-monitoring-and-defense)
  - [CIS 17: Incident Response Management](#cis-17-incident-response-management)

## Overview

CIS Controls provide prioritized, prescriptive security guidance organized in 3 Implementation Groups (IG1, IG2, IG3).

## Implementation Groups

**IG1 (Basic Cyber Hygiene):**
- 56 safeguards
- Small organizations, limited IT security staff
- Essential security baseline

**IG2 (Intermediate):**
- +74 safeguards (130 total)
- Mid-sized organizations with IT security staff
- More sophisticated controls

**IG3 (Advanced):**
- +23 safeguards (153 total)
- Large enterprises with dedicated security teams
- Advanced threat detection and response

## 18 CIS Controls

### CIS 1: Inventory and Control of Enterprise Assets

**Objective:** Maintain accurate asset inventory

**Key Safeguards:**
- 1.1: Establish and maintain detailed asset inventory
- 1.2: Address unauthorized assets
- 1.3: Utilize asset inventory tool
- 1.4: Use dynamic host configuration (DHCP) logging

### CIS 2: Inventory and Control of Software Assets

**Objective:** Track all software and prevent unauthorized software

**Key Safeguards:**
- 2.1: Establish software inventory
- 2.2: Ensure authorized software is supported
- 2.3: Address unauthorized software
- 2.4: Utilize software inventory tools

### CIS 3: Data Protection

**Objective:** Protect sensitive data

**Key Safeguards:**
- 3.1: Establish data management process
- 3.2: Establish data inventory
- 3.3: Configure data access control lists
- 3.6: Encrypt data on end-user devices
- 3.11: Encrypt sensitive data at rest
- 3.14: Log sensitive data access

### CIS 4: Secure Configuration of Enterprise Assets and Software

**Objective:** Harden configurations

**Key Safeguards:**
- 4.1: Establish secure configurations
- 4.2: Establish configuration management
- 4.7: Manage default accounts
- 4.8: Uninstall or disable unnecessary services

### CIS 5: Account Management

**Objective:** Manage user accounts and credentials

**Key Safeguards:**
- 5.1: Establish centralized account management
- 5.2: Use unique passwords
- 5.3: Disable dormant accounts
- 5.4: Restrict admin privileges to dedicated accounts

### CIS 6: Access Control Management

**Objective:** Control access to resources

**Key Safeguards:**
- 6.1: Establish access granting process
- 6.2: Establish access revoking process
- 6.3: Require MFA
- 6.5: Require MFA for remote network access
- 6.8: Define and maintain role-based access control

### CIS 7: Continuous Vulnerability Management

**Objective:** Identify and remediate vulnerabilities

**Key Safeguards:**
- 7.1: Establish vulnerability management process
- 7.2: Remediate vulnerabilities
- 7.3: Perform automated operating system patch management
- 7.4: Perform automated application patch management
- 7.5: Perform automated vulnerability scans

### CIS 8: Audit Log Management

**Objective:** Collect, alert, review, and retain audit logs

**Key Safeguards:**
- 8.1: Establish audit log management process
- 8.2: Collect audit logs
- 8.3: Ensure adequate storage for logs
- 8.9: Centralize audit log collection
- 8.10: Retain audit logs
- 8.11: Conduct audit log reviews

### CIS 13: Network Monitoring and Defense

**Objective:** Monitor and defend network traffic

**Key Safeguards:**
- 13.1: Centralize security event collection
- 13.2: Deploy network-based IDS sensors
- 13.3: Deploy network-based IPS
- 13.6: Collect network traffic flow logs
- 13.10: Perform application layer filtering

### CIS 17: Incident Response Management

**Objective:** Establish incident response capability

**Key Safeguards:**
- 17.1: Designate incident response personnel
- 17.2: Establish incident response process
- 17.3: Maintain incident response contact information
- 17.6: Maintain incident response documentation
- 17.9: Conduct post-incident reviews

```

### references/owasp-top10-mitigation.md

```markdown
# OWASP Top 10 Risk Mitigation Reference


## Table of Contents

- [Overview](#overview)
- [OWASP Top 10 (2021)](#owasp-top-10-2021)
  - [A01: Broken Access Control](#a01-broken-access-control)
  - [A02: Cryptographic Failures](#a02-cryptographic-failures)
  - [A03: Injection](#a03-injection)
  - [A04: Insecure Design](#a04-insecure-design)
  - [A05: Security Misconfiguration](#a05-security-misconfiguration)
  - [A06: Vulnerable and Outdated Components](#a06-vulnerable-and-outdated-components)
  - [A07: Identification and Authentication Failures](#a07-identification-and-authentication-failures)
  - [A08: Software and Data Integrity Failures](#a08-software-and-data-integrity-failures)
  - [A09: Security Logging and Monitoring Failures](#a09-security-logging-and-monitoring-failures)
  - [A10: Server-Side Request Forgery (SSRF)](#a10-server-side-request-forgery-ssrf)

## Overview

The OWASP Top 10 represents the most critical web application security risks. This reference provides detailed mitigation strategies for each risk mapped to security architecture controls.

## OWASP Top 10 (2021)

### A01: Broken Access Control

**Risk Description:** Authorization failures allowing users to access unauthorized data or functionality.

**Common Examples:**
- Insecure Direct Object References (IDOR): Changing URL parameter to access other users' data
- Missing authorization checks on API endpoints
- Privilege escalation (standard user accessing admin functions)
- CORS misconfiguration allowing unauthorized origins

**Architectural Mitigations:**
- Implement RBAC (role-based access control) or ABAC (attribute-based access control)
- Deny access by default (explicit allow lists)
- Authorization checks at every API endpoint
- Use indirect object references (map session → object, not expose IDs)
- Log all authorization failures for monitoring

**Code Example (Node.js/Express):**
```javascript
// Verify user owns resource before returning
app.get('/api/orders/:id', authenticateUser, async (req, res) => {
  const order = await Order.findById(req.params.id);

  // Authorization check: user owns this order
  if (order.userId !== req.user.id) {
    return res.status(403).json({ error: 'Forbidden' });
  }

  res.json(order);
});
```

---

### A02: Cryptographic Failures

**Risk Description:** Failures in cryptography leading to exposure of sensitive data.

**Common Examples:**
- Transmitting data in cleartext (HTTP instead of HTTPS)
- Using weak encryption algorithms (DES, MD5, SHA-1)
- Storing passwords in plaintext or weak hashing
- Hardcoded cryptographic keys

**Architectural Mitigations:**
- Enforce TLS 1.3 for all data in transit
- Encrypt all sensitive data at rest (AES-256)
- Use strong password hashing (Argon2, bcrypt, scrypt)
- Implement key management system (AWS KMS, Azure Key Vault, HashiCorp Vault)
- Rotate encryption keys regularly

---

### A03: Injection

**Risk Description:** Injection flaws (SQL, NoSQL, OS command, LDAP) allowing arbitrary code execution.

**Common Examples:**
- SQL injection via unsanitized user input
- NoSQL injection in MongoDB queries
- OS command injection
- LDAP injection

**Architectural Mitigations:**
- Use parameterized queries or ORM frameworks
- Input validation (whitelist allowed characters, length limits)
- Least privilege database users (no DROP, CREATE permissions)
- WAF with injection detection rules
- Code scanning (SAST/DAST) in CI/CD

**Code Example (SQL Injection Prevention):**
```javascript
// BAD: String concatenation (vulnerable to SQLi)
const query = `SELECT * FROM users WHERE email = '${userInput}'`;

// GOOD: Parameterized query
const query = 'SELECT * FROM users WHERE email = ?';
db.query(query, [userInput], (err, results) => { ... });
```

---

### A04: Insecure Design

**Risk Description:** Missing or ineffective security controls in design phase.

**Mitigations:**
- Conduct threat modeling (STRIDE, PASTA) during design
- Apply secure design patterns
- Separation of concerns (business logic separate from presentation)
- Defense in depth architecture
- Security requirements in every user story

---

### A05: Security Misconfiguration

**Risk Description:** Insecure default configurations, incomplete setups, verbose errors.

**Common Examples:**
- Default admin credentials unchanged
- Directory listing enabled
- Unnecessary services running
- Verbose error messages revealing system details
- Missing security headers

**Architectural Mitigations:**
- Configuration management (CIS Benchmarks, STIG)
- Remove default accounts and sample applications
- Minimal platform (disable unnecessary features)
- Custom error pages (generic messages)
- Security headers (CSP, HSTS, X-Frame-Options)

---

### A06: Vulnerable and Outdated Components

**Risk Description:** Using components with known vulnerabilities.

**Architectural Mitigations:**
- Generate SBOM (Software Bill of Materials)
- Continuous dependency scanning (Snyk, Dependabot, Trivy)
- Automated security updates
- Remove unused dependencies
- Monitor security advisories

---

### A07: Identification and Authentication Failures

**Risk Description:** Weak authentication and session management.

**Architectural Mitigations:**
- Multi-factor authentication (MFA) for all users
- Strong password policies (length > complexity)
- Secure session management (HttpOnly, Secure, SameSite flags)
- Rate limiting on authentication endpoints
- Account lockout after failed attempts

---

### A08: Software and Data Integrity Failures

**Risk Description:** Code and infrastructure not protected from integrity violations.

**Architectural Mitigations:**
- Code signing and verification
- SLSA framework for build integrity
- Subresource Integrity (SRI) for CDN resources
- Integrity checks (checksums, digital signatures)

---

### A09: Security Logging and Monitoring Failures

**Risk Description:** Insufficient logging and monitoring delays breach detection.

**Architectural Mitigations:**
- Centralized logging (SIEM)
- Log all security events (authentication, authorization failures, input validation failures)
- Immutable logs (append-only)
- Real-time alerting
- UEBA for anomaly detection

---

### A10: Server-Side Request Forgery (SSRF)

**Risk Description:** Application fetches remote resource without validating URL.

**Architectural Mitigations:**
- Input validation (whitelist allowed domains)
- Network segmentation (deny outbound traffic from application tier)
- Disable unnecessary URL schemas (file://, gopher://)

```

### references/supply-chain-security.md

```markdown
# Supply Chain Security Reference


## Table of Contents

- [Overview](#overview)
- [SLSA Framework](#slsa-framework)
  - [4 SLSA Levels](#4-slsa-levels)
  - [SLSA Requirements Matrix](#slsa-requirements-matrix)
  - [SLSA Implementation Steps](#slsa-implementation-steps)
- [SBOM (Software Bill of Materials)](#sbom-software-bill-of-materials)
  - [SBOM Standards](#sbom-standards)
  - [CycloneDX SBOM Structure](#cyclonedx-sbom-structure)
  - [Generating SBOMs](#generating-sboms)
  - [SBOM Use Cases](#sbom-use-cases)
- [Dependency Scanning Tools](#dependency-scanning-tools)
- [Dependency Management Best Practices](#dependency-management-best-practices)
  - [1. Minimize Dependencies](#1-minimize-dependencies)
  - [2. Pin Dependency Versions](#2-pin-dependency-versions)
  - [3. Continuous Dependency Scanning](#3-continuous-dependency-scanning)
  - [4. Automated Security Updates](#4-automated-security-updates)
  - [5. Vendor and Maintainer Monitoring](#5-vendor-and-maintainer-monitoring)
  - [6. Private Dependency Mirrors](#6-private-dependency-mirrors)
- [Supply Chain Attack Vectors](#supply-chain-attack-vectors)
  - [1. Compromised Dependencies](#1-compromised-dependencies)
  - [2. Typosquatting](#2-typosquatting)
  - [3. Dependency Confusion](#3-dependency-confusion)
  - [4. Compromised Build Pipeline](#4-compromised-build-pipeline)
- [Summary](#summary)

## Overview

Supply chain attacks target software development and distribution pipelines to inject malicious code or compromise dependencies. High-profile attacks (SolarWinds, Log4Shell, CodeCov) demonstrate critical need for supply chain security.

**Key Frameworks:**
- **SLSA (Supply-chain Levels for Software Artifacts):** Build integrity framework
- **SBOM (Software Bill of Materials):** Dependency transparency
- **SSDF (Secure Software Development Framework):** NIST SP 800-218

## SLSA Framework

**Developed by:** Google (Open Source Security Foundation)

**Purpose:** Protect software artifacts from tampering and ensure build integrity

### 4 SLSA Levels

**SLSA Level 1: Provenance**
- Build process generates provenance metadata
- Documents how artifact was built
- NOT tamper-proof (can be forged)

**Requirements:**
- Provenance generated automatically
- Provenance contains build information

**Example:** GitHub Actions basic workflow with attestation

---

**SLSA Level 2: Hosted Build Platform**
- Build on trusted hosted service
- Provenance generated by platform (more trustworthy than Level 1)

**Requirements:**
- Use hosted build service (GitHub Actions, Cloud Build, GitLab CI)
- Platform generates provenance
- Source and build logs available

**Example:** GitHub Actions with signed attestations

---

**SLSA Level 3: Hardened Build Platform**
- Build platform prevents tampering
- Provenance generation cannot be compromised
- Audit logs of build process

**Requirements:**
- Build service hardened against admin tampering
- Provenance is non-falsifiable
- Isolated build execution
- Audit logs retained

**Example:** Google Cloud Build with Binary Authorization

---

**SLSA Level 4: Hermetic, Reproducible Builds**
- Fully hermetic builds (no network access during build)
- Reproducible builds (same inputs = same output)
- Two-party review for all changes

**Requirements:**
- Hermetic builds (no external dependencies during build)
- Reproducible (deterministic builds)
- Two-person review (pull request approval)
- Immutable build history

**Example:** Debian reproducible builds, Bazel hermetic builds

---

### SLSA Requirements Matrix

| Requirement | L1 | L2 | L3 | L4 |
|-------------|----|----|----|----|
| Provenance exists | ✓ | ✓ | ✓ | ✓ |
| Hosted build platform | | ✓ | ✓ | ✓ |
| Build service hardened | | | ✓ | ✓ |
| Provenance non-falsifiable | | | ✓ | ✓ |
| Isolated build process | | | | ✓ |
| Hermetic builds | | | | ✓ |
| Reproducible builds | | | | ✓ |
| Two-party review | | | | ✓ |

---

### SLSA Implementation Steps

**Step 1: Generate Provenance (SLSA L1)**

Use GitHub Actions to generate attestations:

```yaml
name: Build with Provenance
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      attestations: write
    steps:
      - uses: actions/checkout@v4
      - name: Build artifact
        run: make build
      - name: Generate attestation
        uses: actions/attest-build-provenance@v1
        with:
          subject-path: 'dist/myapp'
```

**Step 2: Use Hosted Build Service (SLSA L2)**

Migrate to GitHub Actions, GitLab CI, or Cloud Build (already hosted).

**Step 3: Harden Build Platform (SLSA L3)**

- Enable required reviews on protected branches
- Use GitHub Environment protection rules
- Enable audit logging
- Restrict admin access to build pipelines

**Step 4: Implement Hermetic Builds (SLSA L4)**

- Use container-based builds with pinned images
- No network access during build (cache dependencies)
- Reproducible builds with Bazel or Nix

---

## SBOM (Software Bill of Materials)

### SBOM Standards

**1. CycloneDX (OWASP)**
- JSON or XML format
- Comprehensive (components, services, vulnerabilities)
- Strong tooling ecosystem

**2. SPDX (Linux Foundation)**
- ISO/IEC 5962:2021 standard
- Extensive license information
- Wide industry adoption

**3. SWID (Software Identification Tags)**
- ISO/IEC 19770-2 standard
- XML format
- Software asset management focus

---

### CycloneDX SBOM Structure

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "version": 1,
  "metadata": {
    "component": {
      "type": "application",
      "name": "my-app",
      "version": "1.0.0"
    }
  },
  "components": [
    {
      "type": "library",
      "name": "express",
      "version": "4.18.2",
      "purl": "pkg:npm/[email protected]",
      "licenses": [{"license": {"id": "MIT"}}],
      "hashes": [
        {
          "alg": "SHA-256",
          "content": "abc123..."
        }
      ]
    }
  ],
  "dependencies": [
    {
      "ref": "pkg:npm/[email protected]",
      "dependsOn": ["pkg:npm/[email protected]"]
    }
  ],
  "vulnerabilities": [
    {
      "id": "CVE-2024-1234",
      "source": {"name": "NVD"},
      "ratings": [{"severity": "high"}],
      "affects": [{"ref": "pkg:npm/[email protected]"}]
    }
  ]
}
```

---

### Generating SBOMs

**Node.js (NPM):**
```bash
# Using CycloneDX
npx @cyclonedx/cyclonedx-npm --output-file sbom.json

# Using Syft
syft dir:. -o cyclonedx-json > sbom.json
```

**Python:**
```bash
# Using CycloneDX
pip install cyclonedx-bom
cyclonedx-py -o sbom.json

# Using Syft
syft dir:. -o cyclonedx-json > sbom.json
```

**Java (Maven):**
```bash
# Using CycloneDX Maven Plugin
mvn org.cyclonedx:cyclonedx-maven-plugin:makeAggregateBom
```

**Container Images:**
```bash
# Using Syft
syft nginx:latest -o cyclonedx-json > sbom.json

# Using Trivy
trivy image --format cyclonedx nginx:latest > sbom.json
```

---

### SBOM Use Cases

**1. Vulnerability Management**

When CVE disclosed (e.g., Log4Shell):
```bash
# Search SBOM for affected component
cat sbom.json | jq '.components[] | select(.name == "log4j-core")'

# Output: All instances of log4j-core and versions
# Action: Identify affected applications and prioritize patches
```

**2. License Compliance**

```bash
# Extract all licenses from SBOM
cat sbom.json | jq '.components[].licenses[].license.id' | sort | uniq

# Flag non-approved licenses
cat sbom.json | jq '.components[] | select(.licenses[].license.id == "GPL-3.0")'
```

**3. Supply Chain Risk Assessment**

```bash
# Identify all components from specific maintainer
cat sbom.json | jq '.components[] | select(.supplier.name == "Compromised Vendor")'

# Identify components without hash verification
cat sbom.json | jq '.components[] | select(.hashes == null)'
```

**4. Incident Response**

During security incident:
- Quickly identify all applications using vulnerable component
- Assess blast radius across organization
- Prioritize remediation based on SBOM data

---

## Dependency Scanning Tools

| Tool | Languages | Features | Best For |
|------|-----------|----------|----------|
| **Dependabot** | Multi-language | Automated PRs, GitHub native | GitHub users |
| **Snyk** | Multi-language | Vuln scanning, license compliance, fix PRs | Developers, CI/CD |
| **OWASP Dependency-Check** | Java, .NET, Python, Ruby, Node.js | CVE scanning, CLI/CI | Open-source projects |
| **Renovate** | Multi-language | Automated updates, flexible config | Advanced automation |
| **Trivy** | Multi-language, containers, IaC | CVE scanning, misconfiguration | Containers, cloud-native |
| **Grype** | Multi-language, containers | Fast CVE scanning | CI/CD pipelines |
| **JFrog Xray** | Multi-language, artifacts | Artifact scanning, policy enforcement | Enterprise, JFrog users |

---

## Dependency Management Best Practices

### 1. Minimize Dependencies

**Principle:** Fewer dependencies = smaller attack surface

**Actions:**
- Remove unused dependencies
- Evaluate necessity before adding new dependencies
- Consider implementing simple functionality in-house vs. adding dependency
- Use tree-shaking and dead code elimination

**Example:**
```bash
# Analyze dependency tree
npm ls --all

# Find unused dependencies (Node.js)
npx depcheck

# Remove unused
npm uninstall <unused-package>
```

---

### 2. Pin Dependency Versions

**Principle:** Lock files prevent unexpected updates with vulnerabilities

**Actions:**
- Commit lock files (package-lock.json, Pipfile.lock, go.sum)
- Use exact versions in production (avoid `^` or `~` ranges)
- Pin Docker base image versions with SHA digests

**Example:**
```dockerfile
# Bad: Uses mutable tag
FROM node:18

# Better: Use specific version
FROM node:18.19.0

# Best: Pin to specific SHA digest (immutable)
FROM node:18.19.0@sha256:abc123...
```

---

### 3. Continuous Dependency Scanning

**Principle:** Detect vulnerabilities as soon as they're disclosed

**Actions:**
- Scan on every commit (CI/CD integration)
- Scheduled scans (nightly) even without code changes
- Monitor security advisories (GitHub Security Advisories, NVD)

**Example (GitHub Actions):**
```yaml
name: Dependency Scan
on:
  push:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Trivy
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          severity: 'CRITICAL,HIGH'
```

---

### 4. Automated Security Updates

**Principle:** Reduce time-to-patch for security vulnerabilities

**Actions:**
- Enable Dependabot security updates
- Configure auto-merge for patch updates
- Test updates automatically (CI/CD)

**Example (Dependabot config):**
```yaml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "daily"
    open-pull-requests-limit: 10
    reviewers:
      - "security-team"
    # Auto-merge patch updates
    auto-merge:
      enabled: true
      allowed-updates:
        - match:
            dependency-type: "all"
            update-type: "security:patch"
```

---

### 5. Vendor and Maintainer Monitoring

**Principle:** Abandoned or compromised maintainers pose supply chain risks

**Actions:**
- Monitor maintainer activity (last release, commit frequency)
- Check number of maintainers (single maintainer = risk)
- Subscribe to security advisories from vendors
- Consider package reputation and trust scores

**Red Flags:**
- No updates in > 1 year
- Single maintainer with no activity
- Recent ownership transfer
- Unusual dependency additions
- Obfuscated code

---

### 6. Private Dependency Mirrors

**Principle:** Control and audit all dependencies before use

**Actions:**
- Host private package registry (JFrog Artifactory, Nexus, npm Enterprise)
- Proxy public registries through private mirror
- Scan and approve packages before internal use
- Audit all package downloads

**Architecture:**
```
Developer → Private Registry → Public Registry (npm, PyPI)
              (scan, approve)      (upstream source)
```

---

## Supply Chain Attack Vectors

### 1. Compromised Dependencies

**Attack:** Attacker publishes malicious package or compromises existing package

**Examples:**
- event-stream (npm): Malicious code injected to steal cryptocurrency
- ua-parser-js (npm): Compromised package downloaded cryptocurrency miner
- codecov (Bash uploader): Compromised to steal credentials

**Mitigations:**
- Dependency scanning (detect known malicious packages)
- SBOM generation (visibility into all dependencies)
- Subresource Integrity (SRI) for CDN-hosted scripts
- Private package mirrors with approval workflow

---

### 2. Typosquatting

**Attack:** Attacker creates package with similar name to popular package

**Examples:**
- `requessts` (typo of `requests` in Python)
- `electorn` (typo of `electron` in npm)

**Mitigations:**
- Code review of dependency additions
- Use package manager lockfiles (prevent unexpected installs)
- Namespace verification (official org/maintainer)
- IDE/linter warnings for common typos

---

### 3. Dependency Confusion

**Attack:** Attacker publishes public package with same name as internal package, causing package manager to install public (malicious) version

**Examples:**
- Alex Birsan's research (2021): Compromised Apple, Microsoft, Tesla via dependency confusion

**Mitigations:**
- Use scoped packages (@mycompany/package-name)
- Configure package manager to prefer private registry
- Namespace reservation on public registries

**Example (npm config):**
```ini
# .npmrc
@mycompany:registry=https://npm.internal.company.com
registry=https://registry.npmjs.org
```

---

### 4. Compromised Build Pipeline

**Attack:** Attacker gains access to CI/CD pipeline to inject malicious code during build

**Examples:**
- SolarWinds Orion (2020): Build pipeline compromised to inject backdoor
- CodeCov (2021): Compromised Bash uploader script

**Mitigations:**
- SLSA Level 3+ (hardened build platform)
- Least privilege for CI/CD service accounts
- Audit logging for all build pipeline changes
- Two-party review for pipeline modifications
- Hermetic builds (no network access during build)

---

## Summary

**Supply Chain Security Framework:**

1. **SLSA:** Implement progressive build integrity (Level 1 → Level 4)
2. **SBOM:** Generate and maintain dependency transparency (CycloneDX, SPDX)
3. **Dependency Scanning:** Continuous vulnerability detection (Snyk, Trivy, Grype)
4. **Dependency Management:** Pin versions, minimize dependencies, auto-update security patches
5. **Monitoring:** Track maintainer activity, monitor for compromised packages

**Priority Actions:**
- Generate SBOM for all applications (CycloneDX)
- Implement dependency scanning in CI/CD (Trivy, Snyk)
- Achieve SLSA Level 2 (GitHub Actions with attestations)
- Enable automated security updates (Dependabot, Renovate)
- Establish incident response plan for supply chain incidents

```

### references/aws-security-architecture.md

```markdown
# AWS Security Architecture Reference


## Table of Contents

- [AWS Well-Architected Framework - Security Pillar](#aws-well-architected-framework-security-pillar)
  - [5 Design Principles](#5-design-principles)
- [Key AWS Security Services](#key-aws-security-services)
  - [Identity & Access Management](#identity-access-management)
  - [Detection & Response](#detection-response)
  - [Network Security](#network-security)
  - [Data Protection](#data-protection)
  - [Infrastructure Security](#infrastructure-security)
- [Multi-Account Security Architecture](#multi-account-security-architecture)
- [VPC Security Architecture](#vpc-security-architecture)

## AWS Well-Architected Framework - Security Pillar

### 5 Design Principles

1. **Implement strong identity foundation:** Centralize IAM, least privilege, separation of duties
2. **Enable traceability:** Log and monitor all actions
3. **Apply security at all layers:** Defense in depth across network, instance, application, data
4. **Automate security best practices:** Infrastructure as Code
5. **Protect data in transit and at rest:** Encryption everywhere

## Key AWS Security Services

### Identity & Access Management

| Service | Purpose |
|---------|---------|
| **AWS IAM** | User and service identity management |
| **IAM Identity Center** | SSO for multi-account environments |
| **AWS Cognito** | Customer identity and authentication |
| **AWS Organizations** | Multi-account management |

### Detection & Response

| Service | Purpose |
|---------|---------|
| **Amazon GuardDuty** | Threat detection (ML-based) |
| **AWS Security Hub** | Centralized security findings |
| **Amazon Detective** | Security investigation |
| **AWS CloudTrail** | API audit logging |

### Network Security

| Service | Purpose |
|---------|---------|
| **AWS WAF** | Web application firewall |
| **AWS Shield** | DDoS protection |
| **AWS Network Firewall** | Stateful network firewall |
| **AWS PrivateLink** | Private connectivity to services |

### Data Protection

| Service | Purpose |
|---------|---------|
| **AWS KMS** | Key management service |
| **AWS Secrets Manager** | Secrets rotation and management |
| **Amazon Macie** | Data discovery and classification |
| **AWS Certificate Manager** | SSL/TLS certificate management |

### Infrastructure Security

| Service | Purpose |
|---------|---------|
| **AWS Systems Manager** | Patch management, configuration |
| **Amazon Inspector** | Vulnerability scanning |
| **AWS Config** | Configuration compliance monitoring |

## Multi-Account Security Architecture

```
AWS Organizations (Root)
│
├── Security OU
│   ├── Security Account (GuardDuty, Security Hub)
│   ├── Logging Account (CloudTrail, Config, VPC Flow Logs)
│   └── Audit Account (Read-only cross-account access)
│
├── Production OU
│   ├── App1 Production Account
│   ├── App2 Production Account
│   └── Shared Services Account
│
└── Non-Production OU
    ├── Development Account
    ├── Staging Account
    └── Testing Account
```

**Key Patterns:**

- **Service Control Policies (SCPs):** Apply guardrails at OU level
- **IAM Identity Center:** SSO across all accounts
- **GuardDuty/Security Hub:** Organization-wide threat detection
- **Centralized Logging:** All CloudTrail logs to dedicated Logging Account
- **Network Isolation:** Separate VPCs per account, Transit Gateway for connectivity

## VPC Security Architecture

**Best Practices:**
- Public subnets: Internet-facing resources (ALB, NAT Gateway)
- Private subnets: Application tier (no direct internet access)
- Isolated subnets: Database tier (no outbound internet)
- Security groups: Stateful, least privilege rules
- NACLs: Stateless, additional layer of defense

**Example:**
```
VPC (10.0.0.0/16)
├── Public Subnet (10.0.1.0/24)
│   └── Internet Gateway → ALB → Internet
├── Private Subnet (10.0.2.0/24)
│   └── EC2 Instances (application tier)
└── Isolated Subnet (10.0.3.0/24)
    └── RDS (database tier)
```

```

### examples/architectures/aws-multi-account-security.md

```markdown
# AWS Multi-Account Security Architecture

## Overview

Multi-account AWS architecture provides security isolation, billing separation, and blast radius containment. Organize accounts using AWS Organizations with security controls enforced through Service Control Policies (SCPs).

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        AWS ORGANIZATION ROOT                             │
│                      (Management Account)                                │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │ Organization Policies:                                           │   │
│  │ - Require MFA for all users                                      │   │
│  │ - Enforce encryption at rest                                     │   │
│  │ - Deny root user access                                          │   │
│  │ - Restrict regions (compliance)                                  │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
           │
           ├───────────────────┬──────────────────┬──────────────────┐
           │                   │                  │                  │
           ▼                   ▼                  ▼                  ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │ Security OU │    │  Workload OU│    │Infrastructure│    │ Suspended OU│
    │             │    │             │    │     OU      │    │             │
    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
           │                   │                  │                  │
           │                   │                  │                  │
    ┌──────┴──────┐     ┌──────┴──────┐    ┌─────┴─────┐           │
    ▼             ▼     ▼             ▼    ▼           ▼           ▼
┌────────┐  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Security│  │Logging │ │  Prod  │ │  Dev   │ │Shared  │ │Network │ │Quarantine│
│Tooling │  │Account │ │Account │ │Account │ │Services│ │Account │ │Account │
│        │  │        │ │        │ │        │ │        │ │        │ │        │
│GuardDuty│ │CloudTr.│ │        │ │        │ │CI/CD   │ │Transit │ │        │
│SecHub  │  │Config  │ │        │ │        │ │Artifact│ │Gateway │ │        │
│Macie   │  │S3      │ │        │ │        │ │        │ │VPC     │ │        │
└────────┘  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘
     │           │          │          │          │          │
     │           │          │          │          │          │
     └───────────┴──────────┴──────────┴──────────┴──────────┘
                            │
                            ▼
                  ┌──────────────────┐
                  │ Centralized      │
                  │ Security Logging │
                  │                  │
                  │ - CloudTrail     │
                  │ - VPC Flow Logs  │
                  │ - GuardDuty      │
                  │ - Security Hub   │
                  │ - Config         │
                  └──────────────────┘
```

## Organizational Unit (OU) Design

### Security OU

Contains accounts dedicated to security tooling and centralized logging.

**Security Tooling Account:**
- AWS Security Hub (aggregator)
- Amazon GuardDuty (threat detection)
- Amazon Macie (data discovery)
- AWS IAM Access Analyzer
- AWS Firewall Manager
- Amazon Detective (investigation)

**Logging Account:**
- Centralized CloudTrail logs (organization trail)
- AWS Config aggregator
- VPC Flow Logs aggregation
- S3 bucket policies preventing deletion
- Lifecycle policies for cost optimization
- Cross-region replication for DR

**SCPs Applied:**
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenySecurityLogDeletion",
      "Effect": "Deny",
      "Action": [
        "s3:DeleteBucket",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "logs:DeleteLogGroup",
        "logs:DeleteLogStream"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": [
            "us-east-1",
            "us-west-2"
          ]
        }
      }
    }
  ]
}
```

### Workload OU

Contains production and non-production application accounts.

**Production Account:**
- Production workloads only
- Strict change control
- Enhanced monitoring
- Automated backups
- Encryption enforced

**Development/Staging Accounts:**
- Lower environment workloads
- Testing and experimentation
- Cost controls via budgets
- Automatic resource cleanup

**SCPs Applied:**
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireEncryption",
      "Effect": "Deny",
      "Action": [
        "s3:PutObject",
        "ec2:RunInstances",
        "rds:CreateDBInstance"
      ],
      "Resource": "*",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    },
    {
      "Sid": "DenyProdChangesOutsideWindow",
      "Effect": "Deny",
      "Action": [
        "ec2:*",
        "rds:*",
        "lambda:*"
      ],
      "Resource": "*",
      "Condition": {
        "DateGreaterThan": {"aws:CurrentTime": "2024-01-01T17:00:00Z"},
        "DateLessThan": {"aws:CurrentTime": "2024-01-01T09:00:00Z"},
        "StringEquals": {"aws:RequestedRegion": "us-east-1"}
      }
    }
  ]
}
```

### Infrastructure OU

Contains shared infrastructure and networking accounts.

**Shared Services Account:**
- Centralized CI/CD pipelines
- Artifact repositories (ECR, CodeArtifact)
- Shared AMI builder
- Secrets management
- Certificate management (ACM)

**Network Account:**
- AWS Transit Gateway
- VPC peering connections
- AWS Direct Connect
- Route53 private hosted zones
- Network Firewall
- AWS Network Firewall policies

**SCPs Applied:**
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyNetworkChanges",
      "Effect": "Deny",
      "Action": [
        "ec2:DeleteTransitGateway*",
        "ec2:DeleteVpc",
        "ec2:DeleteInternetGateway"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/NetworkAdminRole"
        }
      }
    }
  ]
}
```

### Suspended OU

Contains accounts for quarantine and decommissioning.

**Quarantine Account:**
- Compromised resource isolation
- Forensics analysis
- Incident response workspace
- No internet access
- All services disabled except forensics tools

**SCP Applied:**
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllExceptForensics",
      "Effect": "Deny",
      "NotAction": [
        "ec2:Describe*",
        "s3:GetObject",
        "s3:ListBucket",
        "cloudtrail:LookupEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```

## Service Control Policies (SCPs)

### Global Security Baseline

Apply to all accounts in the organization:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRootUser",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:root"
        }
      }
    },
    {
      "Sid": "RequireIMDSv2",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringNotEquals": {
          "ec2:MetadataHttpTokens": "required"
        }
      }
    },
    {
      "Sid": "DenyRegionRestriction",
      "Effect": "Deny",
      "NotAction": [
        "iam:*",
        "organizations:*",
        "route53:*",
        "cloudfront:*",
        "support:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "us-east-1",
            "us-west-2",
            "eu-west-1"
          ]
        }
      }
    },
    {
      "Sid": "RequireEncryptionAtRest",
      "Effect": "Deny",
      "Action": [
        "s3:PutObject",
        "ec2:CreateVolume",
        "rds:CreateDBInstance"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256",
          "ec2:Encrypted": "true",
          "rds:StorageEncrypted": "true"
        }
      }
    },
    {
      "Sid": "DenySecurityServiceDisable",
      "Effect": "Deny",
      "Action": [
        "guardduty:DeleteDetector",
        "securityhub:DisableSecurityHub",
        "config:DeleteConfigurationRecorder",
        "cloudtrail:StopLogging",
        "macie2:DisableMacie"
      ],
      "Resource": "*"
    }
  ]
}
```

### Cost Control SCP (Development Accounts)

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyExpensiveInstances",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "ForAnyValue:StringNotLike": {
          "ec2:InstanceType": [
            "t3.*",
            "t3a.*",
            "t4g.*"
          ]
        }
      }
    }
  ]
}
```

## Cross-Account Access Patterns

### Centralized IAM Identity Center (AWS SSO)

**Permission Sets:**

```yaml
# ReadOnlyAccess
PermissionSet:
  Name: ReadOnlyAccess
  ManagedPolicies:
    - arn:aws:iam::aws:policy/ReadOnlyAccess
  SessionDuration: PT4H

# DeveloperAccess
PermissionSet:
  Name: DeveloperAccess
  ManagedPolicies:
    - arn:aws:iam::aws:policy/PowerUserAccess
  InlinePolicy:
    Statement:
      - Effect: Deny
        Action:
          - iam:*
          - organizations:*
        Resource: "*"
  SessionDuration: PT8H

# AdminAccess
PermissionSet:
  Name: AdminAccess
  ManagedPolicies:
    - arn:aws:iam::aws:policy/AdministratorAccess
  SessionDuration: PT1H
  RequireMFA: true
```

### Cross-Account IAM Roles

**Assumption Pattern:**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/TrustedRole"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "unique-external-id-12345"
        },
        "IpAddress": {
          "aws:SourceIp": [
            "10.0.0.0/8",
            "172.16.0.0/12"
          ]
        },
        "Bool": {
          "aws:MultiFactorAuthPresent": "true"
        }
      }
    }
  ]
}
```

**Service-to-Service Pattern:**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "222222222222"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:lambda:us-east-1:222222222222:function/allowed-function"
        }
      }
    }
  ]
}
```

## Centralized Logging Architecture

### CloudTrail Organization Trail

**Configuration:**

```json
{
  "Name": "OrganizationTrail",
  "IsOrganizationTrail": true,
  "IsMultiRegionTrail": true,
  "IncludeGlobalServiceEvents": true,
  "EnableLogFileValidation": true,
  "EventSelectors": [
    {
      "ReadWriteType": "All",
      "IncludeManagementEvents": true,
      "DataResources": [
        {
          "Type": "AWS::S3::Object",
          "Values": ["arn:aws:s3:::*/sensitive-data/*"]
        },
        {
          "Type": "AWS::Lambda::Function",
          "Values": ["arn:aws:lambda:*:*:function/*"]
        }
      ]
    }
  ],
  "InsightSelectors": [
    {
      "InsightType": "ApiCallRateInsight"
    },
    {
      "InsightType": "ApiErrorRateInsight"
    }
  ]
}
```

**S3 Bucket Policy (Logging Account):**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSCloudTrailAclCheck",
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudtrail.amazonaws.com"
      },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::org-cloudtrail-logs"
    },
    {
      "Sid": "AWSCloudTrailWrite",
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudtrail.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::org-cloudtrail-logs/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    },
    {
      "Sid": "DenyUnencryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::org-cloudtrail-logs/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::org-cloudtrail-logs",
        "arn:aws:s3:::org-cloudtrail-logs/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}
```

### VPC Flow Logs

**Centralized Collection:**

```bash
# Enable VPC Flow Logs for all VPCs across accounts
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-xxxxx \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::org-vpc-flow-logs \
  --log-format '${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} ${type} ${pkt-srcaddr} ${pkt-dstaddr} ${region} ${az-id} ${sublocation-type} ${sublocation-id}'
```

**Athena Query Setup:**

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  protocol int,
  packets bigint,
  bytes bigint,
  start_time bigint,
  end_time bigint,
  action string,
  log_status string,
  vpc_id string,
  subnet_id string,
  instance_id string,
  tcp_flags int,
  type string,
  pkt_srcaddr string,
  pkt_dstaddr string,
  region string,
  az_id string,
  sublocation_type string,
  sublocation_id string
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://org-vpc-flow-logs/AWSLogs/'
TBLPROPERTIES ("skip.header.line.count"="1");
```

### AWS Security Hub

**Aggregator Configuration:**

```python
import boto3

securityhub = boto3.client('securityhub', region_name='us-east-1')

# Enable Security Hub in aggregator account
securityhub.enable_security_hub(
    EnableDefaultStandards=True
)

# Enable standards
securityhub.batch_enable_standards(
    StandardsSubscriptionRequests=[
        {'StandardsArn': 'arn:aws:securityhub:us-east-1::standards/aws-foundational-security-best-practices/v/1.0.0'},
        {'StandardsArn': 'arn:aws:securityhub:us-east-1::standards/cis-aws-foundations-benchmark/v/1.4.0'},
        {'StandardsArn': 'arn:aws:securityhub:us-east-1::standards/pci-dss/v/3.2.1'}
    ]
)

# Create aggregator
securityhub.create_finding_aggregator(
    RegionLinkingMode='ALL_REGIONS'
)
```

### Amazon GuardDuty

**Organization Configuration:**

```python
import boto3

guardduty = boto3.client('guardduty', region_name='us-east-1')

# Create detector in delegated admin account
detector_response = guardduty.create_detector(
    Enable=True,
    FindingPublishingFrequency='FIFTEEN_MINUTES',
    DataSources={
        'S3Logs': {'Enable': True},
        'Kubernetes': {
            'AuditLogs': {'Enable': True}
        },
        'MalwareProtection': {
            'ScanEc2InstanceWithFindings': {
                'EbsVolumes': {'Enable': True}
            }
        }
    }
)

detector_id = detector_response['DetectorId']

# Enable for organization
guardduty.enable_organization_admin_account(
    AdminAccountId='333333333333'  # Security Tooling account
)

# Auto-enable for new accounts
guardduty.update_organization_configuration(
    DetectorId=detector_id,
    AutoEnable=True,
    DataSources={
        'S3Logs': {'AutoEnable': True},
        'Kubernetes': {
            'AuditLogs': {'AutoEnable': True}
        },
        'MalwareProtection': {
            'ScanEc2InstanceWithFindings': {
                'EbsVolumes': {'AutoEnable': True}
            }
        }
    }
)
```

## Network Security Architecture

### Transit Gateway Design

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Prod VPC    │     │  Dev VPC     │     │ Shared VPC   │
│ 10.0.0.0/16  │     │ 10.1.0.0/16  │     │ 10.2.0.0/16  │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       │    ┌───────────────┴────────────────┐   │
       └────┤   AWS Transit Gateway          ├───┘
            │   Route Tables:                │
            │   - Prod (isolated)            │
            │   - Non-Prod (shared)          │
            │   - Egress (internet-bound)    │
            └───────────────┬────────────────┘
                            │
                            ▼
                  ┌──────────────────┐
                  │ Network Firewall │
                  │ - IDS/IPS         │
                  │ - DPI             │
                  │ - Domain filtering│
                  └──────────────────┘
```

### Security Groups Strategy

**Tiered Application Pattern:**

```json
{
  "SecurityGroups": {
    "ALB": {
      "Ingress": [
        {"Protocol": "tcp", "Port": 443, "Source": "0.0.0.0/0"}
      ],
      "Egress": [
        {"Protocol": "tcp", "Port": 8080, "Destination": "sg-app-tier"}
      ]
    },
    "AppTier": {
      "Ingress": [
        {"Protocol": "tcp", "Port": 8080, "Source": "sg-alb"}
      ],
      "Egress": [
        {"Protocol": "tcp", "Port": 5432, "Destination": "sg-db-tier"}
      ]
    },
    "DBTier": {
      "Ingress": [
        {"Protocol": "tcp", "Port": 5432, "Source": "sg-app-tier"}
      ],
      "Egress": []
    }
  }
}
```

## Compliance and Governance

### AWS Config Rules

**Organization Conformance Packs:**

```yaml
ConformancePackName: OrganizationSecurityBaseline
ConformancePackInputParameters:
  - ParameterName: RequiredTags
    ParameterValue: "Environment,Owner,CostCenter"

Resources:
  - ConfigRule:
      ConfigRuleName: encrypted-volumes
      Source:
        Owner: AWS
        SourceIdentifier: ENCRYPTED_VOLUMES
      Scope:
        ComplianceResourceTypes:
          - AWS::EC2::Volume

  - ConfigRule:
      ConfigRuleName: s3-bucket-public-read-prohibited
      Source:
        Owner: AWS
        SourceIdentifier: S3_BUCKET_PUBLIC_READ_PROHIBITED

  - ConfigRule:
      ConfigRuleName: iam-password-policy
      Source:
        Owner: AWS
        SourceIdentifier: IAM_PASSWORD_POLICY
      InputParameters:
        RequireUppercaseCharacters: true
        RequireLowercaseCharacters: true
        RequireSymbols: true
        RequireNumbers: true
        MinimumPasswordLength: 14
        PasswordReusePrevention: 24
        MaxPasswordAge: 90
```

### Backup Strategy

**AWS Backup Organization Policy:**

```json
{
  "plans": {
    "ProductionBackupPlan": {
      "regions": ["us-east-1", "us-west-2"],
      "rules": {
        "DailyBackup": {
          "schedule_expression": "cron(0 5 ? * * *)",
          "start_window_minutes": 60,
          "target_backup_vault_name": "ProductionVault",
          "lifecycle": {
            "move_to_cold_storage_after_days": 30,
            "delete_after_days": 365
          },
          "copy_actions": [
            {
              "destination_backup_vault_arn": "arn:aws:backup:us-west-2:444444444444:backup-vault:ProductionVaultDR",
              "lifecycle": {
                "delete_after_days": 365
              }
            }
          ]
        }
      },
      "selections": {
        "ProductionResources": {
          "iam_role_arn": "arn:aws:iam::444444444444:role/AWSBackupRole",
          "resources": [
            "arn:aws:ec2:*:*:volume/*",
            "arn:aws:rds:*:*:db:*",
            "arn:aws:dynamodb:*:*:table/*"
          ],
          "conditions": {
            "tags": {
              "Environment": "Production"
            }
          }
        }
      }
    }
  }
}
```

## Incident Response Preparation

### Automated Quarantine

**Lambda Function (EventBridge Rule Trigger):**

```python
import boto3
import json

ec2 = boto3.client('ec2')
sns = boto3.client('sns')

def lambda_handler(event, context):
    """
    Quarantine compromised EC2 instance based on GuardDuty finding.
    """
    # Extract instance ID from GuardDuty finding
    finding = event['detail']
    instance_id = finding['resource']['instanceDetails']['instanceId']

    # Create forensics snapshot
    volumes = ec2.describe_instance_attribute(
        InstanceId=instance_id,
        Attribute='blockDeviceMapping'
    )

    for volume in volumes['BlockDeviceMappings']:
        volume_id = volume['Ebs']['VolumeId']
        ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f'Forensics snapshot - GuardDuty finding',
            TagSpecifications=[
                {
                    'ResourceType': 'snapshot',
                    'Tags': [
                        {'Key': 'Forensics', 'Value': 'true'},
                        {'Key': 'SourceInstance', 'Value': instance_id}
                    ]
                }
            ]
        )

    # Apply quarantine security group
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=['sg-quarantine']
    )

    # Notify security team
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:333333333333:SecurityAlerts',
        Subject=f'Instance Quarantined: {instance_id}',
        Message=json.dumps(finding, indent=2)
    )

    return {
        'statusCode': 200,
        'body': json.dumps(f'Instance {instance_id} quarantined successfully')
    }
```

## Key Security Metrics

Monitor these metrics across the organization:

1. **Access Metrics:**
   - Failed login attempts per account
   - Root user usage (should be zero)
   - MFA coverage percentage
   - Unused IAM credentials (>90 days)

2. **Compliance Metrics:**
   - Config rule compliance rate
   - Security Hub security score
   - Unencrypted resources count
   - Public-facing resources

3. **Threat Detection:**
   - GuardDuty findings by severity
   - Mean time to remediation (MTTR)
   - Repeat findings count
   - Security Hub critical findings

4. **Network Security:**
   - Unprotected security groups
   - VPC Flow Logs enabled percentage
   - Network Firewall blocks per hour
   - Unusual traffic patterns

## Implementation Checklist

- [ ] Create AWS Organization structure
- [ ] Define OU hierarchy and account placement
- [ ] Configure Service Control Policies (SCPs)
- [ ] Enable AWS IAM Identity Center (SSO)
- [ ] Create permission sets for least privilege
- [ ] Deploy organization CloudTrail
- [ ] Enable GuardDuty organization-wide
- [ ] Configure Security Hub aggregator
- [ ] Deploy AWS Config conformance packs
- [ ] Implement centralized logging (S3 buckets)
- [ ] Configure VPC Flow Logs for all VPCs
- [ ] Deploy Transit Gateway (if applicable)
- [ ] Configure AWS Backup organization policy
- [ ] Create incident response runbooks
- [ ] Deploy automated quarantine mechanisms
- [ ] Establish security metrics dashboard
- [ ] Configure alerting and notifications
- [ ] Document cross-account access patterns
- [ ] Train teams on multi-account operations
- [ ] Conduct tabletop incident response exercise

## References

- [AWS Organizations Best Practices](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_best-practices.html)
- [AWS Security Reference Architecture](https://docs.aws.amazon.com/prescriptive-guidance/latest/security-reference-architecture/welcome.html)
- [CIS AWS Foundations Benchmark](https://www.cisecurity.org/benchmark/amazon_web_services)
- [AWS Multi-Account Strategy](https://aws.amazon.com/organizations/getting-started/best-practices/)

```

### references/gcp-security-architecture.md

```markdown
# GCP Security Architecture Reference

## GCP Security Best Practices

### Key GCP Security Services

#### Identity & Access Management

| Service | Purpose |
|---------|---------|
| **Cloud IAM** | Identity and access management |
| **Identity Platform** | Customer identity (CIAM) |
| **Cloud Identity** | Workforce identity management |

#### Detection & Response

| Service | Purpose |
|---------|---------|
| **Security Command Center** | Unified security and risk dashboard |
| **Chronicle** | SIEM and threat intelligence platform |
| **Event Threat Detection** | Real-time threat detection |

#### Network Security

| Service | Purpose |
|---------|---------|
| **Cloud Armor** | DDoS protection and WAF |
| **VPC Service Controls** | Data exfiltration prevention |
| **Cloud Firewall** | Stateful firewall rules |

#### Data Protection

| Service | Purpose |
|---------|---------|
| **Cloud KMS** | Key management service |
| **Secret Manager** | Secrets management |
| **Cloud DLP** | Data loss prevention |

#### Infrastructure Security

| Service | Purpose |
|---------|---------|
| **Binary Authorization** | Container image signing |
| **Confidential Computing** | Encryption in use (VMs, GKE) |

## GCP Organization Hierarchy

```
Organization (example.com)
│
├── Folder: Production
│   ├── Project: prod-app1
│   ├── Project: prod-app2
│   └── Project: prod-shared-services
│
├── Folder: Non-Production
│   ├── Project: dev-app1
│   ├── Project: staging-app1
│   └── Project: test-app1
│
└── Folder: Security
    ├── Project: security-logging
    └── Project: security-monitoring
```

**Key Patterns:**

- **Organization Policies:** Enforce constraints at org/folder level
- **Shared VPC:** Centralized network management
- **Security Command Center:** Organization-wide security posture
- **VPC Service Controls:** Protect sensitive projects from data exfiltration

```

### examples/architectures/gcp-security-hierarchy.md

```markdown
# GCP Security Hierarchy Architecture

## Overview

Google Cloud Platform (GCP) security architecture uses resource hierarchy for organizational structure and governance. Apply security controls through IAM policies, Organization Policies, and VPC Service Controls to enforce defense-in-depth.

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                    ORGANIZATION (example.com)                            │
│                     Organization ID: 123456789                           │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │ Organization Policies:                                           │   │
│  │ - Require OS Login on all VMs                                    │   │
│  │ - Disable service account key creation                           │   │
│  │ - Restrict public IP on Cloud SQL                                │   │
│  │ - Require encryption with CMEK                                   │   │
│  │ - Domain restricted sharing (example.com only)                   │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
           │
           ├───────────────────┬──────────────────┬──────────────────┐
           │                   │                  │                  │
           ▼                   ▼                  ▼                  ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │Infrastructure│    │  Workloads  │    │   Sandbox   │    │ Deprecated  │
    │   Folder    │    │   Folder    │    │   Folder    │    │   Folder    │
    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
           │                   │                │                  │
           │                   │                │                  │
    ┌──────┴──────┐     ┌──────┴──────┐        │                  │
    ▼             ▼     ▼             ▼        ▼                  ▼
┌────────┐  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐      ┌────────┐
│Security│  │Network │ │ Prod   │ │  Dev   │ │Sandbox │      │Quarant.│
│Project │  │Project │ │Project │ │Project │ │Project │      │Project │
│        │  │        │ │        │ │        │ │        │      │        │
│SCC     │  │Shared  │ │App     │ │App     │ │Testing │      │Isolated│
│Cloud   │  │VPC     │ │Data    │ │Data    │ │Budget  │      │        │
│Logging │  │Cloud   │ │        │ │        │ │30-day  │      │        │
│        │  │Armor   │ │        │ │        │ │cleanup │      │        │
└────────┘  └────────┘ └────────┘ └────────┘ └────────┘      └────────┘
                │           │
                │           │
                └───────────┴──────────────┐
                                           ▼
                                 ┌──────────────────┐
                                 │ Shared VPC       │
                                 │ (Host Project)   │
                                 │                  │
                                 │ - Firewall Rules │
                                 │ - Cloud NAT      │
                                 │ - Cloud Router   │
                                 │ - Private Google │
                                 │   Access         │
                                 └──────────────────┘
```

## Resource Hierarchy Design

### Organization Level

Root node containing all GCP resources for the domain.

**Organization Policies Applied:**

```yaml
# Require OS Login
constraints/compute.requireOsLogin:
  enforce: true

# Disable service account key creation
constraints/iam.disableServiceAccountKeyCreation:
  enforce: true

# Restrict VM external IPs
constraints/compute.vmExternalIpAccess:
  listPolicy:
    deniedValues:
      - "*"
    allowedValues:
      - "projects/network-project/zones/us-central1-a/instances/bastion"

# Domain restricted sharing
constraints/iam.allowedPolicyMemberDomains:
  listPolicy:
    allowedValues:
      - "C01234567"  # example.com organization ID

# Require CMEK encryption
constraints/gcp.restrictNonCmekServices:
  listPolicy:
    deniedValues:
      - "compute.googleapis.com"
      - "storage.googleapis.com"

# Skip default network creation
constraints/compute.skipDefaultNetworkCreation:
  enforce: true

# Disable automatic IAM grants for default service accounts
constraints/iam.automaticIamGrantsForDefaultServiceAccounts:
  enforce: true
```

**Organization IAM Bindings:**

```yaml
bindings:
  - role: roles/resourcemanager.organizationAdmin
    members:
      - group:[email protected]
    condition:
      title: "Require context-aware access"
      expression: |
        device.is_corp_managed &&
        device.encryption_status == "ENCRYPTED"

  - role: roles/securitycenter.admin
    members:
      - group:[email protected]

  - role: roles/logging.configWriter
    members:
      - serviceAccount:[email protected]

  - role: roles/viewer
    members:
      - group:[email protected]
    condition:
      title: "Read-only during business hours"
      expression: |
        request.time.getHours("America/Los_Angeles") >= 8 &&
        request.time.getHours("America/Los_Angeles") < 18
```

### Infrastructure Folder

Contains shared infrastructure and platform services.

**Security Project:**
- Security Command Center (SCC)
- Cloud Asset Inventory
- Access Transparency logs
- Cloud Audit Logs (organization sink)
- Cloud KMS for CMEK
- Certificate Authority Service
- Security Health Analytics

**Network Project (Shared VPC Host):**
- Shared VPC network
- Cloud Armor policies
- Cloud CDN
- Cloud Load Balancing
- Cloud NAT
- Cloud Interconnect
- Private Service Connect

**Folder Policies:**

```yaml
# Require VPC Service Controls
constraints/compute.restrictVpcPeering:
  listPolicy:
    allowedValues:
      - "under:organizations/123456789/folders/infrastructure"

# Require uniform bucket-level access
constraints/storage.uniformBucketLevelAccess:
  enforce: true
```

### Workloads Folder

Contains production and non-production application projects.

**Production Environment:**
- Service projects (Shared VPC)
- Binary Authorization required
- Enhanced audit logging
- Change approval required
- High availability SLA

**Development/Staging:**
- Service projects (Shared VPC)
- Relaxed deployment policies
- Cost controls via budgets
- Automatic resource cleanup (dev)

**Folder Policies:**

```yaml
# Restrict public IPs on Cloud SQL
constraints/sql.restrictPublicIp:
  enforce: true

# Require Binary Authorization
constraints/binaryauthorization.requireAttestations:
  listPolicy:
    allowedValues:
      - "projects/security-project/attestors/production-attestor"

# Disable default service account usage
constraints/iam.disableServiceAccountKeyUpload:
  enforce: true

# Require labels
constraints/gcp.resourceLocations:
  listPolicy:
    allowedValues:
      - "in:us-locations"
      - "in:eu-locations"
```

### Sandbox Folder

Provides innovation space with relaxed controls.

**Sandbox Projects:**
- Full API access
- Budget limits enforced
- No production data
- No connectivity to corporate network
- Automatic deletion after 90 days

**Folder Policies:**

```yaml
# Budget enforcement
constraints/compute.vmExternalIpAccess:
  listPolicy:
    allowedValues:
      - "*"

# Restrict expensive VM types
constraints/compute.vmMachineTypes:
  listPolicy:
    deniedValues:
      - "n2-*"
      - "c2-*"
      - "m1-*"
    allowedValues:
      - "e2-*"
      - "n1-standard-1"
      - "n1-standard-2"
```

### Deprecated Folder

Contains projects being decommissioned or quarantined.

**Quarantine Project:**
- All APIs disabled except logging
- No external connectivity
- Read-only access for security team
- Forensics tooling enabled

**Folder Policies:**

```yaml
# Deny all API usage except logging
constraints/serviceuser.services:
  listPolicy:
    deniedValues:
      - "*"
    allowedValues:
      - "logging.googleapis.com"
      - "cloudresourcemanager.googleapis.com"
```

## VPC Service Controls

### Security Perimeter Design

```
┌─────────────────────────────────────────────────────────────┐
│              VPC SERVICE CONTROL PERIMETER                   │
│                   (Production Perimeter)                     │
│                                                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │ Prod Project │  │ Data Project │  │ ML Project   │      │
│  │              │  │              │  │              │      │
│  │ - Compute    │  │ - BigQuery   │  │ - Vertex AI  │      │
│  │ - GKE        │  │ - Cloud SQL  │  │ - AI Platform│      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│                                                               │
│  Restricted Services:                                        │
│  - storage.googleapis.com                                    │
│  - bigquery.googleapis.com                                   │
│  - compute.googleapis.com                                    │
│                                                               │
│  Ingress Rules:                                              │
│  - Allow from corporate IP ranges                            │
│  - Allow from Cloud Identity (via Access Levels)            │
│                                                               │
│  Egress Rules:                                               │
│  - Allow to Google APIs only                                 │
│  - Deny to internet                                          │
│                                                               │
└─────────────────────────────────────────────────────────────┘
```

**Perimeter Configuration:**

```yaml
name: "accessPolicies/123456/servicePerimeters/production_perimeter"
title: "Production Perimeter"
description: "VPC SC perimeter for production workloads"
perimeterType: "PERIMETER_TYPE_REGULAR"

status:
  resources:
    - "projects/123456789"  # prod-project
    - "projects/234567890"  # data-project
    - "projects/345678901"  # ml-project

  restrictedServices:
    - "storage.googleapis.com"
    - "bigquery.googleapis.com"
    - "compute.googleapis.com"
    - "container.googleapis.com"
    - "sqladmin.googleapis.com"

  accessLevels:
    - "accessPolicies/123456/accessLevels/corp_access"
    - "accessPolicies/123456/accessLevels/secure_device_access"

  vpcAccessibleServices:
    enableRestriction: true
    allowedServices:
      - "RESTRICTED-SERVICES"
      - "logging.googleapis.com"
      - "monitoring.googleapis.com"

  ingressPolicies:
    - ingressFrom:
        sources:
          - accessLevel: "accessPolicies/123456/accessLevels/corp_access"
        identities:
          - "serviceAccount:[email protected]"
      ingressTo:
        operations:
          - serviceName: "storage.googleapis.com"
            methodSelectors:
              - method: "google.storage.objects.create"
              - method: "google.storage.objects.get"
        resources:
          - "*"

  egressPolicies:
    - egressFrom:
        identities:
          - "serviceAccount:[email protected]"
      egressTo:
        operations:
          - serviceName: "bigquery.googleapis.com"
        resources:
          - "projects/234567890"
```

**Access Levels:**

```yaml
# Corporate network access
name: "accessPolicies/123456/accessLevels/corp_access"
title: "Corporate Access"
basic:
  conditions:
    - ipSubnetworks:
        - "203.0.113.0/24"  # Corporate office
        - "198.51.100.0/24"  # VPN range
      devicePolicy:
        requireCorpOwned: true
        requireScreenlock: true
      regions:
        - "US"
        - "EU"

# Secure device access (BeyondCorp)
name: "accessPolicies/123456/accessLevels/secure_device_access"
title: "Secure Device Access"
basic:
  conditions:
    - devicePolicy:
        requireCorpOwned: true
        requireAdminApproval: true
        requireScreenlock: true
        osConstraints:
          - osType: "DESKTOP_CHROME_OS"
            minimumVersion: "100.0.0"
          - osType: "DESKTOP_WINDOWS"
            minimumVersion: "10.0.19041"
      regions:
        - "US"
      members:
        - "user:[email protected]"
        - "group:[email protected]"
```

## IAM Policy Inheritance

### Hierarchy Example

```
Organization
├── IAM: roles/viewer → group:[email protected]
│
└── Folder: Infrastructure
    ├── IAM: roles/compute.networkAdmin → group:[email protected]
    │
    └── Project: network-project
        ├── IAM (inherited): roles/viewer → group:[email protected]
        ├── IAM (inherited): roles/compute.networkAdmin → group:[email protected]
        └── IAM: roles/compute.instanceAdmin → serviceAccount:[email protected]
```

**Policy Inheritance Rules:**

1. Policies are inherited down the hierarchy
2. Child resources cannot remove inherited permissions
3. Child resources can add additional permissions
4. Most permissive policy wins
5. Deny policies (IAM Deny) override allow policies

### IAM Deny Policies

**Prevent Principal Deletion:**

```yaml
name: "policies/prevent-principal-deletion"
displayName: "Prevent deletion of critical service accounts"
rules:
  - denyRule:
      deniedPrincipals:
        - principalSet: "//iam.googleapis.com/projects/123456789/serviceAccounts/*"
      deniedPermissions:
        - "iam.serviceAccounts.delete"
      exceptionPrincipals:
        - "principal://goog/subject/[email protected]"
```

**Prevent Data Exfiltration:**

```yaml
name: "policies/prevent-data-exfiltration"
displayName: "Prevent external bucket access"
rules:
  - denyRule:
      deniedPrincipals:
        - "principalSet://goog/public:all"
      deniedPermissions:
        - "storage.objects.get"
        - "storage.objects.list"
      denialCondition:
        expression: |
          resource.name.startsWith("projects/_/buckets/sensitive-") &&
          !principal.in(["domain:example.com"])
```

### Service Account Best Practices

**Workload Identity for GKE:**

```yaml
# Kubernetes Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-ksa
  namespace: production
  annotations:
    iam.gke.io/gcp-service-account: [email protected]
```

```bash
# Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
  [email protected] \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:prod-project.svc.id.goog[production/app-ksa]"
```

**Short-Lived Credentials:**

```python
from google.auth import impersonated_credentials
from google.auth.transport import requests

# Source credentials (user or SA)
source_credentials, project = google.auth.default()

# Impersonate target service account
target_scopes = ['https://www.googleapis.com/auth/cloud-platform']
target_credentials = impersonated_credentials.Credentials(
    source_credentials=source_credentials,
    target_principal='[email protected]',
    target_scopes=target_scopes,
    lifetime=3600  # 1 hour max
)

# Use credentials
auth_request = requests.Request()
target_credentials.refresh(auth_request)
```

## Security Command Center

### Enable SCC Premium

```bash
# Enable Security Command Center API
gcloud services enable securitycenter.googleapis.com

# Configure SCC
gcloud scc settings update \
  --organization=123456789 \
  --enable-asset-discovery \
  --enable-security-health-analytics \
  --enable-event-threat-detection \
  --enable-container-threat-detection \
  --enable-web-security-scanner
```

### Security Health Analytics

**Custom Detectors:**

```yaml
# Detect public GCS buckets
name: "organizations/123456789/securityHealthAnalyticsSettings/customModules/public-bucket-detector"
displayName: "Public GCS Bucket Detector"
enablementState: ENABLED
customConfig:
  predicate:
    expression: |
      resource.type == "storage.googleapis.com/Bucket" &&
      resource.data.iamConfiguration.publicAccessPrevention == "UNSPECIFIED"
  resourceSelector:
    resourceTypes:
      - "storage.googleapis.com/Bucket"
  severity: HIGH
  description: "Detects Cloud Storage buckets without public access prevention"
  recommendation: "Enable Public Access Prevention on the bucket"
```

### Event Threat Detection

**Custom Threat Rules:**

```yaml
# Detect privilege escalation
name: "organizations/123456789/eventThreatDetectionSettings/customModules/privilege-escalation"
displayName: "Privilege Escalation Detection"
enablementState: ENABLED
customConfig:
  predicate:
    expression: |
      event.type == "google.iam.admin.v1.SetIamPolicy" &&
      event.data.policyDelta.bindingDeltas.exists(
        delta, delta.role.startsWith("roles/owner") ||
               delta.role.startsWith("roles/editor")
      )
  severity: CRITICAL
  description: "Detects IAM policy changes granting Owner or Editor roles"
```

### Continuous Exports to SIEM

```bash
# Create BigQuery export
gcloud scc bq-exports create prod-scc-export \
  --organization=123456789 \
  --dataset=projects/security-project/datasets/scc_findings \
  --description="Export SCC findings to BigQuery"

# Create Pub/Sub export
gcloud scc notifications create security-alerts \
  --organization=123456789 \
  --pubsub-topic=projects/security-project/topics/scc-findings \
  --description="Real-time SCC findings" \
  --filter="severity=\"HIGH\" OR severity=\"CRITICAL\""
```

## Binary Authorization

### Attestation Policy

```yaml
name: "projects/prod-project/policy"
globalPolicyEvaluationMode: ENABLE
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - "projects/security-project/attestors/vulnerability-scanner"
    - "projects/security-project/attestors/code-review"
kubernetesNamespaceAdmissionRules:
  production:
    evaluationMode: REQUIRE_ATTESTATION
    enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
    requireAttestationsBy:
      - "projects/security-project/attestors/vulnerability-scanner"
      - "projects/security-project/attestors/code-review"
      - "projects/security-project/attestors/qa-approval"
  development:
    evaluationMode: ALWAYS_ALLOW
    enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
```

### Attestor Configuration

```yaml
# Vulnerability scanner attestor
name: "projects/security-project/attestors/vulnerability-scanner"
description: "Container vulnerability scan approval"
userOwnedGrafeasNote:
  noteReference: "projects/security-project/notes/vulnerability-scan-note"
  publicKeys:
    - pkixPublicKey:
        publicKeyPem: |
          -----BEGIN PUBLIC KEY-----
          MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
          -----END PUBLIC KEY-----
        signatureAlgorithm: RSA_SIGN_PKCS1_4096_SHA512
```

### CI/CD Attestation

```python
import base64
from google.cloud import containeranalysis_v1
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def create_attestation(project_id, note_id, image_url, private_key_path):
    """
    Create Binary Authorization attestation after vulnerability scan.
    """
    client = containeranalysis_v1.ContainerAnalysisClient()
    grafeas_client = client.get_grafeas_client()

    note_name = f"projects/{project_id}/notes/{note_id}"
    artifact_url = f"https://{image_url}"

    # Create occurrence (attestation)
    occurrence = {
        "resource_uri": artifact_url,
        "note_name": note_name,
        "attestation": {
            "attestation": {
                "serialized_payload": base64.b64encode(
                    artifact_url.encode()
                ).decode(),
            }
        }
    }

    # Sign attestation
    with open(private_key_path, 'rb') as key_file:
        private_key = serialization.load_pem_private_key(
            key_file.read(),
            password=None
        )

    signature = private_key.sign(
        artifact_url.encode(),
        padding.PKCS1v15(),
        hashes.SHA512()
    )

    occurrence["attestation"]["attestation"]["signatures"] = [
        {
            "public_key_id": "vulnerability-scanner-key-1",
            "signature": base64.b64encode(signature).decode()
        }
    ]

    # Create occurrence
    created = grafeas_client.create_occurrence(
        parent=f"projects/{project_id}",
        occurrence=occurrence
    )

    print(f"Attestation created: {created.name}")
    return created
```

## Cloud Audit Logs

### Organization-Level Log Sink

```bash
# Create organization log sink to BigQuery
gcloud logging sinks create org-audit-logs-sink \
  bigquery.googleapis.com/projects/security-project/datasets/audit_logs \
  --organization=123456789 \
  --include-children \
  --log-filter='logName:"cloudaudit.googleapis.com"'

# Grant sink service account permissions
gcloud projects add-iam-policy-binding security-project \
  --member="serviceAccount:[email protected]" \
  --role="roles/bigquery.dataEditor"
```

### Data Access Audit Configuration

```yaml
auditConfigs:
  - service: "allServices"
    auditLogConfigs:
      - logType: "ADMIN_READ"
      - logType: "DATA_READ"
      - logType: "DATA_WRITE"

  - service: "storage.googleapis.com"
    auditLogConfigs:
      - logType: "ADMIN_READ"
      - logType: "DATA_READ"
      - logType: "DATA_WRITE"
        exemptedMembers:
          - "serviceAccount:[email protected]"

  - service: "bigquery.googleapis.com"
    auditLogConfigs:
      - logType: "ADMIN_READ"
      - logType: "DATA_READ"
        logType: "DATA_WRITE"
```

### Audit Log Analysis

```sql
-- Detect privilege escalation
SELECT
  timestamp,
  protoPayload.authenticationInfo.principalEmail,
  protoPayload.methodName,
  protoPayload.resourceName,
  JSON_EXTRACT(protoPayload.request, '$.policy.bindings') AS new_bindings
FROM
  `security-project.audit_logs.cloudaudit_googleapis_com_activity_*`
WHERE
  protoPayload.methodName = 'google.iam.admin.v1.SetIamPolicy'
  AND JSON_EXTRACT(protoPayload.request, '$.policy.bindings') LIKE '%roles/owner%'
  AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY timestamp DESC;

-- Detect data exfiltration attempts
SELECT
  timestamp,
  protoPayload.authenticationInfo.principalEmail,
  protoPayload.resourceName,
  COUNT(*) as access_count
FROM
  `security-project.audit_logs.cloudaudit_googleapis_com_data_access_*`
WHERE
  protoPayload.serviceName = 'storage.googleapis.com'
  AND protoPayload.methodName = 'storage.objects.get'
  AND protoPayload.resourceName LIKE '%sensitive-%'
  AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY 1, 2, 3
HAVING access_count > 100
ORDER BY access_count DESC;
```

## Network Security

### Firewall Rules Hierarchy

```yaml
# Deny all ingress (base rule)
name: "deny-all-ingress"
priority: 65535
direction: INGRESS
action: DENY
targetTags: []
sourceRanges:
  - "0.0.0.0/0"

# Allow SSH from IAP
name: "allow-ssh-iap"
priority: 1000
direction: INGRESS
action: ALLOW
sourceRanges:
  - "35.235.240.0/20"  # IAP range
allowed:
  - IPProtocol: tcp
    ports:
      - "22"
targetTags:
  - "allow-ssh"

# Allow internal traffic
name: "allow-internal"
priority: 1100
direction: INGRESS
action: ALLOW
sourceRanges:
  - "10.128.0.0/9"  # Internal VPC range
allowed:
  - IPProtocol: tcp
    ports:
      - "0-65535"
  - IPProtocol: udp
    ports:
      - "0-65535"
  - IPProtocol: icmp
```

### Cloud Armor Security Policies

```yaml
# DDoS and WAF protection
name: "cloud-armor-policy"
description: "WAF and DDoS protection for public endpoints"
rules:
  - priority: 0
    description: "Default rule"
    action: "allow"
    match:
      versionedExpr: "SRC_IPS_V1"
      config:
        srcIpRanges:
          - "*"

  - priority: 10
    description: "Block known bad IPs"
    action: "deny(403)"
    match:
      versionedExpr: "SRC_IPS_V1"
      config:
        srcIpRanges:
          - "192.0.2.0/24"

  - priority: 20
    description: "Rate limit per IP"
    action: "rate_based_ban"
    rateLimitOptions:
      conformAction: "allow"
      exceedAction: "deny(429)"
      enforceOnKey: "IP"
      rateLimitThreshold:
        count: 100
        intervalSec: 60
      banDurationSec: 600

  - priority: 30
    description: "Block SQL injection"
    action: "deny(403)"
    match:
      expr:
        expression: |
          evaluatePreconfiguredExpr('sqli-stable',
            ['owasp-crs-v030001-id942251-sqli',
             'owasp-crs-v030001-id942420-sqli',
             'owasp-crs-v030001-id942431-sqli'])

  - priority: 40
    description: "Block XSS"
    action: "deny(403)"
    match:
      expr:
        expression: |
          evaluatePreconfiguredExpr('xss-stable',
            ['owasp-crs-v030001-id941150-xss',
             'owasp-crs-v030001-id941320-xss'])
```

### Private Google Access

```yaml
# Enable Private Google Access for subnet
name: "private-subnet"
network: "projects/network-project/global/networks/shared-vpc"
region: "us-central1"
ipCidrRange: "10.128.0.0/20"
privateIpGoogleAccess: true

# Configure Private Service Connect
pscConnection:
  network: "projects/network-project/global/networks/shared-vpc"
  serviceAttachments:
    - "projects/SERVICE_PROJECT/regions/us-central1/serviceAttachments/all-apis"
  ipAddress: "10.128.10.10"
```

## Cloud KMS and Encryption

### Key Hierarchy

```
Key Ring: production-keyring (us-central1)
├── Database Encryption Key
│   ├── Purpose: ENCRYPT_DECRYPT
│   ├── Rotation: 90 days
│   └── Versions: 3 active
│
├── Application Secrets Key
│   ├── Purpose: ENCRYPT_DECRYPT
│   ├── Rotation: 30 days
│   └── Versions: 5 active
│
└── Signing Key
    ├── Purpose: ASYMMETRIC_SIGN
    ├── Algorithm: RSA_SIGN_PKCS1_4096_SHA256
    └── Versions: 1 active
```

**Key Configuration:**

```yaml
# Create key ring
name: "projects/security-project/locations/us-central1/keyRings/production-keyring"

# Create crypto key
name: "projects/security-project/locations/us-central1/keyRings/production-keyring/cryptoKeys/database-key"
purpose: "ENCRYPT_DECRYPT"
versionTemplate:
  algorithm: "GOOGLE_SYMMETRIC_ENCRYPTION"
  protectionLevel: "HSM"
rotationPeriod: "7776000s"  # 90 days
nextRotationTime: "2024-04-01T00:00:00Z"
```

**CMEK for Cloud SQL:**

```bash
# Create Cloud SQL instance with CMEK
gcloud sql instances create prod-db \
  --tier=db-n1-standard-4 \
  --region=us-central1 \
  --disk-encryption-key=projects/security-project/locations/us-central1/keyRings/production-keyring/cryptoKeys/database-key \
  --disk-encryption-key-keyring=production-keyring \
  --disk-encryption-key-location=us-central1
```

## Key Security Metrics

Monitor these metrics across GCP:

1. **IAM Metrics:**
   - Service account key age
   - Overly permissive roles (Owner, Editor)
   - Unused service accounts
   - External members in IAM policies

2. **Compliance Metrics:**
   - Organization policy violations
   - SCC findings by severity
   - Asset inventory changes
   - Non-compliant resources

3. **Network Metrics:**
   - Firewall rule hits
   - Cloud Armor blocks
   - VPC SC perimeter violations
   - Public IP usage

4. **Threat Detection:**
   - Event Threat Detection findings
   - Anomalous API usage
   - Failed authentication attempts
   - Data exfiltration attempts

## Implementation Checklist

- [ ] Create organization and verify domain ownership
- [ ] Design folder hierarchy
- [ ] Apply organization policies
- [ ] Create infrastructure projects (security, network)
- [ ] Deploy Shared VPC network
- [ ] Configure firewall rules
- [ ] Enable Security Command Center Premium
- [ ] Configure VPC Service Controls perimeters
- [ ] Define access levels
- [ ] Create workload projects
- [ ] Attach service projects to Shared VPC
- [ ] Configure Cloud KMS and create keys
- [ ] Enable CMEK for data services
- [ ] Configure organization audit log sinks
- [ ] Deploy Binary Authorization
- [ ] Create attestors and policies
- [ ] Configure Cloud Armor policies
- [ ] Enable Private Google Access
- [ ] Establish security monitoring dashboard
- [ ] Configure alerting (Pub/Sub → Cloud Functions)
- [ ] Document architecture and procedures

## References

- [GCP Resource Hierarchy](https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy)
- [Organization Policy Constraints](https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints)
- [VPC Service Controls](https://cloud.google.com/vpc-service-controls/docs)
- [Security Command Center](https://cloud.google.com/security-command-center/docs)
- [Binary Authorization](https://cloud.google.com/binary-authorization/docs)
- [CIS Google Cloud Platform Foundation Benchmark](https://www.cisecurity.org/benchmark/google_cloud_computing_platform)

```

### references/azure-security-architecture.md

```markdown
# Azure Security Architecture Reference

## Microsoft Defender for Cloud

### Key Azure Security Services

#### Identity & Access Management

| Service | Purpose |
|---------|---------|
| **Azure AD (Entra ID)** | Identity platform |
| **Privileged Identity Management** | JIT access for admins |
| **Conditional Access** | Risk-based access policies |

#### Detection & Response

| Service | Purpose |
|---------|---------|
| **Microsoft Defender for Cloud** | CSPM and CWPP |
| **Microsoft Sentinel** | SIEM and SOAR platform |
| **Azure Monitor** | Logging and metrics collection |

#### Network Security

| Service | Purpose |
|---------|---------|
| **Azure Firewall** | Stateful network firewall |
| **Azure Front Door + WAF** | Global CDN and web application firewall |
| **Azure DDoS Protection** | DDoS mitigation |

#### Data Protection

| Service | Purpose |
|---------|---------|
| **Azure Key Vault** | Secrets, keys, certificates management |
| **Azure Information Protection** | Data classification and DLP |
| **Storage Encryption** | At-rest encryption for storage |

#### Infrastructure Security

| Service | Purpose |
|---------|---------|
| **Just-in-Time VM Access** | Time-bound SSH/RDP access |
| **Azure Policy** | Compliance enforcement |
| **Azure Blueprints** | Repeatable compliant environments |

## Azure Landing Zone Architecture (Hub-Spoke)

```
Azure AD Tenant (Root)
│
Management Groups Hierarchy:
  Root → Platform → Landing Zones → Applications
│
├── Hub VNet (Shared Services)
│   ├── Azure Firewall
│   ├── VPN Gateway
│   ├── Azure Bastion
│   └── Shared Services (DNS, monitoring)
│
├── Spoke VNet 1 (Production Workloads)
│   └── Application VMs/Services
│
├── Spoke VNet 2 (Development Workloads)
│   └── Application VMs/Services
│
└── Shared Services Subscription
    ├── Microsoft Defender for Cloud
    ├── Azure Monitor / Log Analytics
    └── Microsoft Sentinel
```

**Key Patterns:**

- **Hub-Spoke Topology:** Centralized security and networking
- **Management Groups:** Policy hierarchy and governance
- **Azure Policy:** Enforce compliance (e.g., require encryption)
- **Landing Zones:** Pre-configured secure environments

```

### examples/architectures/azure-landing-zone.md

```markdown
# Azure Landing Zone Security Architecture

## Overview

Azure Landing Zone provides a secure, scalable foundation for enterprise Azure deployments. Organize resources using management groups with governance enforced through Azure Policy and role-based access control (RBAC).

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                    TENANT ROOT GROUP                                     │
│                 (Contoso Organization)                                   │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │ Tenant-Level Policies:                                           │   │
│  │ - Require tags (Environment, Owner, CostCenter)                  │   │
│  │ - Allowed locations (East US 2, West US 2, West Europe)         │   │
│  │ - Require encryption in transit and at rest                      │   │
│  │ - Diagnostic settings for all resources                          │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
           │
           ├────────────────┬──────────────────┬──────────────────┐
           │                │                  │                  │
           ▼                ▼                  ▼                  ▼
    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
    │  Platform   │  │  Landing    │  │ Decommission│  │   Sandbox   │
    │             │  │   Zones     │  │             │  │             │
    └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘
           │                │                │                │
           │                │                │                │
    ┌──────┴───┬───────┐   │         ┌──────┴──────┐        │
    ▼          ▼       ▼   │         ▼             ▼        ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐   ┌────────┐ ┌────────┐
│Identity│ │Manage. │ │Connect.│ │ Corp   │   │Quarant.│ │Sandbox │
│        │ │        │ │        │ │        │   │        │ │Sub     │
│Entra ID│ │Log     │ │Hub VNet│ │ Prod   │   │        │ │        │
│Priv. ID│ │Analytics│ │Firewall│ │ Dev    │   │        │ │Full    │
│        │ │Sentinel│ │VPN/ER  │ │ Test   │   │        │ │Access  │
└────────┘ └────────┘ └────────┘ └────────┘   └────────┘ └────────┘
                           │           │
                           └───────────┴──────────┐
                                                  ▼
                                        ┌──────────────────┐
                                        │  Hub-Spoke VNet  │
                                        │                  │
                                        │ - Azure Firewall │
                                        │ - VPN Gateway    │
                                        │ - Bastion        │
                                        │ - DDoS Protection│
                                        └──────────────────┘
```

## Management Group Hierarchy

### Platform Management Group

Contains shared platform services and centralized operations.

**Identity Subscription:**
- Microsoft Entra ID (Azure AD)
- Entra ID Privileged Identity Management (PIM)
- Entra ID Identity Protection
- Conditional Access policies
- Domain controllers (if hybrid)
- Azure AD Connect (if hybrid)

**Management Subscription:**
- Azure Monitor Log Analytics workspace
- Microsoft Sentinel (SIEM)
- Azure Automation accounts
- Azure Policy compliance dashboard
- Cost Management + Billing
- Azure Backup vaults

**Connectivity Subscription:**
- Hub virtual network
- Azure Firewall / NVA
- VPN Gateway / ExpressRoute
- Azure Bastion
- Network Watcher
- DDoS Protection Plan
- Private DNS zones

**Policies Applied:**

```json
{
  "properties": {
    "displayName": "Platform Security Baseline",
    "policyType": "Custom",
    "mode": "All",
    "parameters": {},
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "type",
            "equals": "Microsoft.Network/networkSecurityGroups"
          },
          {
            "count": {
              "field": "Microsoft.Network/networkSecurityGroups/securityRules[*]",
              "where": {
                "allOf": [
                  {
                    "field": "Microsoft.Network/networkSecurityGroups/securityRules[*].access",
                    "equals": "Allow"
                  },
                  {
                    "field": "Microsoft.Network/networkSecurityGroups/securityRules[*].direction",
                    "equals": "Inbound"
                  },
                  {
                    "field": "Microsoft.Network/networkSecurityGroups/securityRules[*].sourceAddressPrefix",
                    "in": ["*", "Internet", "0.0.0.0/0"]
                  }
                ]
              }
            },
            "greater": 0
          }
        ]
      },
      "then": {
        "effect": "deny"
      }
    }
  }
}
```

### Landing Zones Management Group

Contains application workloads organized by environment and compliance requirements.

**Corporate Landing Zone:**
- Production workloads
- On-premises connectivity required
- Spoke VNet peered to hub
- Managed identities enforced
- Private endpoints mandatory

**Online Landing Zone:**
- Internet-facing applications
- Public endpoints allowed
- Web Application Firewall required
- DDoS protection enabled
- Enhanced monitoring

**Policies Applied:**

```json
{
  "properties": {
    "displayName": "Require Private Endpoints for Storage",
    "policyType": "Custom",
    "mode": "Indexed",
    "parameters": {},
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "type",
            "equals": "Microsoft.Storage/storageAccounts"
          },
          {
            "field": "Microsoft.Storage/storageAccounts/networkAcls.defaultAction",
            "notEquals": "Deny"
          }
        ]
      },
      "then": {
        "effect": "deny"
      }
    }
  }
}
```

### Decommissioned Management Group

Contains subscriptions being sunset or quarantined.

**Quarantine Subscription:**
- Compromised resource isolation
- All network access blocked
- Forensics tooling only
- Read-only access for security team

**Policies Applied:**

```json
{
  "properties": {
    "displayName": "Deny All Resource Creation",
    "policyType": "Custom",
    "mode": "All",
    "parameters": {},
    "policyRule": {
      "if": {
        "field": "type",
        "notIn": [
          "Microsoft.Security/assessments",
          "Microsoft.Security/complianceResults"
        ]
      },
      "then": {
        "effect": "deny"
      }
    }
  }
}
```

### Sandbox Management Group

Provides innovation space with relaxed policies.

**Sandbox Subscriptions:**
- Full service access
- Cost limits enforced via budgets
- Automatic resource cleanup (30 days)
- No production data allowed
- No connectivity to corporate network

**Policies Applied:**

```json
{
  "properties": {
    "displayName": "Enforce Budget Limits",
    "policyType": "Custom",
    "mode": "All",
    "parameters": {
      "budgetAmount": {
        "type": "Integer",
        "metadata": {
          "displayName": "Monthly Budget",
          "description": "Maximum monthly spend in USD"
        },
        "defaultValue": 1000
      }
    },
    "policyRule": {
      "if": {
        "field": "type",
        "equals": "Microsoft.Resources/subscriptions"
      },
      "then": {
        "effect": "deployIfNotExists",
        "details": {
          "type": "Microsoft.Consumption/budgets",
          "roleDefinitionIds": [
            "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
          ],
          "deployment": {
            "properties": {
              "mode": "incremental",
              "template": {
                "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
                "contentVersion": "1.0.0.0",
                "resources": [
                  {
                    "type": "Microsoft.Consumption/budgets",
                    "apiVersion": "2021-10-01",
                    "name": "SandboxBudget",
                    "properties": {
                      "category": "Cost",
                      "amount": "[parameters('budgetAmount')]",
                      "timeGrain": "Monthly",
                      "timePeriod": {
                        "startDate": "[concat(utcNow('yyyy-MM'), '-01')]"
                      },
                      "notifications": {
                        "Actual_80_Percent": {
                          "enabled": true,
                          "operator": "GreaterThan",
                          "threshold": 80,
                          "contactEmails": ["[email protected]"]
                        }
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
```

## Azure Policy Initiatives

### Security Baseline Initiative

Comprehensive security controls applied at tenant root:

```json
{
  "properties": {
    "displayName": "Contoso Security Baseline",
    "policyType": "Custom",
    "description": "Enforce organization-wide security controls",
    "metadata": {
      "category": "Security"
    },
    "parameters": {},
    "policyDefinitions": [
      {
        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/404c3081-a854-4457-ae30-26a93ef643f9",
        "parameters": {}
      },
      {
        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/a1181c5f-672a-477a-979a-7d58aa086233",
        "parameters": {}
      },
      {
        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/013e242c-8828-4970-87b3-ab247555486d",
        "parameters": {}
      },
      {
        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/7d7be79c-23ba-4033-84dd-45e2a5ccdd67",
        "parameters": {}
      },
      {
        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/0961003e-5a0a-4549-abde-af6a37f2724d",
        "parameters": {}
      }
    ]
  }
}
```

**Key Policies Included:**

1. **Secure transfer to storage accounts should be enabled** (404c3081)
2. **Audit VMs without managed disks** (a1181c5f)
3. **Deploy Diagnostic Settings for Network Security Groups** (013e242c)
4. **Function apps should only be accessible over HTTPS** (7d7be79c)
5. **Require a tag and its value on resources** (0961003e)

### Compliance Initiative (CIS Benchmark)

```json
{
  "properties": {
    "displayName": "CIS Microsoft Azure Foundations Benchmark v1.4.0",
    "policyType": "BuiltIn",
    "policyDefinitionId": "/providers/Microsoft.Authorization/policySetDefinitions/c3f5c4d9-9a1d-4a99-85c0-7f93e384d5c5",
    "parameters": {
      "effect": {
        "value": "Audit"
      }
    }
  }
}
```

### Custom Network Security Initiative

```json
{
  "properties": {
    "displayName": "Network Security Controls",
    "policyType": "Custom",
    "policyDefinitions": [
      {
        "policyDefinitionId": "/subscriptions/xxx/providers/Microsoft.Authorization/policyDefinitions/deny-public-ip",
        "parameters": {
          "effect": {
            "value": "Deny"
          }
        }
      },
      {
        "policyDefinitionId": "/subscriptions/xxx/providers/Microsoft.Authorization/policyDefinitions/require-nsg-on-subnet",
        "parameters": {
          "effect": {
            "value": "Audit"
          }
        }
      },
      {
        "policyDefinitionId": "/subscriptions/xxx/providers/Microsoft.Authorization/policyDefinitions/allowed-nsg-rules-only",
        "parameters": {
          "allowedPorts": {
            "value": [443, 22]
          }
        }
      }
    ]
  }
}
```

## Hub-Spoke Network Security

### Hub VNet Design

```
┌─────────────────────────────────────────────────────────────┐
│              Hub VNet (10.0.0.0/16)                         │
│                                                              │
│  ┌──────────────────┐  ┌──────────────────┐                │
│  │ GatewaySubnet    │  │ AzureFirewall    │                │
│  │ 10.0.0.0/24      │  │ Subnet           │                │
│  │                  │  │ 10.0.1.0/26      │                │
│  │ - VPN Gateway    │  │                  │                │
│  │ - ExpressRoute   │  │ - Firewall       │                │
│  └──────────────────┘  │ - Public IP      │                │
│                        └──────────────────┘                │
│                                                              │
│  ┌──────────────────┐  ┌──────────────────┐                │
│  │ AzureBastion     │  │ Management       │                │
│  │ Subnet           │  │ Subnet           │                │
│  │ 10.0.2.0/27      │  │ 10.0.3.0/24      │                │
│  │                  │  │                  │                │
│  │ - Bastion Host   │  │ - Jump Boxes     │                │
│  └──────────────────┘  │ - Monitoring VMs │                │
│                        └──────────────────┘                │
└─────────────────────────────────────────────────────────────┘
                          │
        ┌─────────────────┴─────────────────┐
        │                                   │
        ▼                                   ▼
┌──────────────────┐              ┌──────────────────┐
│ Spoke VNet 1     │              │ Spoke VNet 2     │
│ (Production)     │              │ (Development)    │
│ 10.1.0.0/16      │              │ 10.2.0.0/16      │
│                  │              │                  │
│ - App Subnet     │              │ - App Subnet     │
│ - Data Subnet    │              │ - Data Subnet    │
│ - Private Endpts │              │ - Private Endpts │
└──────────────────┘              └──────────────────┘
```

### Azure Firewall Configuration

**Network Rules:**

```json
{
  "properties": {
    "ruleCollections": [
      {
        "name": "AllowOutboundHTTPS",
        "priority": 100,
        "action": {
          "type": "Allow"
        },
        "rules": [
          {
            "name": "AllowHTTPS",
            "protocols": ["TCP"],
            "sourceAddresses": ["10.1.0.0/16", "10.2.0.0/16"],
            "destinationAddresses": ["*"],
            "destinationPorts": ["443"]
          }
        ]
      },
      {
        "name": "AllowDNS",
        "priority": 110,
        "action": {
          "type": "Allow"
        },
        "rules": [
          {
            "name": "AllowDNSQueries",
            "protocols": ["UDP"],
            "sourceAddresses": ["10.1.0.0/16", "10.2.0.0/16"],
            "destinationAddresses": ["168.63.129.16"],
            "destinationPorts": ["53"]
          }
        ]
      }
    ]
  }
}
```

**Application Rules:**

```json
{
  "properties": {
    "ruleCollections": [
      {
        "name": "AllowAzureServices",
        "priority": 200,
        "action": {
          "type": "Allow"
        },
        "rules": [
          {
            "name": "AllowAzureMonitor",
            "protocols": [
              {
                "protocolType": "Https",
                "port": 443
              }
            ],
            "targetFqdns": [
              "*.ods.opinsights.azure.com",
              "*.oms.opinsights.azure.com",
              "*.monitoring.azure.com"
            ],
            "sourceAddresses": ["10.1.0.0/16", "10.2.0.0/16"]
          },
          {
            "name": "AllowWindowsUpdate",
            "protocols": [
              {
                "protocolType": "Http",
                "port": 80
              },
              {
                "protocolType": "Https",
                "port": 443
              }
            ],
            "targetFqdns": [
              "*.windowsupdate.microsoft.com",
              "*.update.microsoft.com"
            ],
            "sourceAddresses": ["10.1.0.0/16"]
          }
        ]
      }
    ]
  }
}
```

**Threat Intelligence:**

```json
{
  "properties": {
    "threatIntelMode": "Alert",
    "threatIntelWhitelist": {
      "fqdns": ["trusted-partner.com"],
      "ipAddresses": ["20.30.40.50"]
    }
  }
}
```

### VNet Peering Security

```json
{
  "properties": {
    "allowVirtualNetworkAccess": true,
    "allowForwardedTraffic": true,
    "allowGatewayTransit": true,
    "useRemoteGateways": false,
    "remoteVirtualNetwork": {
      "id": "/subscriptions/xxx/resourceGroups/hub-network-rg/providers/Microsoft.Network/virtualNetworks/hub-vnet"
    }
  }
}
```

## Microsoft Defender for Cloud Integration

### Enable All Defender Plans

```powershell
# Enable Defender for Cloud Standard tier
Set-AzSecurityPricing -Name "VirtualMachines" -PricingTier "Standard"
Set-AzSecurityPricing -Name "SqlServers" -PricingTier "Standard"
Set-AzSecurityPricing -Name "AppServices" -PricingTier "Standard"
Set-AzSecurityPricing -Name "StorageAccounts" -PricingTier "Standard"
Set-AzSecurityPricing -Name "SqlServerVirtualMachines" -PricingTier "Standard"
Set-AzSecurityPricing -Name "KubernetesService" -PricingTier "Standard"
Set-AzSecurityPricing -Name "ContainerRegistry" -PricingTier "Standard"
Set-AzSecurityPricing -Name "KeyVaults" -PricingTier "Standard"
Set-AzSecurityPricing -Name "Dns" -PricingTier "Standard"
Set-AzSecurityPricing -Name "Arm" -PricingTier "Standard"
Set-AzSecurityPricing -Name "OpenSourceRelationalDatabases" -PricingTier "Standard"
Set-AzSecurityPricing -Name "Containers" -PricingTier "Standard"

# Configure auto-provisioning
Set-AzSecurityAutoProvisioningSetting -Name "default" -EnableAutoProvision
```

### Defender for Servers Configuration

```json
{
  "properties": {
    "pricingTier": "Standard",
    "subPlan": "P2",
    "extensions": [
      {
        "name": "MDE",
        "isEnabled": "True"
      },
      {
        "name": "AgentlessVmScanning",
        "isEnabled": "True"
      },
      {
        "name": "FileSensitivity",
        "isEnabled": "True"
      }
    ]
  }
}
```

### Security Alerts Automation

```json
{
  "type": "Microsoft.Security/automations",
  "apiVersion": "2019-01-01-preview",
  "name": "HighSeverityAlertAutomation",
  "location": "eastus2",
  "properties": {
    "description": "Trigger incident response for high severity alerts",
    "isEnabled": true,
    "scopes": [
      {
        "description": "All subscriptions",
        "scopePath": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
      }
    ],
    "sources": [
      {
        "eventSource": "Alerts",
        "ruleSets": [
          {
            "rules": [
              {
                "propertyJPath": "properties.metadata.severity",
                "propertyType": "String",
                "expectedValue": "High",
                "operator": "Equals"
              }
            ]
          }
        ]
      }
    ],
    "actions": [
      {
        "actionType": "LogicApp",
        "logicAppResourceId": "/subscriptions/xxx/resourceGroups/security-automation/providers/Microsoft.Logic/workflows/IncidentResponseWorkflow",
        "uri": "https://prod-xx.eastus2.logic.azure.com:443/workflows/xxx/triggers/manual/paths/invoke"
      }
    ]
  }
}
```

## Microsoft Entra ID Configuration

### Conditional Access Policies

**Require MFA for All Users:**

```json
{
  "displayName": "Require MFA for all users",
  "state": "enabled",
  "conditions": {
    "users": {
      "includeUsers": ["All"],
      "excludeGroups": ["BreakGlassAccounts"]
    },
    "applications": {
      "includeApplications": ["All"]
    },
    "locations": {
      "includeLocations": ["All"]
    }
  },
  "grantControls": {
    "operator": "OR",
    "builtInControls": ["mfa"]
  }
}
```

**Block Legacy Authentication:**

```json
{
  "displayName": "Block legacy authentication",
  "state": "enabled",
  "conditions": {
    "users": {
      "includeUsers": ["All"],
      "excludeGroups": ["LegacyAppExceptions"]
    },
    "applications": {
      "includeApplications": ["All"]
    },
    "clientAppTypes": [
      "exchangeActiveSync",
      "other"
    ]
  },
  "grantControls": {
    "operator": "OR",
    "builtInControls": ["block"]
  }
}
```

**Require Compliant Device for Admins:**

```json
{
  "displayName": "Require compliant device for admins",
  "state": "enabled",
  "conditions": {
    "users": {
      "includeRoles": [
        "62e90394-69f5-4237-9190-012177145e10",
        "194ae4cb-b126-40b2-bd5b-6091b380977d"
      ]
    },
    "applications": {
      "includeApplications": ["All"]
    }
  },
  "grantControls": {
    "operator": "OR",
    "builtInControls": ["compliantDevice", "domainJoinedDevice"]
  }
}
```

### Privileged Identity Management (PIM)

**Role Assignment:**

```json
{
  "properties": {
    "roleDefinitionId": "/subscriptions/xxx/providers/Microsoft.Authorization/roleDefinitions/8e3af657-a8ff-443c-a75c-2fe8c4bcb635",
    "principalId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "requestType": "AdminAssign",
    "scheduleInfo": {
      "startDateTime": "2024-01-01T00:00:00Z",
      "expiration": {
        "type": "AfterDuration",
        "duration": "PT8H"
      }
    },
    "condition": "@Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEquals 'production-data'",
    "conditionVersion": "2.0"
  }
}
```

**PIM Settings:**

```json
{
  "properties": {
    "userMemberSettings": {
      "permanentEligibleSettings": {
        "approvalRequired": false
      },
      "expiringEligibleSettings": {
        "maximumGrantPeriod": "P365D"
      },
      "permanentActiveSettings": {
        "approvalRequired": true,
        "approvers": [
          {
            "id": "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.ManagedIdentity/userAssignedIdentities/pim-approver"
          }
        ]
      },
      "activationSettings": {
        "maximumGrantPeriod": "PT8H",
        "approvalRequired": true,
        "requireMFA": true,
        "requireJustification": true,
        "requireTicketInfo": true
      }
    }
  }
}
```

## Centralized Logging with Microsoft Sentinel

### Log Analytics Workspace Design

```powershell
# Create Log Analytics workspace
New-AzOperationalInsightsWorkspace `
  -ResourceGroupName "security-logging-rg" `
  -Name "contoso-sentinel-workspace" `
  -Location "eastus2" `
  -Sku "PerGB2018" `
  -RetentionInDays 90

# Enable Sentinel
Set-AzSentinelOnboardingState `
  -ResourceGroupName "security-logging-rg" `
  -WorkspaceName "contoso-sentinel-workspace" `
  -CustomerManagedKey $false
```

### Data Connectors

```json
{
  "kind": "AzureActiveDirectory",
  "properties": {
    "tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "dataTypes": {
      "alerts": {
        "state": "enabled"
      }
    }
  }
}
```

```json
{
  "kind": "AzureSecurityCenter",
  "properties": {
    "subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "dataTypes": {
      "alerts": {
        "state": "enabled"
      }
    }
  }
}
```

### Analytics Rules

**Suspicious Sign-In Activity:**

```kql
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType != "0"
| summarize
    FailureCount = count(),
    DistinctIPCount = dcount(IPAddress),
    FirstFailure = min(TimeGenerated),
    LastFailure = max(TimeGenerated)
    by UserPrincipalName, AppDisplayName
| where FailureCount > 10 or DistinctIPCount > 5
| extend
    Severity = iff(FailureCount > 50, "High", "Medium"),
    Description = strcat("User ", UserPrincipalName, " had ", FailureCount, " failed sign-ins")
```

**Anomalous Resource Creation:**

```kql
AzureActivity
| where TimeGenerated > ago(1h)
| where OperationNameValue endswith "write"
| where ActivityStatusValue == "Success"
| summarize
    ResourceCount = count(),
    ResourceTypes = make_set(ResourceType)
    by Caller, CallerIpAddress
| where ResourceCount > 20
| extend
    Severity = "Medium",
    Description = strcat(Caller, " created ", ResourceCount, " resources from ", CallerIpAddress)
```

### Playbooks (Logic Apps)

**Isolation Playbook:**

```json
{
  "definition": {
    "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
    "actions": {
      "Parse_Alert": {
        "type": "ParseJson",
        "inputs": {
          "content": "@triggerBody()?['Entities']",
          "schema": {
            "type": "object",
            "properties": {
              "ResourceId": {"type": "string"}
            }
          }
        }
      },
      "Get_VM_Details": {
        "type": "ApiConnection",
        "inputs": {
          "host": {
            "connection": {
              "name": "@parameters('$connections')['azurevm']['connectionId']"
            }
          },
          "method": "get",
          "path": "/subscriptions/@{encodeURIComponent(variables('subscriptionId'))}/resourceGroups/@{encodeURIComponent(variables('resourceGroup'))}/providers/Microsoft.Compute/virtualMachines/@{encodeURIComponent(variables('vmName'))}"
        },
        "runAfter": {
          "Parse_Alert": ["Succeeded"]
        }
      },
      "Apply_Quarantine_NSG": {
        "type": "ApiConnection",
        "inputs": {
          "host": {
            "connection": {
              "name": "@parameters('$connections')['azurenetworksecuritygroups']['connectionId']"
            }
          },
          "method": "put",
          "path": "/subscriptions/@{encodeURIComponent(variables('subscriptionId'))}/resourceGroups/@{encodeURIComponent(variables('resourceGroup'))}/providers/Microsoft.Network/networkInterfaces/@{encodeURIComponent(variables('nicName'))}",
          "body": {
            "properties": {
              "networkSecurityGroup": {
                "id": "/subscriptions/xxx/resourceGroups/security-rg/providers/Microsoft.Network/networkSecurityGroups/quarantine-nsg"
              }
            }
          }
        },
        "runAfter": {
          "Get_VM_Details": ["Succeeded"]
        }
      },
      "Create_Incident": {
        "type": "ApiConnection",
        "inputs": {
          "host": {
            "connection": {
              "name": "@parameters('$connections')['azuresentinel']['connectionId']"
            }
          },
          "method": "put",
          "path": "/Incidents",
          "body": {
            "properties": {
              "title": "VM Isolated - @{variables('vmName')}",
              "severity": "High",
              "status": "New"
            }
          }
        },
        "runAfter": {
          "Apply_Quarantine_NSG": ["Succeeded"]
        }
      }
    }
  }
}
```

## Diagnostic Settings

### Subscription-Level Diagnostics

```json
{
  "properties": {
    "workspaceId": "/subscriptions/xxx/resourceGroups/security-logging-rg/providers/Microsoft.OperationalInsights/workspaces/contoso-sentinel-workspace",
    "logs": [
      {
        "category": "Administrative",
        "enabled": true
      },
      {
        "category": "Security",
        "enabled": true
      },
      {
        "category": "Alert",
        "enabled": true
      },
      {
        "category": "Policy",
        "enabled": true
      }
    ]
  }
}
```

### Resource-Level Diagnostics (Azure Policy)

```json
{
  "properties": {
    "displayName": "Deploy Diagnostic Settings for Storage Accounts",
    "policyType": "Custom",
    "mode": "Indexed",
    "parameters": {
      "workspaceId": {
        "type": "String",
        "metadata": {
          "displayName": "Log Analytics workspace"
        }
      }
    },
    "policyRule": {
      "if": {
        "field": "type",
        "equals": "Microsoft.Storage/storageAccounts"
      },
      "then": {
        "effect": "deployIfNotExists",
        "details": {
          "type": "Microsoft.Insights/diagnosticSettings",
          "existenceCondition": {
            "allOf": [
              {
                "field": "Microsoft.Insights/diagnosticSettings/logs[*].enabled",
                "equals": "true"
              }
            ]
          },
          "roleDefinitionIds": [
            "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
          ],
          "deployment": {
            "properties": {
              "mode": "incremental",
              "template": {
                "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
                "contentVersion": "1.0.0.0",
                "parameters": {
                  "resourceName": {
                    "type": "string"
                  },
                  "workspaceId": {
                    "type": "string"
                  }
                },
                "resources": [
                  {
                    "type": "Microsoft.Storage/storageAccounts/providers/diagnosticSettings",
                    "apiVersion": "2021-05-01-preview",
                    "name": "[concat(parameters('resourceName'), '/Microsoft.Insights/default')]",
                    "properties": {
                      "workspaceId": "[parameters('workspaceId')]",
                      "metrics": [
                        {
                          "category": "Transaction",
                          "enabled": true
                        }
                      ]
                    }
                  }
                ]
              },
              "parameters": {
                "resourceName": {
                  "value": "[field('name')]"
                },
                "workspaceId": {
                  "value": "[parameters('workspaceId')]"
                }
              }
            }
          }
        }
      }
    }
  }
}
```

## Key Security Metrics

Monitor these metrics across the Azure environment:

1. **Identity Metrics:**
   - Failed sign-in attempts per user
   - MFA coverage percentage
   - Privileged role activations
   - Conditional Access policy effectiveness

2. **Compliance Metrics:**
   - Azure Policy compliance rate
   - Defender for Cloud secure score
   - Non-compliant resources count
   - Security recommendations by severity

3. **Network Metrics:**
   - Azure Firewall threat intel hits
   - DDoS attack attempts
   - NSG rule violations
   - Private endpoint coverage

4. **Detection Metrics:**
   - Sentinel alert volume by severity
   - Mean time to detect (MTTD)
   - Mean time to respond (MTTR)
   - False positive rate

## Implementation Checklist

- [ ] Design management group hierarchy
- [ ] Create platform subscriptions (Identity, Management, Connectivity)
- [ ] Deploy hub virtual network
- [ ] Configure Azure Firewall
- [ ] Create Azure Policy initiatives
- [ ] Assign policies at appropriate scopes
- [ ] Configure Microsoft Entra ID (Azure AD)
- [ ] Deploy Conditional Access policies
- [ ] Enable Privileged Identity Management
- [ ] Deploy Log Analytics workspace
- [ ] Enable Microsoft Sentinel
- [ ] Configure diagnostic settings (subscription and resource)
- [ ] Enable Defender for Cloud on all subscriptions
- [ ] Create landing zone subscriptions
- [ ] Deploy spoke virtual networks
- [ ] Configure VNet peering
- [ ] Create security automation playbooks
- [ ] Establish security monitoring dashboard
- [ ] Configure alert notifications
- [ ] Document deployment procedures
- [ ] Train operations teams

## References

- [Azure Landing Zone Architecture](https://learn.microsoft.com/azure/cloud-adoption-framework/ready/landing-zone/)
- [Azure Security Benchmark](https://learn.microsoft.com/security/benchmark/azure/)
- [Microsoft Entra ID Best Practices](https://learn.microsoft.com/entra/identity/fundamentals/security-operations-introduction)
- [Microsoft Sentinel Best Practices](https://learn.microsoft.com/azure/sentinel/best-practices)
- [CIS Microsoft Azure Foundations Benchmark](https://www.cisecurity.org/benchmark/azure)

```

### references/iam-patterns.md

```markdown
# Identity & Access Management Patterns Reference

## Authentication Controls

### Multi-Factor Authentication (MFA)

**Types:**
- TOTP (Time-based One-Time Password): Google Authenticator, Authy
- Push Notifications: Duo, Okta Verify
- Biometrics: Fingerprint, Face ID
- Hardware Tokens: YubiKey, FIDO2

**Implementation:**
- Enforce MFA for all users (prioritize privileged accounts)
- Support multiple MFA methods (user choice)
- Backup codes for account recovery
- Risk-based MFA (adaptive authentication)

### Single Sign-On (SSO)

**Protocols:**
- SAML 2.0: Enterprise federation
- OAuth 2.0: API authorization
- OpenID Connect (OIDC): Authentication layer on OAuth 2.0

**Benefits:**
- Centralized authentication
- Reduced password fatigue
- Improved security posture
- Better user experience

## Authorization Controls

### Role-Based Access Control (RBAC)

**Structure:**
- Users → Roles → Permissions
- Roles represent job functions (admin, developer, analyst)
- Coarse-grained, simple to implement

**Best For:** Organizations with stable role structures

### Attribute-Based Access Control (ABAC)

**Structure:**
- Fine-grained access based on attributes
- User attributes (department, clearance level)
- Resource attributes (classification, owner)
- Environmental attributes (time, location)

**Best For:** Complex, dynamic access requirements

### Policy-Based Access Control (PBAC)

**Centralized Policy Engines:**
- Open Policy Agent (OPA)
- AWS Cedar
- Authzed (SpiceDB)

**Best For:** Microservices, API gateways, cloud-native architectures

## Privileged Access Management (PAM)

### Just-in-Time (JIT) Access

**Principle:** Temporary elevated privileges for specific tasks

**Implementation:**
- Request-based access (approval workflow)
- Time-bound grants (4-8 hours)
- Automated de-provisioning
- Audit all privilege activations

### Credential Vaulting

**Solutions:**
- CyberArk
- HashiCorp Vault
- AWS Secrets Manager
- Azure Key Vault

**Features:**
- Centralized credential storage
- Automatic password rotation
- Session recording
- Break-glass procedures

```

### references/security-operations.md

```markdown
# Security Operations Reference

## SIEM (Security Information & Event Management)

**Purpose:** Centralize log aggregation, correlation, and alerting

**Leading Platforms:**
- Splunk Enterprise Security
- Elastic Security
- Microsoft Sentinel
- Google Chronicle

**Architecture:**
1. Log Collection: Ingest from all sources
2. Normalization: Standardize log formats
3. Correlation: Apply rules to detect patterns
4. Alerting: Notify SOC team
5. Investigation: Search and visualization

## SOAR (Security Orchestration, Automation & Response)

**Purpose:** Automate incident response workflows

**Capabilities:**
- Playbooks: Automated response workflows
- Orchestration: Integrate security tools
- Case Management: Track incidents

**Leading Platforms:**
- Splunk SOAR
- Palo Alto Cortex XSOAR
- IBM Resilient

## Detection Strategies

### UEBA (User & Entity Behavior Analytics)

**Purpose:** ML-based anomaly detection

**Use Cases:**
- Account compromise detection
- Insider threat detection
- Data exfiltration detection
- Lateral movement detection

### Threat Intelligence

**Sources:**
- MISP (Malware Information Sharing Platform)
- ThreatConnect
- ISACs (Information Sharing and Analysis Centers)

**Integration:**
- Enrich SIEM alerts with threat context
- Block known malicious IPs/domains
- Proactive threat hunting

```

### examples/threat-models/web-app-stride.md

```markdown
# Web Application STRIDE Threat Model Example

## System Overview

**Application:** E-commerce Web Application
**Architecture:** Three-tier (Web → Application → Database)
**Technology Stack:**
- Frontend: React SPA
- API: Node.js/Express REST API
- Database: PostgreSQL
- Authentication: OAuth 2.0 + JWT
- Hosting: AWS (CloudFront, ALB, EC2, RDS)

## Data Flow Diagram

```
┌─────────┐                                          ┌──────────┐
│  User   │─────(1) HTTPS Request───────────────────►│CloudFront│
│(Browser)│◄────(10) HTTPS Response──────────────────│   CDN    │
└─────────┘                                          └────┬─────┘
                                                          │
                                                     (2) Forward
                                                          │
                                                     ┌────▼─────┐
                                                     │   WAF    │
                                                     │ (AWS WAF)│
                                                     └────┬─────┘
                                                          │
                                                     (3) Inspect
                                                          │
                                                     ┌────▼─────┐
                                                     │   ALB    │
                                                     │(Load Bal)│
                                                     └────┬─────┘
                                                          │
                                         ┌────────────────┼────────────────┐
                                         │                │                │
                                    ┌────▼────┐     ┌────▼────┐     ┌────▼────┐
                                    │Web App 1│     │Web App 2│     │Web App 3│
                                    │(Express)│     │(Express)│     │(Express)│
                                    └────┬────┘     └────┬────┘     └────┬────┘
                                         │                │                │
                                         └────────────────┼────────────────┘
                                                          │
                                                     (4) SQL Query
                                                          │
                                                     ┌────▼─────┐
                                                     │PostgreSQL│
                                                     │   RDS    │
                                                     └──────────┘

Trust Boundaries:
───────────────── Internet / AWS (1-2)
───────────────── WAF / Application (3)
───────────────── Application / Database (4)
```

## STRIDE Threat Analysis

### Component 1: User (Browser)

#### Threat: Spoofing

**S1.1: Phishing Attack**
- **Description:** Attacker creates fake login page to steal credentials
- **Impact:** HIGH - User credentials compromised
- **Likelihood:** MEDIUM - Common attack vector
- **Mitigation:**
  - Implement HTTPS with EV certificate (visible in browser)
  - User education and security awareness training
  - FIDO2/WebAuthn for phishing-resistant authentication
  - Email warnings for suspicious login attempts

---

### Component 2: CloudFront (CDN)

#### Threat: Tampering

**T2.1: Man-in-the-Middle (MITM) Attack**
- **Description:** Attacker intercepts traffic between user and CDN
- **Impact:** HIGH - Data theft, session hijacking
- **Likelihood:** LOW - HTTPS prevents MITM
- **Mitigation:**
  - Enforce TLS 1.3 minimum
  - HSTS (HTTP Strict Transport Security) headers
  - Certificate pinning for mobile apps

#### Threat: Denial of Service

**D2.1: DDoS Attack on CDN**
- **Description:** Volumetric DDoS attack overwhelms CDN
- **Impact:** HIGH - Service unavailability
- **Likelihood:** MEDIUM - E-commerce targets for extortion
- **Mitigation:**
  - CloudFront DDoS protection (built-in)
  - AWS Shield Standard (free) or Advanced
  - Rate limiting at CDN edge

---

### Component 3: WAF (AWS WAF)

#### Threat: Elevation of Privilege

**E3.1: WAF Bypass**
- **Description:** Attacker bypasses WAF rules to reach application
- **Impact:** HIGH - Exposes application to attacks
- **Likelihood:** MEDIUM - WAF rules can be bypassed
- **Mitigation:**
  - Regularly update WAF rules (OWASP Core Rule Set)
  - Custom rules for application-specific attacks
  - Rate limiting and geographic restrictions
  - Monitor WAF logs for bypass attempts

---

### Component 4: Application Load Balancer (ALB)

#### Threat: Denial of Service

**D4.1: HTTP Flood Attack**
- **Description:** Application-layer DDoS targeting ALB
- **Impact:** HIGH - Service degradation
- **Likelihood:** MEDIUM
- **Mitigation:**
  - ALB connection limits and timeouts
  - Auto-scaling based on load
  - Rate limiting at application layer
  - WAF rate-based rules

---

### Component 5: Web Application (Express API)

#### Threat: Spoofing

**S5.1: JWT Token Forgery**
- **Description:** Attacker forges JWT token to impersonate user
- **Impact:** CRITICAL - Complete account takeover
- **Likelihood:** LOW - Requires key compromise
- **Mitigation:**
  - Strong JWT signing algorithm (RS256, not HS256 with weak secret)
  - Short token expiry (15 minutes)
  - Refresh token rotation
  - Store signing keys in AWS Secrets Manager
  - Validate token signature, expiry, issuer, audience

#### Threat: Tampering

**T5.1: SQL Injection**
- **Description:** Attacker injects SQL code via user input
- **Impact:** CRITICAL - Database compromise, data breach
- **Likelihood:** MEDIUM - Common attack, but mitigable
- **Mitigation:**
  - Use parameterized queries or ORM (Sequelize, TypeORM)
  - Input validation (whitelist, length limits)
  - Least privilege database user (no DROP, CREATE permissions)
  - WAF SQL injection rules
  - Regular SAST/DAST scanning

**T5.2: API Parameter Tampering**
- **Description:** User modifies request parameters to access unauthorized data
- **Impact:** HIGH - Unauthorized data access (IDOR)
- **Likelihood:** MEDIUM
- **Mitigation:**
  - Server-side authorization checks (verify user owns resource)
  - Indirect object references (use UUIDs, not sequential IDs)
  - Input validation on all parameters

#### Threat: Repudiation

**R5.1: User Denies Transaction**
- **Description:** User denies making purchase or action
- **Impact:** MEDIUM - Fraud, chargebacks
- **Likelihood:** MEDIUM
- **Mitigation:**
  - Comprehensive audit logging (user, IP, timestamp, action)
  - Immutable logs (centralized SIEM)
  - Email confirmation for critical actions (purchases, password changes)
  - Transaction receipts and order history

#### Threat: Information Disclosure

**I5.1: Verbose Error Messages**
- **Description:** Error messages reveal internal paths, stack traces
- **Impact:** MEDIUM - Aids attacker reconnaissance
- **Likelihood:** HIGH - Common misconfiguration
- **Mitigation:**
  - Generic error messages to users ("An error occurred")
  - Detailed errors only in server logs
  - Custom error pages (500, 404)
  - Remove stack traces in production

**I5.2: API Responses Leak Sensitive Data**
- **Description:** API returns more data than necessary (full user objects)
- **Impact:** MEDIUM - Exposure of PII, internal data
- **Likelihood:** HIGH
- **Mitigation:**
  - Return only necessary fields (use DTOs)
  - Serialize responses (remove sensitive fields)
  - API response schema validation

**I5.3: Session Token Exposure in Logs**
- **Description:** JWT tokens logged in application logs
- **Impact:** HIGH - Session hijacking if logs compromised
- **Likelihood:** MEDIUM
- **Mitigation:**
  - Redact tokens in logs (mask Authorization headers)
  - Secure log storage (encryption, access control)
  - Short token expiry reduces exposure window

#### Threat: Denial of Service

**D5.1: API Rate Limit Bypass**
- **Description:** Attacker bypasses rate limiting to exhaust resources
- **Impact:** MEDIUM - Service degradation
- **Likelihood:** MEDIUM
- **Mitigation:**
  - Multiple rate limiting layers (IP, user, API key)
  - Distributed rate limiting (Redis)
  - CAPTCHA for suspicious traffic
  - Auto-scaling to handle bursts

**D5.2: Resource Exhaustion (ReDoS)**
- **Description:** Regex Denial of Service via malicious input
- **Impact:** MEDIUM - CPU exhaustion, service unavailability
- **Likelihood:** LOW
- **Mitigation:**
  - Avoid complex regex patterns
  - Timeout limits for regex execution
  - Input length limits
  - Use safe regex libraries (re2)

#### Threat: Elevation of Privilege

**E5.1: Broken Access Control**
- **Description:** User accesses admin endpoints without authorization
- **Impact:** CRITICAL - Complete system compromise
- **Likelihood:** MEDIUM
- **Mitigation:**
  - Authorization checks on every endpoint
  - RBAC (role-based access control)
  - Separate admin API with additional authentication
  - Principle of least privilege

**E5.2: Insecure Direct Object References (IDOR)**
- **Description:** User accesses other users' data by changing ID parameter
- **Impact:** HIGH - Unauthorized data access
- **Likelihood:** HIGH - Very common vulnerability
- **Mitigation:**
  - Verify user owns resource before returning data
  - Use UUIDs instead of sequential IDs
  - Indirect object references (session-based mapping)

---

### Component 6: PostgreSQL Database (RDS)

#### Threat: Tampering

**T6.1: Database Compromise via SQL Injection**
- **Description:** SQL injection leads to data modification/deletion
- **Impact:** CRITICAL - Data integrity loss, data destruction
- **Likelihood:** LOW (if application mitigations applied)
- **Mitigation:**
  - Parameterized queries (primary defense)
  - Database user least privilege (read-only for most operations)
  - Database audit logging
  - Immutable backups

#### Threat: Information Disclosure

**I6.1: Database Backup Exposure**
- **Description:** RDS snapshot publicly accessible or stolen
- **Impact:** CRITICAL - Full database dump exposed
- **Likelihood:** LOW - Requires misconfiguration
- **Mitigation:**
  - Private RDS in VPC (no internet access)
  - Encrypted snapshots (AWS KMS)
  - Access control on snapshots (IAM policies)
  - Regular snapshot access audits

**I6.2: Database Connection String in Code**
- **Description:** Database credentials hardcoded in source code
- **Impact:** CRITICAL - Database full access if code leaked
- **Likelihood:** MEDIUM - Common developer mistake
- **Mitigation:**
  - Store credentials in AWS Secrets Manager
  - IAM database authentication (no passwords)
  - Secrets rotation
  - Code scanning for hardcoded secrets (git-secrets, TruffleHog)

#### Threat: Denial of Service

**D6.1: Database Connection Pool Exhaustion**
- **Description:** Attacker opens many connections, exhausting pool
- **Impact:** HIGH - Database unavailable
- **Likelihood:** MEDIUM
- **Mitigation:**
  - Connection pool limits
  - Connection timeouts
  - Application-level connection management
  - Auto-scaling RDS read replicas

---

## Threat Summary Matrix

| ID | Component | STRIDE | Threat | Risk | Priority |
|----|-----------|--------|--------|------|----------|
| S1.1 | User | Spoofing | Phishing | HIGH | P1 |
| T2.1 | CDN | Tampering | MITM | HIGH | P2 |
| D2.1 | CDN | DoS | DDoS | HIGH | P1 |
| E3.1 | WAF | Elevation | WAF Bypass | HIGH | P2 |
| D4.1 | ALB | DoS | HTTP Flood | HIGH | P2 |
| S5.1 | API | Spoofing | JWT Forgery | CRITICAL | P0 |
| T5.1 | API | Tampering | SQL Injection | CRITICAL | P0 |
| T5.2 | API | Tampering | Param Tampering | HIGH | P1 |
| R5.1 | API | Repudiation | Deny Transaction | MEDIUM | P3 |
| I5.1 | API | Info Disclosure | Verbose Errors | MEDIUM | P3 |
| I5.2 | API | Info Disclosure | API Data Leak | MEDIUM | P2 |
| I5.3 | API | Info Disclosure | Token in Logs | HIGH | P2 |
| D5.1 | API | DoS | Rate Limit Bypass | MEDIUM | P3 |
| D5.2 | API | DoS | ReDoS | MEDIUM | P4 |
| E5.1 | API | Elevation | Broken Access | CRITICAL | P0 |
| E5.2 | API | Elevation | IDOR | HIGH | P1 |
| T6.1 | DB | Tampering | SQLi Compromise | CRITICAL | P0 |
| I6.1 | DB | Info Disclosure | Backup Exposure | CRITICAL | P0 |
| I6.2 | DB | Info Disclosure | Hardcoded Creds | CRITICAL | P0 |
| D6.1 | DB | DoS | Connection Exhaust | HIGH | P2 |

**Priority Levels:**
- P0: Critical - Immediate action required
- P1: High - Address within 1 sprint
- P2: Medium - Address within 1 quarter
- P3: Low - Backlog
- P4: Informational

---

## Mitigation Roadmap

### Sprint 1 (Immediate - P0)
- ☐ Implement parameterized queries for all database operations (T5.1, T6.1)
- ☐ Add authorization checks to all API endpoints (E5.1)
- ☐ Migrate database credentials to AWS Secrets Manager (I6.2)
- ☐ Enable RDS encryption and private VPC placement (I6.1)
- ☐ Strengthen JWT signing (RS256, key rotation) (S5.1)

### Sprint 2-3 (High Priority - P1)
- ☐ Implement FIDO2/WebAuthn for phishing resistance (S1.1)
- ☐ Deploy comprehensive IDOR protection (E5.2)
- ☐ Fix API parameter tampering vulnerabilities (T5.2)
- ☐ Enable AWS Shield and DDoS protection (D2.1)

### Quarter (Medium Priority - P2)
- ☐ Harden WAF rules and monitor bypass attempts (E3.1)
- ☐ Implement multi-layer rate limiting (D4.1, D5.1)
- ☐ Minimize API response data (I5.2)
- ☐ Redact tokens from logs (I5.3)
- ☐ TLS 1.3 enforcement and HSTS (T2.1)

### Backlog (Low Priority - P3-P4)
- ☐ Comprehensive audit logging (R5.1)
- ☐ Custom error pages (I5.1)
- ☐ ReDoS prevention (D5.2)

---

## Validation

**Testing:**
- ☐ SAST: SonarQube scan for SQL injection, hardcoded secrets
- ☐ DAST: OWASP ZAP scan for IDOR, broken access control, XSS
- ☐ Penetration Testing: Third-party security audit
- ☐ Dependency Scanning: Snyk for vulnerable libraries

**Monitoring:**
- ☐ WAF logs for attack patterns
- ☐ API logs for failed authorization attempts
- ☐ SIEM alerts for anomalous behavior (UEBA)
- ☐ Database audit logs for suspicious queries

**Compliance:**
- ☐ Map mitigations to OWASP Top 10
- ☐ Map mitigations to PCI DSS requirements (if applicable)
- ☐ Document security controls for SOC 2 audit

---

## Conclusion

This STRIDE analysis identified 20 threats across 6 components, with 6 critical (P0) threats requiring immediate remediation. Primary focus areas:

1. **SQL Injection Prevention:** Parameterized queries, input validation
2. **Access Control:** Authorization checks, IDOR prevention
3. **Secrets Management:** Migrate to AWS Secrets Manager
4. **Data Protection:** RDS encryption, private VPC placement
5. **Authentication:** Strengthen JWT, implement phishing-resistant auth

Regular threat model updates recommended after significant architecture changes or security incidents.

```

### examples/threat-models/api-threat-model.md

```markdown
# API Threat Model

## Overview

Threat model for RESTful API using STRIDE methodology. Identify threats across authentication, authorization, data validation, and API-specific attack vectors. Apply defense-in-depth controls to mitigate risks.

## System Description

**API Architecture:**

```
┌──────────────┐         ┌──────────────┐         ┌──────────────┐
│   Client     │────────▶│  API Gateway │────────▶│   Backend    │
│  (Browser/   │         │              │         │   Service    │
│   Mobile)    │         │ - Auth       │         │              │
└──────────────┘         │ - Rate Limit │         └──────┬───────┘
                         │ - Validation │                │
                         │ - Logging    │                │
                         └──────────────┘                │
                                                          ▼
                                                  ┌──────────────┐
                                                  │   Database   │
                                                  │              │
                                                  └──────────────┘
```

**API Endpoints:**

- `POST /api/v1/auth/login` - User authentication
- `POST /api/v1/auth/refresh` - Token refresh
- `GET /api/v1/users/{id}` - Retrieve user profile
- `PUT /api/v1/users/{id}` - Update user profile
- `DELETE /api/v1/users/{id}` - Delete user account
- `GET /api/v1/resources` - List resources (paginated)
- `POST /api/v1/resources` - Create resource
- `GET /api/v1/resources/{id}` - Retrieve specific resource
- `PUT /api/v1/resources/{id}` - Update resource
- `DELETE /api/v1/resources/{id}` - Delete resource
- `POST /api/v1/resources/{id}/share` - Share resource with other users

## STRIDE Threat Analysis

### Spoofing (Identity)

**Threat 1.1: Credential Theft**

- **Description:** Attacker obtains user credentials through phishing, keylogging, or database breach
- **Attack Vector:** Stolen username/password used to authenticate to API
- **Impact:** HIGH - Unauthorized access to user account and data
- **Likelihood:** MEDIUM - Common attack method

**Mitigations:**
```yaml
- control: Multi-factor authentication (MFA)
  effectiveness: HIGH
  implementation:
    - Require MFA for all sensitive operations
    - Use time-based one-time passwords (TOTP)
    - Support WebAuthn/FIDO2 hardware keys

- control: Password complexity requirements
  effectiveness: MEDIUM
  implementation:
    - Minimum 12 characters
    - Mix of uppercase, lowercase, numbers, symbols
    - Check against known breached passwords (HaveIBeenPwned)

- control: Account lockout after failed attempts
  effectiveness: MEDIUM
  implementation:
    - Lock account after 5 failed login attempts
    - Exponential backoff (5min, 15min, 30min)
    - CAPTCHA after 3 failed attempts

- control: Monitor for credential stuffing
  effectiveness: MEDIUM
  implementation:
    - Track login attempts from same IP
    - Detect patterns of credential testing
    - Block suspicious IPs automatically
```

**Threat 1.2: Token Theft**

- **Description:** Attacker steals JWT access token or refresh token
- **Attack Vector:** XSS, insecure storage, man-in-the-middle, token logging
- **Impact:** HIGH - Session hijacking and unauthorized API access
- **Likelihood:** MEDIUM - Multiple attack vectors exist

**Mitigations:**
```yaml
- control: Short-lived access tokens
  effectiveness: HIGH
  implementation:
    - Access token TTL: 15 minutes
    - Refresh token TTL: 7 days
    - Require re-authentication for sensitive operations

- control: Secure token storage
  effectiveness: HIGH
  implementation:
    - Store tokens in httpOnly, secure, SameSite cookies
    - Never store in localStorage or sessionStorage
    - Use memory-only storage for SPAs where possible

- control: Token binding to client
  effectiveness: MEDIUM
  implementation:
    - Include client fingerprint in token
    - Bind token to IP address (with caution for mobile)
    - Validate User-Agent consistency

- control: Token rotation on use
  effectiveness: HIGH
  implementation:
    - Issue new refresh token on each use
    - Invalidate old refresh token immediately
    - Detect refresh token reuse (possible theft)
```

**Threat 1.3: API Key Compromise**

- **Description:** API keys leaked in code repositories, logs, or client-side code
- **Attack Vector:** GitHub scanning, log analysis, decompilation
- **Impact:** MEDIUM - Unauthorized API access within key scope
- **Likelihood:** HIGH - Very common occurrence

**Mitigations:**
```yaml
- control: Avoid API keys for user authentication
  effectiveness: HIGH
  implementation:
    - Use API keys only for service-to-service auth
    - Never embed keys in mobile apps or client-side code
    - Use OAuth 2.0 for user authentication

- control: Secret scanning in CI/CD
  effectiveness: HIGH
  implementation:
    - Use git-secrets, TruffleHog, or GitHub secret scanning
    - Block commits containing API keys
    - Scan all historical commits

- control: API key rotation and scoping
  effectiveness: MEDIUM
  implementation:
    - Rotate keys every 90 days
    - Scope keys to specific operations
    - Track key usage and expire unused keys

- control: Key management service
  effectiveness: HIGH
  implementation:
    - Store keys in HashiCorp Vault, AWS Secrets Manager, etc.
    - Retrieve keys at runtime only
    - Audit all key access
```

### Tampering (Data Integrity)

**Threat 2.1: Request Tampering**

- **Description:** Attacker modifies API request parameters to bypass authorization or manipulate data
- **Attack Vector:** Intercepted request modified in transit or by client
- **Impact:** HIGH - Unauthorized data modification or privilege escalation
- **Likelihood:** MEDIUM - Requires MITM or client-side manipulation

**Mitigations:**
```yaml
- control: HTTPS enforcement (TLS 1.3)
  effectiveness: HIGH
  implementation:
    - Enforce TLS 1.3 minimum
    - Use HSTS with includeSubDomains and preload
    - Implement certificate pinning for mobile apps

- control: Request signing
  effectiveness: HIGH
  implementation:
    - Sign sensitive requests with HMAC
    - Include timestamp to prevent replay
    - Validate signature on server before processing

- control: Input validation
  effectiveness: HIGH
  implementation:
    - Validate all parameters against schema
    - Reject unexpected fields
    - Sanitize all input before processing

- control: Server-side authorization checks
  effectiveness: CRITICAL
  implementation:
    - Never trust client-provided user IDs
    - Verify user has permission for requested operation
    - Re-validate authorization for each request
```

**Threat 2.2: Response Tampering**

- **Description:** Attacker intercepts and modifies API response data
- **Attack Vector:** Man-in-the-middle attack, compromised proxy
- **Impact:** MEDIUM - Client receives incorrect data, potential security decisions based on false data
- **Likelihood:** LOW - Requires MITM position

**Mitigations:**
```yaml
- control: TLS encryption
  effectiveness: HIGH
  implementation:
    - All API traffic over HTTPS only
    - Proper certificate validation
    - No mixed content allowed

- control: Response integrity checks
  effectiveness: MEDIUM
  implementation:
    - Include response signature for critical data
    - Use Content-Security-Policy headers
    - Subresource Integrity (SRI) for any CDN resources

- control: Certificate pinning
  effectiveness: HIGH
  implementation:
    - Pin server certificates in mobile apps
    - Use public key pinning for critical connections
    - Implement pin backup keys
```

**Threat 2.3: SQL Injection**

- **Description:** Attacker injects SQL commands through API parameters
- **Attack Vector:** Unsanitized input used in SQL queries
- **Impact:** CRITICAL - Database compromise, data exfiltration, data modification
- **Likelihood:** MEDIUM - Common vulnerability if not properly mitigated

**Mitigations:**
```yaml
- control: Parameterized queries
  effectiveness: CRITICAL
  implementation:
    - Use prepared statements exclusively
    - Never concatenate user input into SQL
    - Use ORM with parameterized queries

- control: Input validation
  effectiveness: HIGH
  implementation:
    - Validate data types (integer, UUID, etc.)
    - Whitelist allowed characters
    - Reject SQL keywords in unexpected fields

- control: Least privilege database access
  effectiveness: MEDIUM
  implementation:
    - API service uses read-only account where possible
    - Separate accounts for read vs. write operations
    - Restrict database permissions to required tables only

- control: Web Application Firewall (WAF)
  effectiveness: MEDIUM
  implementation:
    - Deploy WAF with SQL injection rules
    - Block common SQL injection patterns
    - Log and alert on detected attempts
```

### Repudiation (Accountability)

**Threat 3.1: Action Denial**

- **Description:** User denies performing an action (data deletion, unauthorized access)
- **Attack Vector:** Lack of audit trail, insufficient logging
- **Impact:** MEDIUM - Cannot prove user performed action, compliance violations
- **Likelihood:** HIGH - Users frequently claim they didn't perform actions

**Mitigations:**
```yaml
- control: Comprehensive audit logging
  effectiveness: HIGH
  implementation:
    - Log all state-changing operations
    - Include user ID, timestamp, IP, action, resource
    - Log authentication events (success and failure)
    - Never log sensitive data (passwords, tokens, PII)

- control: Immutable audit trail
  effectiveness: HIGH
  implementation:
    - Write logs to append-only storage
    - Use centralized logging (ELK, Splunk, CloudWatch)
    - Prevent log modification or deletion
    - Cryptographically sign log entries

- control: Request tracking
  effectiveness: MEDIUM
  implementation:
    - Assign unique request ID to each API call
    - Include request ID in all logs and responses
    - Enable correlation across distributed services

- control: User consent tracking
  effectiveness: MEDIUM
  implementation:
    - Record explicit user consent for sensitive actions
    - Require confirmation for destructive operations
    - Log consent timestamps and IP addresses
```

**Threat 3.2: Log Tampering**

- **Description:** Attacker modifies or deletes audit logs to hide malicious activity
- **Attack Vector:** Compromised server access, insufficient log protection
- **Impact:** HIGH - Loss of audit trail, inability to investigate incidents
- **Likelihood:** LOW - Requires elevated access

**Mitigations:**
```yaml
- control: Centralized logging
  effectiveness: HIGH
  implementation:
    - Forward logs to external SIEM immediately
    - No local log storage on API servers
    - Use TLS for log transmission

- control: Log integrity verification
  effectiveness: HIGH
  implementation:
    - Cryptographic hashing of log entries
    - Chain logs together (blockchain-style)
    - Periodic integrity checks

- control: Access controls on logs
  effectiveness: MEDIUM
  implementation:
    - Separate logging service account
    - Read-only access for auditors
    - Alert on log access by unauthorized users

- control: Log retention and backup
  effectiveness: MEDIUM
  implementation:
    - Retain logs for minimum 1 year
    - Immutable S3 storage or WORM media
    - Automated backup verification
```

### Information Disclosure (Confidentiality)

**Threat 4.1: Sensitive Data Exposure in Responses**

- **Description:** API returns excessive data including sensitive fields
- **Attack Vector:** Over-fetching, verbose error messages, debug mode
- **Impact:** HIGH - Exposure of PII, credentials, internal system details
- **Likelihood:** HIGH - Very common in APIs

**Mitigations:**
```yaml
- control: Response filtering
  effectiveness: HIGH
  implementation:
    - Return only requested fields
    - Use DTO pattern to control response shape
    - Never include password hashes in responses
    - Redact sensitive fields (SSN, credit cards)

- control: Field-level authorization
  effectiveness: HIGH
  implementation:
    - Check permissions for each returned field
    - Hide fields user doesn't have access to
    - Use different response schemas for different roles

- control: Generic error messages
  effectiveness: MEDIUM
  implementation:
    - Return generic errors to client
    - Log detailed errors server-side only
    - Never expose stack traces
    - Use error codes instead of detailed messages

- control: Disable debug mode in production
  effectiveness: CRITICAL
  implementation:
    - No verbose error messages
    - No debug endpoints exposed
    - Remove all console.log and debug code
```

**Threat 4.2: Insecure Direct Object Reference (IDOR)**

- **Description:** Attacker accesses resources by guessing or enumerating IDs
- **Attack Vector:** Sequential IDs, predictable UUIDs, missing authorization checks
- **Impact:** HIGH - Unauthorized data access, privacy violation
- **Likelihood:** HIGH - Extremely common vulnerability

**Mitigations:**
```yaml
- control: Authorization checks on every request
  effectiveness: CRITICAL
  implementation:
    - Verify user owns or has access to requested resource
    - Check permissions before database query
    - Never rely on client-provided IDs alone

- control: Non-sequential identifiers
  effectiveness: MEDIUM
  implementation:
    - Use UUIDv4 for resource IDs
    - Avoid auto-incrementing integer IDs
    - Don't expose internal database IDs

- control: Indirect object references
  effectiveness: HIGH
  implementation:
    - Map external IDs to internal IDs
    - Use per-user ID namespaces
    - Implement access tokens for resources

- control: Rate limiting enumeration attempts
  effectiveness: MEDIUM
  implementation:
    - Limit requests per user/IP
    - Detect ID enumeration patterns
    - CAPTCHA after repeated 404s
```

**Threat 4.3: Mass Assignment**

- **Description:** Attacker modifies object properties they shouldn't have access to
- **Attack Vector:** Sending unexpected fields in PUT/PATCH requests
- **Impact:** HIGH - Privilege escalation, unauthorized data modification
- **Likelihood:** MEDIUM - Requires knowledge of internal field names

**Mitigations:**
```yaml
- control: Explicit field whitelisting
  effectiveness: CRITICAL
  implementation:
    - Define allowed fields for each endpoint
    - Reject requests with unexpected fields
    - Use separate DTOs for input validation

- control: Read-only fields enforcement
  effectiveness: HIGH
  implementation:
    - Mark fields as read-only (id, created_at, role)
    - Prevent modification even if included in request
    - Validate field-level permissions

- control: Separate admin endpoints
  effectiveness: HIGH
  implementation:
    - Use different endpoints for admin operations
    - Don't mix user and admin fields in same request
    - Require elevated permissions for admin endpoints
```

### Denial of Service

**Threat 5.1: API Abuse / Resource Exhaustion**

- **Description:** Attacker overwhelms API with excessive requests
- **Attack Vector:** Automated bots, distributed attacks, large payload requests
- **Impact:** HIGH - Service unavailability, increased costs
- **Likelihood:** HIGH - Very common attack

**Mitigations:**
```yaml
- control: Rate limiting
  effectiveness: HIGH
  implementation:
    - 100 requests per minute per user
    - 1000 requests per hour per IP
    - Lower limits for expensive endpoints
    - Return 429 Too Many Requests with Retry-After

- control: Request size limits
  effectiveness: HIGH
  implementation:
    - Max request body: 1MB
    - Max URL length: 2048 characters
    - Reject oversized requests immediately
    - Limit file upload sizes

- control: Response pagination
  effectiveness: MEDIUM
  implementation:
    - Maximum 100 items per page
    - Default page size: 20 items
    - Cursor-based pagination for large datasets
    - Timeout long-running queries (30 seconds)

- control: DDoS protection
  effectiveness: MEDIUM
  implementation:
    - Use CloudFlare, AWS Shield, or Akamai
    - Implement connection limits
    - Geographic blocking for suspicious regions
    - Challenge-based verification (CAPTCHA)
```

**Threat 5.2: Regex DoS (ReDoS)**

- **Description:** Attacker sends input that causes catastrophic backtracking in regex validation
- **Attack Vector:** Crafted input exploiting inefficient regex patterns
- **Impact:** MEDIUM - CPU exhaustion, slow response times
- **Likelihood:** LOW - Requires specific vulnerable regex patterns

**Mitigations:**
```yaml
- control: Regex timeout enforcement
  effectiveness: HIGH
  implementation:
    - Set regex execution timeout (100ms)
    - Terminate on timeout and reject input
    - Use regex analysis tools (safe-regex)

- control: Avoid complex regex
  effectiveness: HIGH
  implementation:
    - Use simple patterns where possible
    - Avoid nested quantifiers (.*.*. etc.)
    - Test regex with ReDoS checkers

- control: Input length limits
  effectiveness: MEDIUM
  implementation:
    - Limit input before regex validation
    - Reject excessively long strings early
    - Validate length before pattern matching
```

**Threat 5.3: Database Query Amplification**

- **Description:** Single API request triggers expensive database operations
- **Attack Vector:** Unbounded queries, missing pagination, N+1 query problems
- **Impact:** HIGH - Database overload, slow response times
- **Likelihood:** MEDIUM - Common in poorly optimized APIs

**Mitigations:**
```yaml
- control: Query result limits
  effectiveness: HIGH
  implementation:
    - Hard limit on query results (1000 rows max)
    - Force pagination on list endpoints
    - Use LIMIT clause in all queries

- control: Query timeout
  effectiveness: HIGH
  implementation:
    - Database query timeout: 10 seconds
    - Cancel long-running queries
    - Alert on frequent timeouts

- control: Database connection pooling
  effectiveness: MEDIUM
  implementation:
    - Limit max concurrent connections
    - Implement connection queue
    - Reject requests when pool exhausted

- control: Caching
  effectiveness: MEDIUM
  implementation:
    - Cache frequently accessed data (Redis)
    - Set appropriate TTLs
    - Cache invalidation on updates
```

### Elevation of Privilege

**Threat 6.1: Broken Authorization**

- **Description:** User accesses resources or performs actions beyond their permissions
- **Attack Vector:** Missing authorization checks, flawed permission logic
- **Impact:** CRITICAL - Unauthorized data access, system compromise
- **Likelihood:** HIGH - Very common vulnerability

**Mitigations:**
```yaml
- control: Centralized authorization
  effectiveness: CRITICAL
  implementation:
    - Single authorization service/library
    - Consistent permission checks across all endpoints
    - Deny by default policy

- control: Role-based access control (RBAC)
  effectiveness: HIGH
  implementation:
    - Define clear roles (admin, user, viewer)
    - Assign minimal required permissions
    - Check role before operation

- control: Attribute-based access control (ABAC)
  effectiveness: HIGH
  implementation:
    - Evaluate context (time, location, resource owner)
    - Dynamic permission decisions
    - Fine-grained access control

- control: Authorization testing
  effectiveness: HIGH
  implementation:
    - Automated tests for each permission scenario
    - Test negative cases (should be denied)
    - Regular security audits
```

**Threat 6.2: JWT Algorithm Confusion**

- **Description:** Attacker exploits JWT library to accept unsigned tokens
- **Attack Vector:** Changing JWT algorithm from RS256 to "none"
- **Impact:** CRITICAL - Complete authentication bypass
- **Likelihood:** LOW - Modern libraries protect against this

**Mitigations:**
```yaml
- control: Explicit algorithm specification
  effectiveness: CRITICAL
  implementation:
    - Specify allowed algorithms (RS256, ES256)
    - Reject "none" algorithm explicitly
    - Never allow symmetric algorithm if using asymmetric keys

- control: JWT library security
  effectiveness: HIGH
  implementation:
    - Use well-maintained libraries
    - Keep libraries up to date
    - Review security advisories

- control: Token validation
  effectiveness: CRITICAL
  implementation:
    - Validate signature on every request
    - Verify issuer (iss claim)
    - Verify audience (aud claim)
    - Check expiration (exp claim)
    - Validate not-before (nbf claim)
```

**Threat 6.3: Path Traversal in API Routes**

- **Description:** Attacker manipulates path parameters to access unauthorized endpoints
- **Attack Vector:** Using ../ in path parameters, URL encoding tricks
- **Impact:** MEDIUM - Access to admin endpoints, information disclosure
- **Likelihood:** LOW - Most frameworks protect against this

**Mitigations:**
```yaml
- control: Path parameter validation
  effectiveness: HIGH
  implementation:
    - Validate path parameters against whitelist
    - Reject ../ and encoded equivalents
    - Use strict routing patterns

- control: Framework security features
  effectiveness: HIGH
  implementation:
    - Use framework's built-in protections
    - Enable strict routing mode
    - Validate all path segments

- control: Separate route handlers
  effectiveness: MEDIUM
  implementation:
    - Don't use dynamic paths for critical endpoints
    - Use different route prefixes for admin vs. user
    - Implement route-level authentication
```

## Attack Surface Analysis

**High-Risk Endpoints:**

1. **POST /api/v1/auth/login**
   - Risks: Credential stuffing, brute force, timing attacks
   - Priority Mitigations: Rate limiting, MFA, account lockout

2. **PUT /api/v1/users/{id}**
   - Risks: IDOR, mass assignment, privilege escalation
   - Priority Mitigations: Authorization checks, field whitelisting

3. **DELETE /api/v1/users/{id}**
   - Risks: Unauthorized deletion, lack of audit trail
   - Priority Mitigations: Strong authorization, audit logging, soft delete

4. **POST /api/v1/resources/{id}/share**
   - Risks: Unauthorized sharing, information disclosure
   - Priority Mitigations: Permission verification, notification, audit logging

**Trust Boundaries:**

```
Untrusted:
- All client input (headers, parameters, body)
- External API integrations
- User-uploaded files

Semi-Trusted:
- Internal microservices (verify with mTLS)
- Database responses (could be tampered if DB compromised)

Trusted:
- Configuration management system
- Secrets manager
- Internal audit logs
```

## Threat Risk Matrix

| Threat | Impact | Likelihood | Risk Level | Priority |
|--------|--------|------------|------------|----------|
| SQL Injection | CRITICAL | MEDIUM | CRITICAL | P0 |
| Broken Authorization | CRITICAL | HIGH | CRITICAL | P0 |
| JWT Algorithm Confusion | CRITICAL | LOW | HIGH | P1 |
| IDOR | HIGH | HIGH | HIGH | P1 |
| Credential Theft | HIGH | MEDIUM | HIGH | P1 |
| Token Theft | HIGH | MEDIUM | HIGH | P1 |
| Mass Assignment | HIGH | MEDIUM | HIGH | P1 |
| Sensitive Data Exposure | HIGH | HIGH | HIGH | P1 |
| API Abuse / DoS | HIGH | HIGH | HIGH | P1 |
| Request Tampering | HIGH | MEDIUM | MEDIUM | P2 |
| API Key Compromise | MEDIUM | HIGH | MEDIUM | P2 |
| Log Tampering | HIGH | LOW | MEDIUM | P2 |
| Database Query Amplification | HIGH | MEDIUM | MEDIUM | P2 |
| Response Tampering | MEDIUM | LOW | LOW | P3 |
| ReDoS | MEDIUM | LOW | LOW | P3 |
| Path Traversal | MEDIUM | LOW | LOW | P3 |

## Security Controls Summary

**Critical Controls (Must Implement):**
- Parameterized queries for all database operations
- Authorization checks on every endpoint
- JWT signature validation with explicit algorithm
- HTTPS enforcement with HSTS
- Comprehensive audit logging
- Input validation and sanitization

**High-Priority Controls:**
- Multi-factor authentication
- Rate limiting (per user and per IP)
- Non-sequential resource identifiers
- Field-level authorization
- Token rotation and short TTLs
- WAF deployment

**Recommended Controls:**
- API key rotation and scoping
- Response pagination and limits
- Request signing for sensitive operations
- Certificate pinning for mobile apps
- Caching for performance and DoS protection
- Automated security testing in CI/CD

## Testing Recommendations

```yaml
Security Testing:
  - tool: OWASP ZAP
    focus: Automated vulnerability scanning
    frequency: Every build

  - tool: Burp Suite
    focus: Manual penetration testing
    frequency: Quarterly

  - tool: SQLMap
    focus: SQL injection detection
    frequency: Every release

  - tool: Postman / Newman
    focus: Authorization testing
    frequency: Every build

  - tool: JMeter / Locust
    focus: Load testing and DoS resilience
    frequency: Monthly

  - tool: SonarQube
    focus: Static code analysis
    frequency: Every commit

  - tool: Dependency-Check
    focus: Vulnerable dependencies
    frequency: Weekly
```

## Compliance Considerations

**OWASP API Security Top 10 Coverage:**
- API1:2023 Broken Object Level Authorization → Mitigated by authorization checks
- API2:2023 Broken Authentication → Mitigated by MFA, token management
- API3:2023 Broken Object Property Level Authorization → Mitigated by field whitelisting
- API4:2023 Unrestricted Resource Consumption → Mitigated by rate limiting
- API5:2023 Broken Function Level Authorization → Mitigated by RBAC
- API6:2023 Unrestricted Access to Sensitive Business Flows → Mitigated by rate limiting
- API7:2023 Server Side Request Forgery → Mitigated by input validation
- API8:2023 Security Misconfiguration → Mitigated by secure defaults
- API9:2023 Improper Inventory Management → Mitigated by API documentation
- API10:2023 Unsafe Consumption of APIs → Mitigated by validation of external data

## References

- [OWASP API Security Top 10](https://owasp.org/www-project-api-security/)
- [OWASP REST Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/REST_Security_Cheat_Sheet.html)
- [Microsoft Threat Modeling Tool](https://www.microsoft.com/en-us/securityengineering/sdl/threatmodeling)
- [STRIDE Methodology](https://learn.microsoft.com/en-us/azure/security/develop/threat-modeling-tool-threats)

```

### examples/threat-models/microservices-threat-model.md

```markdown
# Microservices Threat Model

## Overview

Comprehensive threat model for microservices architecture using STRIDE methodology. Analyze threats across service-to-service communication, container security, secrets management, network segmentation, and supply chain risks.

## System Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                         MICROSERVICES ARCHITECTURE                       │
│                                                                           │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐            │
│  │  API Gateway │────▶│   Service A  │────▶│   Service B  │            │
│  │              │     │  (Auth)      │     │  (Business)  │            │
│  │ - Auth       │     │              │     │              │            │
│  │ - Routing    │     │ - User Mgmt  │     │ - Orders     │            │
│  │ - Rate Limit │     │ - Sessions   │     │ - Inventory  │            │
│  └──────┬───────┘     └──────┬───────┘     └──────┬───────┘            │
│         │                    │                    │                     │
│         │                    └────────────────────┼─────────────┐       │
│         │                                         │             │       │
│         ▼                                         ▼             ▼       │
│  ┌──────────────┐                         ┌──────────────┬──────────┐  │
│  │ Service Mesh │                         │   Service C  │Database  │  │
│  │ (Istio)      │                         │  (Payment)   │(Postgres)│  │
│  │              │                         │              │          │  │
│  │ - mTLS       │                         │ - Stripe API │          │  │
│  │ - AuthZ      │                         │ - PCI DSS    │          │  │
│  │ - Observ.    │                         └──────────────┴──────────┘  │
│  └──────────────┘                                                       │
│                                                                           │
│  Supporting Services:                                                    │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐          │
│  │ Message    │ │ Config     │ │ Secrets    │ │ Monitoring │          │
│  │ Queue      │ │ Server     │ │ Manager    │ │ (Prom/Gr)  │          │
│  │ (RabbitMQ) │ │ (Consul)   │ │ (Vault)    │ │            │          │
│  └────────────┘ └────────────┘ └────────────┘ └────────────┘          │
└─────────────────────────────────────────────────────────────────────────┘
```

## STRIDE Analysis

### Spoofing (Identity)

**Threat 1.1: Service Impersonation**

- **Description:** Malicious service impersonates legitimate service to intercept requests
- **Attack Vector:** Compromised container, DNS poisoning, service registry manipulation
- **Impact:** CRITICAL - Data exfiltration, unauthorized access, man-in-the-middle
- **Likelihood:** MEDIUM - Requires container or infrastructure compromise

**Mitigations:**

```yaml
- control: Mutual TLS (mTLS) authentication
  effectiveness: CRITICAL
  implementation:
    service_mesh: Istio
    configuration:
      mode: STRICT
      mtls:
        enabled: true
      certificate_rotation: 24h
      trusted_ca: cluster-ca
    code_example: |
      apiVersion: security.istio.io/v1beta1
      kind: PeerAuthentication
      metadata:
        name: default
        namespace: production
      spec:
        mtls:
          mode: STRICT

- control: Service identity certificates
  effectiveness: HIGH
  implementation:
    - Use SPIFFE/SPIRE for service identity
    - Short-lived certificates (24 hour TTL)
    - Automatic certificate rotation
    - Bind certificates to pod identity
    code_example: |
      # SPIRE registration entry
      spire-server entry create \
        -spiffeID spiffe://example.com/service-a \
        -parentID spiffe://example.com/k8s-node \
        -selector k8s:ns:production \
        -selector k8s:sa:service-a

- control: Service registry authentication
  effectiveness: HIGH
  implementation:
    - Authenticate to service registry (Consul, Eureka)
    - ACL tokens for service registration
    - Verify service identity before registration
    - Monitor for unauthorized registrations

- control: Network policies
  effectiveness: MEDIUM
  implementation:
    - Kubernetes NetworkPolicy for pod-to-pod
    - Deny all ingress by default
    - Explicit allow rules for service dependencies
    code_example: |
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: service-b-policy
      spec:
        podSelector:
          matchLabels:
            app: service-b
        policyTypes:
          - Ingress
        ingress:
          - from:
              - podSelector:
                  matchLabels:
                    app: service-a
            ports:
              - protocol: TCP
                port: 8080
```

**Threat 1.2: Container Image Tampering**

- **Description:** Attacker replaces legitimate container image with malicious version
- **Attack Vector:** Compromised registry, supply chain attack, insider threat
- **Impact:** CRITICAL - Malicious code execution, data theft, backdoor installation
- **Likelihood:** MEDIUM - Requires registry access or CI/CD compromise

**Mitigations:**

```yaml
- control: Image signing and verification
  effectiveness: CRITICAL
  implementation:
    tool: Sigstore/Cosign
    process:
      - Sign all images during CI/CD
      - Verify signatures before deployment
      - Reject unsigned images
    code_example: |
      # Sign image
      cosign sign --key cosign.key \
        gcr.io/project/service-a:v1.2.3

      # Verify on deployment
      cosign verify --key cosign.pub \
        gcr.io/project/service-a:v1.2.3

- control: Image vulnerability scanning
  effectiveness: HIGH
  implementation:
    tool: Trivy, Clair, Anchore
    process:
      - Scan images in CI/CD pipeline
      - Block deployment of critical vulnerabilities
      - Daily rescans of running images
    code_example: |
      trivy image --severity HIGH,CRITICAL \
        --exit-code 1 \
        gcr.io/project/service-a:v1.2.3

- control: Admission control
  effectiveness: CRITICAL
  implementation:
    tool: OPA Gatekeeper, Kyverno
    policies:
      - Require signed images
      - Require images from approved registries
      - Reject images with critical vulnerabilities
    code_example: |
      apiVersion: constraints.gatekeeper.sh/v1beta1
      kind: K8sAllowedRepos
      metadata:
        name: prod-repo-restriction
      spec:
        match:
          kinds:
            - apiGroups: [""]
              kinds: ["Pod"]
          namespaces: ["production"]
        parameters:
          repos:
            - "gcr.io/approved-project/"

- control: Private container registry
  effectiveness: HIGH
  implementation:
    - Use private registry (GCR, ECR, ACR)
    - Require authentication to pull images
    - Enable vulnerability scanning
    - Audit image access logs
```

**Threat 1.3: API Token Theft**

- **Description:** Service-to-service API tokens stolen from environment variables or logs
- **Attack Vector:** Container escape, log aggregation access, compromised secrets
- **Impact:** HIGH - Unauthorized service access, lateral movement
- **Likelihood:** HIGH - Common misconfiguration

**Mitigations:**

```yaml
- control: Short-lived tokens
  effectiveness: HIGH
  implementation:
    - Token TTL: 15 minutes maximum
    - Automatic token rotation
    - Service mesh handles token refresh
    code_example: |
      # Istio JWT configuration
      apiVersion: security.istio.io/v1beta1
      kind: RequestAuthentication
      metadata:
        name: jwt-auth
      spec:
        jwtRules:
          - issuer: "https://auth.example.com"
            jwksUri: "https://auth.example.com/.well-known/jwks.json"
            audiences:
              - "service-b"
            forwardOriginalToken: false

- control: Secrets management system
  effectiveness: CRITICAL
  implementation:
    tool: HashiCorp Vault, AWS Secrets Manager
    process:
      - Store tokens in secrets manager
      - Inject at runtime only
      - Never log tokens
      - Rotate on suspected compromise
    code_example: |
      # Vault Kubernetes auth
      vault write auth/kubernetes/role/service-a \
        bound_service_account_names=service-a \
        bound_service_account_namespaces=production \
        policies=service-a-policy \
        ttl=1h

- control: Workload identity
  effectiveness: HIGH
  implementation:
    - Use cloud provider workload identity
    - Bind pod to cloud IAM role
    - No static credentials needed
    code_example: |
      # GKE Workload Identity
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        annotations:
          iam.gke.io/gcp-service-account: [email protected]

- control: Token scope limitation
  effectiveness: MEDIUM
  implementation:
    - Limit token to specific service endpoints
    - Implement audience validation
    - Shortest privilege duration
```

### Tampering (Data Integrity)

**Threat 2.1: Message Queue Tampering**

- **Description:** Attacker modifies messages in transit between services via message queue
- **Attack Vector:** Compromised queue credentials, queue admin access
- **Impact:** HIGH - Data corruption, fraudulent transactions, system instability
- **Likelihood:** MEDIUM - Requires queue access

**Mitigations:**

```yaml
- control: Message signing
  effectiveness: HIGH
  implementation:
    - Sign messages with HMAC or digital signature
    - Include timestamp to prevent replay
    - Verify signature before processing
    code_example: |
      import hmac
      import hashlib
      import json
      from datetime import datetime

      def sign_message(message, secret_key):
          payload = {
              "data": message,
              "timestamp": datetime.utcnow().isoformat()
          }
          payload_json = json.dumps(payload, sort_keys=True)
          signature = hmac.new(
              secret_key.encode(),
              payload_json.encode(),
              hashlib.sha256
          ).hexdigest()
          return {
              "payload": payload,
              "signature": signature
          }

      def verify_message(signed_message, secret_key):
          payload_json = json.dumps(
              signed_message["payload"],
              sort_keys=True
          )
          expected_sig = hmac.new(
              secret_key.encode(),
              payload_json.encode(),
              hashlib.sha256
          ).hexdigest()

          if not hmac.compare_digest(
              expected_sig,
              signed_message["signature"]
          ):
              raise ValueError("Invalid signature")

          return signed_message["payload"]["data"]

- control: TLS for message queue connections
  effectiveness: HIGH
  implementation:
    - Enable TLS for RabbitMQ, Kafka, SQS
    - Mutual TLS for producer/consumer authentication
    - Certificate validation
    code_example: |
      # RabbitMQ TLS config
      ssl_options = {
          "certfile": "/certs/client-cert.pem",
          "keyfile": "/certs/client-key.pem",
          "ca_certs": "/certs/ca-cert.pem",
          "cert_reqs": ssl.CERT_REQUIRED
      }
      connection = pika.BlockingConnection(
          pika.ConnectionParameters(
              host='rabbitmq.example.com',
              port=5671,
              credentials=credentials,
              ssl_options=ssl_options
          )
      )

- control: Queue access controls
  effectiveness: HIGH
  implementation:
    - Separate credentials per service
    - Read-only vs. write-only permissions
    - Queue-level ACLs
    - Audit queue access

- control: Message validation
  effectiveness: MEDIUM
  implementation:
    - Schema validation (JSON Schema, Protobuf)
    - Reject malformed messages
    - Size limits on messages
    - Content type verification
```

**Threat 2.2: Database Injection via Service Chain**

- **Description:** Malicious input propagates through service chain to database query
- **Attack Vector:** Unsanitized input passed between services, SQL/NoSQL injection
- **Impact:** CRITICAL - Data breach, data corruption, privilege escalation
- **Likelihood:** MEDIUM - Requires multiple validation failures

**Mitigations:**

```yaml
- control: Input validation at every service boundary
  effectiveness: CRITICAL
  implementation:
    - Validate at API gateway
    - Re-validate at each service
    - Never trust upstream service input
    code_example: |
      from pydantic import BaseModel, validator, constr

      class OrderRequest(BaseModel):
          order_id: constr(regex=r'^[A-Z0-9]{8}$')
          quantity: int

          @validator('quantity')
          def validate_quantity(cls, v):
              if v < 1 or v > 1000:
                  raise ValueError('Quantity must be 1-1000')
              return v

      def process_order(request_data):
          # Validate even from trusted service
          validated = OrderRequest(**request_data)
          # Use parameterized query
          query = """
              INSERT INTO orders (id, quantity)
              VALUES (?, ?)
          """
          db.execute(query, [validated.order_id, validated.quantity])

- control: Parameterized queries
  effectiveness: CRITICAL
  implementation:
    - Use prepared statements always
    - ORM with parameter binding
    - Never concatenate user input

- control: Least privilege database access
  effectiveness: HIGH
  implementation:
    - Each service has separate DB user
    - Grant minimum required permissions
    - Read-only accounts where possible
    - No shared credentials

- control: Database firewall
  effectiveness: MEDIUM
  implementation:
    - Monitor and block suspicious queries
    - Pattern-based SQL injection detection
    - Alert on anomalous database access
```

**Threat 2.3: Configuration Tampering**

- **Description:** Attacker modifies service configuration to alter behavior
- **Attack Vector:** Compromised config server, insecure config files, environment variable injection
- **Impact:** HIGH - Service malfunction, security bypass, data exfiltration
- **Likelihood:** MEDIUM - Requires config system access

**Mitigations:**

```yaml
- control: Configuration encryption
  effectiveness: HIGH
  implementation:
    - Encrypt sensitive config at rest
    - Decrypt only at runtime
    - Use sealed secrets for Kubernetes
    code_example: |
      # Sealed Secrets
      kubeseal --format yaml < secret.yaml > sealed-secret.yaml
      kubectl apply -f sealed-secret.yaml

- control: Configuration version control
  effectiveness: HIGH
  implementation:
    - Store config in Git
    - Require code review for changes
    - Audit trail of config modifications
    - Rollback capability

- control: Immutable infrastructure
  effectiveness: HIGH
  implementation:
    - Bake config into container image
    - No runtime config changes
    - Redeploy to change config
    - ConfigMaps as read-only volumes

- control: Configuration validation
  effectiveness: MEDIUM
  implementation:
    - Schema validation on startup
    - Reject invalid configuration
    - Fail-safe defaults
    code_example: |
      import schema

      config_schema = schema.Schema({
          'database_url': schema.And(str, len),
          'api_timeout': schema.And(int, lambda n: 1 <= n <= 300),
          'log_level': schema.Or('DEBUG', 'INFO', 'WARNING', 'ERROR')
      })

      config = load_config()
      validated_config = config_schema.validate(config)
```

### Repudiation (Accountability)

**Threat 3.1: Distributed Tracing Blind Spots**

- **Description:** Lack of correlation between service calls prevents audit trail
- **Attack Vector:** Missing trace IDs, incomplete logging, log deletion
- **Impact:** MEDIUM - Cannot investigate incidents, compliance violations
- **Likelihood:** HIGH - Common in complex microservices

**Mitigations:**

```yaml
- control: Distributed tracing
  effectiveness: HIGH
  implementation:
    tool: OpenTelemetry, Jaeger, Zipkin
    process:
      - Propagate trace ID across all services
      - Include trace ID in all logs
      - Sample critical paths at 100%
    code_example: |
      from opentelemetry import trace
      from opentelemetry.instrumentation.flask import FlaskInstrumentor

      tracer = trace.get_tracer(__name__)

      @app.route('/api/order')
      def create_order():
          with tracer.start_as_current_span("create_order") as span:
              span.set_attribute("order.id", order_id)
              span.set_attribute("user.id", user_id)

              # Call downstream service
              response = requests.post(
                  "http://service-b/process",
                  json=order_data,
                  headers={
                      "traceparent": span.get_span_context()
                  }
              )

- control: Structured logging
  effectiveness: HIGH
  implementation:
    - JSON formatted logs
    - Include trace ID, span ID, service name
    - Centralized log aggregation
    code_example: |
      import structlog

      log = structlog.get_logger()

      log.info(
          "order.created",
          order_id=order_id,
          user_id=user_id,
          amount=total,
          trace_id=trace_id,
          span_id=span_id
      )

- control: Service mesh observability
  effectiveness: HIGH
  implementation:
    - Automatic request tracking
    - Service dependency graph
    - Request flow visualization
    - Anomaly detection

- control: Immutable audit logs
  effectiveness: CRITICAL
  implementation:
    - Stream logs to external SIEM
    - Append-only log storage
    - Log integrity verification
    - Long-term retention (7 years for compliance)
```

**Threat 3.2: Service-to-Service Call Denial**

- **Description:** Service denies making unauthorized API call to another service
- **Attack Vector:** Compromised service credentials, lack of audit trail
- **Impact:** MEDIUM - Cannot prove malicious activity, insider threat investigation difficulty
- **Likelihood:** LOW - Requires sophisticated attack

**Mitigations:**

```yaml
- control: Service call authentication logs
  effectiveness: HIGH
  implementation:
    - Log all outbound service calls
    - Include service identity, timestamp, endpoint
    - Record request/response correlation
    code_example: |
      def call_downstream_service(endpoint, data):
          request_id = str(uuid.uuid4())

          log.info(
              "service.call.start",
              request_id=request_id,
              source_service="service-a",
              target_service="service-b",
              endpoint=endpoint,
              method="POST"
          )

          try:
              response = requests.post(
                  endpoint,
                  json=data,
                  headers={
                      "X-Request-ID": request_id,
                      "X-Source-Service": "service-a"
                  }
              )

              log.info(
                  "service.call.complete",
                  request_id=request_id,
                  status_code=response.status_code,
                  response_time_ms=response.elapsed.total_seconds() * 1000
              )

              return response
          except Exception as e:
              log.error(
                  "service.call.failed",
                  request_id=request_id,
                  error=str(e)
              )
              raise

- control: mTLS audit trail
  effectiveness: HIGH
  implementation:
    - Service mesh logs all mTLS handshakes
    - Record client certificate details
    - Cannot be disabled by service

- control: API gateway logging
  effectiveness: MEDIUM
  implementation:
    - Log all requests at gateway
    - Include authenticated service identity
    - Centralized visibility
```

### Information Disclosure (Confidentiality)

**Threat 4.1: Secrets in Container Images**

- **Description:** Secrets hardcoded in container images or accessible in image layers
- **Attack Vector:** Image inspection, leaked images, registry compromise
- **Impact:** CRITICAL - Credential exposure, lateral movement, data breach
- **Likelihood:** HIGH - Very common mistake

**Mitigations:**

```yaml
- control: Secrets management integration
  effectiveness: CRITICAL
  implementation:
    tool: HashiCorp Vault, AWS Secrets Manager
    process:
      - Never hardcode secrets
      - Retrieve secrets at runtime
      - Short-lived dynamic secrets
    code_example: |
      import hvac

      # Initialize Vault client
      client = hvac.Client(url='https://vault.example.com')
      client.auth.kubernetes.login(
          role='service-a',
          jwt=open('/var/run/secrets/kubernetes.io/serviceaccount/token').read()
      )

      # Retrieve secret
      secret = client.secrets.kv.v2.read_secret_version(
          path='database/credentials/service-a'
      )
      db_password = secret['data']['data']['password']

- control: Image scanning for secrets
  effectiveness: HIGH
  implementation:
    tool: ggshield, TruffleHog, GitGuardian
    process:
      - Scan images in CI/CD
      - Block builds with detected secrets
      - Scan base images
    code_example: |
      # GitLab CI secret detection
      secret_detection:
        stage: test
        script:
          - ggshield scan docker service-a:$CI_COMMIT_SHA
        allow_failure: false

- control: Kubernetes secrets with encryption at rest
  effectiveness: HIGH
  implementation:
    - Enable encryption at rest for etcd
    - Use external KMS (AWS KMS, GCP KMS)
    - Rotate encryption keys regularly
    code_example: |
      # EncryptionConfiguration
      apiVersion: apiserver.config.k8s.io/v1
      kind: EncryptionConfiguration
      resources:
        - resources:
            - secrets
          providers:
            - aescbc:
                keys:
                  - name: key1
                    secret: <base64-encoded-key>
            - identity: {}

- control: Secret rotation
  effectiveness: HIGH
  implementation:
    - Automatic rotation every 90 days
    - Zero-downtime rotation
    - Audit secret access
```

**Threat 4.2: Service-to-Service Traffic Eavesdropping**

- **Description:** Attacker intercepts unencrypted traffic between services
- **Attack Vector:** Network snooping, compromised network infrastructure, ARP spoofing
- **Impact:** HIGH - Data exposure, credential theft, PII leakage
- **Likelihood:** MEDIUM - Requires network access

**Mitigations:**

```yaml
- control: Mutual TLS for all service communication
  effectiveness: CRITICAL
  implementation:
    service_mesh: Istio, Linkerd
    configuration:
      - Enforce mTLS globally
      - Automatic certificate management
      - Certificate rotation every 24 hours
    code_example: |
      apiVersion: security.istio.io/v1beta1
      kind: PeerAuthentication
      metadata:
        name: default
        namespace: istio-system
      spec:
        mtls:
          mode: STRICT

- control: Network encryption
  effectiveness: HIGH
  implementation:
    - TLS 1.3 minimum
    - Strong cipher suites only
    - Disable insecure protocols
    code_example: |
      # Nginx TLS config
      ssl_protocols TLSv1.3;
      ssl_ciphers 'TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384';
      ssl_prefer_server_ciphers on;

- control: Network segmentation
  effectiveness: MEDIUM
  implementation:
    - Separate network namespaces
    - VPC/subnet isolation
    - No direct pod-to-pod without policy

- control: Encrypted message queues
  effectiveness: HIGH
  implementation:
    - Enable TLS for Kafka, RabbitMQ
    - Encrypt messages at application level for sensitive data
    - Key rotation
```

**Threat 4.3: Log Data Exposure**

- **Description:** Sensitive data logged in plaintext and accessible to unauthorized users
- **Attack Vector:** Log aggregation access, log file access, log forwarding to external systems
- **Impact:** HIGH - PII exposure, credential leaks, compliance violations (GDPR, PCI DSS)
- **Likelihood:** HIGH - Very common mistake

**Mitigations:**

```yaml
- control: Log sanitization
  effectiveness: CRITICAL
  implementation:
    - Never log passwords, tokens, credit cards, PII
    - Redact sensitive fields automatically
    - Use structured logging with field filtering
    code_example: |
      import structlog

      def sanitize_log_data(logger, method_name, event_dict):
          sensitive_fields = ['password', 'token', 'ssn', 'credit_card']
          for field in sensitive_fields:
              if field in event_dict:
                  event_dict[field] = '***REDACTED***'
          return event_dict

      structlog.configure(
          processors=[
              sanitize_log_data,
              structlog.processors.JSONRenderer()
          ]
      )

- control: Log access controls
  effectiveness: HIGH
  implementation:
    - RBAC on log aggregation system
    - Separate logs by sensitivity level
    - Audit log access
    - Encrypt logs at rest

- control: Log retention policies
  effectiveness: MEDIUM
  implementation:
    - Automatic deletion of old logs
    - Compliance-based retention (7 years financial, 90 days debug)
    - Secure log disposal

- control: Dynamic data masking
  effectiveness: HIGH
  implementation:
    - Mask PII in logs based on viewer role
    - Show full data only to authorized users
    - Audit unmasking events
```

### Denial of Service

**Threat 5.1: Cascading Failures**

- **Description:** Failure in one service cascades through dependent services, causing system-wide outage
- **Attack Vector:** Service overload, resource exhaustion, synchronous blocking calls
- **Impact:** CRITICAL - Complete system unavailability
- **Likelihood:** MEDIUM - Common in tightly coupled systems

**Mitigations:**

```yaml
- control: Circuit breakers
  effectiveness: HIGH
  implementation:
    library: Hystrix, Resilience4j, Polly
    configuration:
      - Open circuit after 50% failure rate
      - 10-second wait before retry
      - Fallback to cached data or degraded mode
    code_example: |
      from pybreaker import CircuitBreaker

      breaker = CircuitBreaker(
          fail_max=5,
          timeout_duration=60,
          expected_exception=ServiceException
      )

      @breaker
      def call_downstream_service():
          response = requests.get('http://service-b/api/data')
          return response.json()

      try:
          data = call_downstream_service()
      except CircuitBreakerError:
          # Circuit open, use fallback
          data = get_cached_data()

- control: Bulkheads
  effectiveness: HIGH
  implementation:
    - Separate thread pools per dependency
    - Resource isolation per service
    - Limit concurrent requests
    code_example: |
      from concurrent.futures import ThreadPoolExecutor

      # Separate pools for different dependencies
      db_pool = ThreadPoolExecutor(max_workers=20)
      api_pool = ThreadPoolExecutor(max_workers=10)
      cache_pool = ThreadPoolExecutor(max_workers=5)

- control: Timeouts
  effectiveness: CRITICAL
  implementation:
    - Set aggressive timeouts (1-5 seconds)
    - Timeout at every network boundary
    - Cancel long-running operations
    code_example: |
      import requests

      response = requests.get(
          'http://service-b/api/data',
          timeout=(2, 5)  # (connection, read) timeout
      )

- control: Rate limiting per service
  effectiveness: HIGH
  implementation:
    - Limit requests to each dependency
    - Protect downstream services
    - Graceful degradation
    code_example: |
      apiVersion: networking.istio.io/v1alpha3
      kind: DestinationRule
      metadata:
        name: service-b-circuit-breaker
      spec:
        host: service-b
        trafficPolicy:
          connectionPool:
            tcp:
              maxConnections: 100
            http:
              http1MaxPendingRequests: 10
              http2MaxRequests: 100
              maxRequestsPerConnection: 2
          outlierDetection:
            consecutiveErrors: 5
            interval: 30s
            baseEjectionTime: 30s
```

**Threat 5.2: Resource Exhaustion**

- **Description:** Attacker or bug causes service to consume excessive CPU, memory, or connections
- **Attack Vector:** Unbounded loops, memory leaks, connection pool exhaustion
- **Impact:** HIGH - Service crashes, degraded performance
- **Likelihood:** HIGH - Common in production systems

**Mitigations:**

```yaml
- control: Resource limits and requests
  effectiveness: CRITICAL
  implementation:
    platform: Kubernetes
    configuration:
      - Set CPU and memory limits
      - Define resource requests
      - Use LimitRanges and ResourceQuotas
    code_example: |
      apiVersion: v1
      kind: Pod
      metadata:
        name: service-a
      spec:
        containers:
          - name: service-a
            resources:
              requests:
                memory: "256Mi"
                cpu: "250m"
              limits:
                memory: "512Mi"
                cpu: "500m"

- control: Horizontal Pod Autoscaling
  effectiveness: HIGH
  implementation:
    - Scale based on CPU/memory metrics
    - Custom metrics (request queue depth)
    - Minimum and maximum replicas
    code_example: |
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: service-a-hpa
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: service-a
        minReplicas: 3
        maxReplicas: 10
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 70

- control: Connection pooling
  effectiveness: MEDIUM
  implementation:
    - Limit database connections per instance
    - Reuse connections
    - Close idle connections
    code_example: |
      from sqlalchemy import create_engine

      engine = create_engine(
          database_url,
          pool_size=10,
          max_overflow=20,
          pool_timeout=30,
          pool_recycle=3600
      )

- control: Memory leak detection
  effectiveness: MEDIUM
  implementation:
    - Monitor memory usage trends
    - Alert on gradual memory increase
    - Automatic pod restart on memory threshold
```

**Threat 5.3: Message Queue Flooding**

- **Description:** Attacker floods message queue with messages, overwhelming consumers
- **Attack Vector:** Compromised publisher, malicious internal service, bug causing message loop
- **Impact:** HIGH - Queue backup, consumer crashes, message loss
- **Likelihood:** MEDIUM - Can happen accidentally or maliciously

**Mitigations:**

```yaml
- control: Message rate limiting
  effectiveness: HIGH
  implementation:
    - Limit messages per producer
    - Queue size limits
    - Dead letter queue for poison messages
    code_example: |
      # RabbitMQ queue limits
      channel.queue_declare(
          queue='orders',
          arguments={
              'x-max-length': 10000,
              'x-overflow': 'reject-publish',
              'x-message-ttl': 3600000  # 1 hour
          }
      )

- control: Consumer backpressure
  effectiveness: HIGH
  implementation:
    - Consumer acknowledges only when processed
    - Prefetch limit to prevent overwhelming consumer
    - Reject messages if consumer overloaded
    code_example: |
      channel.basic_qos(prefetch_count=10)

      def callback(ch, method, properties, body):
          try:
              process_message(body)
              ch.basic_ack(delivery_tag=method.delivery_tag)
          except Exception:
              ch.basic_nack(
                  delivery_tag=method.delivery_tag,
                  requeue=False
              )

- control: Message validation
  effectiveness: MEDIUM
  implementation:
    - Reject malformed messages
    - Size limits on messages
    - Schema validation
    - Publisher authentication

- control: Dead letter queue
  effectiveness: HIGH
  implementation:
    - Route failed messages to DLQ
    - Analyze patterns in DLQ
    - Alert on DLQ growth
```

### Elevation of Privilege

**Threat 6.1: Container Escape**

- **Description:** Attacker escapes container to gain access to host or other containers
- **Attack Vector:** Kernel vulnerabilities, privileged containers, host path mounts
- **Impact:** CRITICAL - Full cluster compromise, data breach across all services
- **Likelihood:** LOW - Requires specific vulnerabilities or misconfigurations

**Mitigations:**

```yaml
- control: Security Context constraints
  effectiveness: CRITICAL
  implementation:
    - Run containers as non-root
    - Drop all capabilities
    - Read-only root filesystem
    - No privileged mode
    code_example: |
      apiVersion: v1
      kind: Pod
      metadata:
        name: service-a
      spec:
        securityContext:
          runAsNonRoot: true
          runAsUser: 10000
          fsGroup: 10000
          seccompProfile:
            type: RuntimeDefault
        containers:
          - name: service-a
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              capabilities:
                drop:
                  - ALL

- control: Pod Security Standards
  effectiveness: HIGH
  implementation:
    - Enforce Restricted PSS in production
    - Use Pod Security Admission controller
    - Block non-compliant pods
    code_example: |
      apiVersion: v1
      kind: Namespace
      metadata:
        name: production
        labels:
          pod-security.kubernetes.io/enforce: restricted
          pod-security.kubernetes.io/audit: restricted
          pod-security.kubernetes.io/warn: restricted

- control: Runtime security monitoring
  effectiveness: HIGH
  implementation:
    tool: Falco, Aqua, Sysdig
    detection:
      - Unexpected process execution
      - File system modifications
      - Network connections
      - Privilege escalation attempts
    code_example: |
      # Falco rule
      - rule: Unexpected outbound connection
        desc: Detect unexpected outbound network connection
        condition: >
          outbound and container and
          not proc.name in (expected_programs)
        output: >
          Unexpected outbound connection
          (user=%user.name command=%proc.cmdline connection=%fd.name)
        priority: WARNING

- control: Kernel hardening
  effectiveness: MEDIUM
  implementation:
    - Use hardened kernel
    - Enable SELinux/AppArmor
    - Disable unused kernel modules
    - Regular kernel patching
```

**Threat 6.2: Kubernetes RBAC Bypass**

- **Description:** Service obtains excessive Kubernetes API permissions
- **Attack Vector:** Overly permissive ServiceAccount, default ServiceAccount usage, RBAC misconfiguration
- **Impact:** HIGH - Cluster-wide access, secret theft, pod manipulation
- **Likelihood:** MEDIUM - Common RBAC misconfigurations

**Mitigations:**

```yaml
- control: Least privilege ServiceAccounts
  effectiveness: CRITICAL
  implementation:
    - Dedicated ServiceAccount per service
    - Minimal required permissions only
    - Never use default ServiceAccount
    code_example: |
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: service-a-sa
        namespace: production
      automountServiceAccountToken: true
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: service-a-role
        namespace: production
      rules:
        - apiGroups: [""]
          resources: ["configmaps"]
          verbs: ["get", "list"]
          resourceNames: ["service-a-config"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: service-a-binding
        namespace: production
      subjects:
        - kind: ServiceAccount
          name: service-a-sa
          namespace: production
      roleRef:
        kind: Role
        name: service-a-role
        apiGroup: rbac.authorization.k8s.io

- control: Disable ServiceAccount token automounting
  effectiveness: HIGH
  implementation:
    - Set automountServiceAccountToken: false
    - Only mount when Kubernetes API access needed
    - Use projected volumes for fine-grained control

- control: RBAC audit
  effectiveness: MEDIUM
  implementation:
    tool: rbac-lookup, kubectl-who-can
    process:
      - Regular RBAC permission reviews
      - Identify overly permissive roles
      - Remove unused ServiceAccounts

- control: Admission controller validation
  effectiveness: HIGH
  implementation:
    - OPA policy to enforce RBAC best practices
    - Block pods using default ServiceAccount
    - Require explicit RBAC bindings
```

**Threat 6.3: Supply Chain Compromise**

- **Description:** Malicious code injected through compromised dependencies
- **Attack Vector:** Compromised npm/pip package, typosquatting, dependency confusion
- **Impact:** CRITICAL - Backdoor installation, data exfiltration, ransomware
- **Likelihood:** MEDIUM - Increasing trend in supply chain attacks

**Mitigations:**

```yaml
- control: Dependency scanning
  effectiveness: HIGH
  implementation:
    tool: Snyk, Dependabot, Renovate
    process:
      - Scan dependencies in CI/CD
      - Block high/critical vulnerabilities
      - Automated security updates
    code_example: |
      # GitHub Dependabot config
      version: 2
      updates:
        - package-ecosystem: "pip"
          directory: "/"
          schedule:
            interval: "weekly"
          open-pull-requests-limit: 10
          reviewers:
            - "security-team"

- control: Software Bill of Materials (SBOM)
  effectiveness: HIGH
  implementation:
    tool: Syft, CycloneDX
    process:
      - Generate SBOM for each image
      - Track all dependencies
      - Vulnerability correlation
    code_example: |
      # Generate SBOM
      syft packages gcr.io/project/service-a:v1.2.3 \
        -o cyclonedx-json > sbom.json

- control: Private package registry
  effectiveness: HIGH
  implementation:
    - Use Artifactory, Nexus, or cloud registry
    - Proxy and cache public packages
    - Scan packages before caching
    - Block unapproved packages

- control: Vendor dependency pinning
  effectiveness: MEDIUM
  implementation:
    - Pin exact versions (not ranges)
    - Use lock files (package-lock.json, Pipfile.lock)
    - Review all dependency updates
    code_example: |
      # requirements.txt with pinned versions
      flask==2.3.2
      sqlalchemy==2.0.18
      pyjwt==2.8.0

- control: Code signing and verification
  effectiveness: HIGH
  implementation:
    - Verify package signatures
    - Use trusted registries only
    - Checksum verification
```

## Threat Risk Matrix

| Threat | Impact | Likelihood | Risk | Priority |
|--------|--------|------------|------|----------|
| Service Impersonation | CRITICAL | MEDIUM | CRITICAL | P0 |
| Container Image Tampering | CRITICAL | MEDIUM | CRITICAL | P0 |
| Container Escape | CRITICAL | LOW | HIGH | P1 |
| Secrets in Images | CRITICAL | HIGH | CRITICAL | P0 |
| Database Injection | CRITICAL | MEDIUM | CRITICAL | P0 |
| Kubernetes RBAC Bypass | HIGH | MEDIUM | HIGH | P1 |
| Supply Chain Compromise | CRITICAL | MEDIUM | CRITICAL | P0 |
| Cascading Failures | CRITICAL | MEDIUM | CRITICAL | P0 |
| API Token Theft | HIGH | HIGH | HIGH | P1 |
| Traffic Eavesdropping | HIGH | MEDIUM | HIGH | P1 |
| Log Data Exposure | HIGH | HIGH | HIGH | P1 |
| Resource Exhaustion | HIGH | HIGH | HIGH | P1 |
| Message Queue Tampering | HIGH | MEDIUM | MEDIUM | P2 |
| Configuration Tampering | HIGH | MEDIUM | MEDIUM | P2 |
| Message Queue Flooding | HIGH | MEDIUM | MEDIUM | P2 |
| Tracing Blind Spots | MEDIUM | HIGH | MEDIUM | P2 |
| Service Call Denial | MEDIUM | LOW | LOW | P3 |

## Security Controls Summary

**Critical (P0):**
- Mutual TLS (mTLS) for all service-to-service communication
- Image signing and admission control
- Secrets management system (Vault)
- Container security contexts (non-root, read-only, no privileges)
- Parameterized queries at every service
- Circuit breakers and timeouts
- RBAC least privilege
- Dependency scanning and SBOM

**High Priority (P1):**
- Short-lived tokens with rotation
- Network policies and segmentation
- Distributed tracing
- Log sanitization
- Resource limits and autoscaling
- Runtime security monitoring (Falco)
- Private package registry

**Recommended (P2):**
- Message signing
- Configuration encryption
- Structured logging with trace context
- Connection pooling
- Dead letter queues
- RBAC audit tools

## Implementation Checklist

- [ ] Deploy service mesh (Istio/Linkerd) with strict mTLS
- [ ] Configure image signing and admission controller
- [ ] Deploy HashiCorp Vault for secrets management
- [ ] Set Pod Security Standards to Restricted
- [ ] Implement least privilege RBAC for all services
- [ ] Deploy distributed tracing (OpenTelemetry)
- [ ] Configure structured logging with sanitization
- [ ] Implement circuit breakers for all service calls
- [ ] Set resource limits on all pods
- [ ] Deploy horizontal pod autoscalers
- [ ] Configure network policies
- [ ] Enable Kubernetes audit logging
- [ ] Deploy runtime security monitoring (Falco)
- [ ] Implement dependency scanning in CI/CD
- [ ] Generate SBOMs for all images
- [ ] Configure message queue encryption
- [ ] Set up centralized log aggregation (ELK/Splunk)
- [ ] Implement rate limiting at API gateway and service mesh
- [ ] Deploy monitoring and alerting (Prometheus/Grafana)
- [ ] Conduct regular penetration testing

## References

- [OWASP Microservices Security](https://owasp.org/www-project-microservices-security/)
- [NIST SP 800-204: Security Strategies for Microservices](https://csrc.nist.gov/publications/detail/sp/800-204/final)
- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/security-best-practices/)
- [CNCF Cloud Native Security Whitepaper](https://www.cncf.io/wp-content/uploads/2020/11/CNCF_Cloud_Native_Security_Whitepaper_Nov_2020.pdf)
- [Container Security Guide (NIST SP 800-190)](https://csrc.nist.gov/publications/detail/sp/800-190/final)

```

### examples/architectures/zero-trust-network.md

```markdown
# Zero Trust Network Architecture

## Overview

Zero Trust security architecture eliminates implicit trust based on network location. Verify every access request based on identity, device posture, and context regardless of origin. Enforce least privilege access with continuous verification and monitoring.

## Core Principles

1. **Never Trust, Always Verify** - Authenticate and authorize every request
2. **Assume Breach** - Minimize blast radius through segmentation
3. **Verify Explicitly** - Use all available data points (identity, location, device, data classification)
4. **Least Privilege Access** - Just-in-time and just-enough access
5. **Microsegmentation** - Isolate workloads and limit lateral movement
6. **End-to-End Encryption** - Encrypt data in transit and at rest

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        ZERO TRUST CONTROL PLANE                          │
│                                                                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐│
│  │   Identity   │  │    Device    │  │   Policy     │  │  Telemetry   ││
│  │   Provider   │  │  Management  │  │   Engine     │  │  & Analytics ││
│  │              │  │              │  │              │  │              ││
│  │ - SSO/SAML   │  │ - MDM/UEM    │  │ - RBAC       │  │ - SIEM       ││
│  │ - MFA        │  │ - Posture    │  │ - ABAC       │  │ - Behavioral ││
│  │ - Context    │  │ - Compliance │  │ - Risk Score │  │ - Forensics  ││
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘│
│         │                 │                 │                 │         │
│         └─────────────────┴─────────────────┴─────────────────┘         │
│                                   │                                      │
└───────────────────────────────────┼──────────────────────────────────────┘
                                    │
                    ┌───────────────┴───────────────┐
                    │                               │
                    ▼                               ▼
        ┌──────────────────────┐        ┌──────────────────────┐
        │  Policy Enforcement  │        │  Policy Enforcement  │
        │  Point (PEP)         │        │  Point (PEP)         │
        │                      │        │                      │
        │  - Reverse Proxy     │        │  - Service Mesh      │
        │  - API Gateway       │        │  - Sidecar Proxy     │
        │  - Load Balancer     │        │  - eBPF              │
        └──────────┬───────────┘        └──────────┬───────────┘
                   │                               │
                   ▼                               ▼
        ┌──────────────────────┐        ┌──────────────────────┐
        │   External Users     │        │  Internal Services   │
        │                      │        │                      │
        │  Remote Workers ─────┼───────▶│  Microservices       │
        │  Partners            │        │  Databases           │
        │  Contractors         │        │  APIs                │
        └──────────────────────┘        └──────────────────────┘

                    TRUST BOUNDARIES ELIMINATED
                    ═════════════════════════════

     Every request verified │ Identity-based access │ Continuous monitoring
```

## Identity-Based Access Control

### Single Sign-On (SSO) Integration

**SAML Authentication Flow:**

```
┌──────┐                ┌──────────┐              ┌─────────┐
│User  │                │ Identity │              │Resource │
│      │                │ Provider │              │ (App)   │
└──┬───┘                └────┬─────┘              └────┬────┘
   │                         │                         │
   │ 1. Access Resource      │                         │
   ├────────────────────────────────────────────────────>
   │                         │                         │
   │ 2. Redirect to IdP      │                         │
   <─────────────────────────────────────────────────────
   │                         │                         │
   │ 3. Authenticate         │                         │
   ├────────────────────────>│                         │
   │                         │                         │
   │ 4. MFA Challenge        │                         │
   <─────────────────────────┤                         │
   │                         │                         │
   │ 5. MFA Response         │                         │
   ├────────────────────────>│                         │
   │                         │                         │
   │                         │ 6. Check Device Posture │
   │                         ├────────┐                │
   │                         │        │                │
   │                         <────────┘                │
   │                         │                         │
   │                         │ 7. Evaluate Risk Score  │
   │                         ├────────┐                │
   │                         │        │                │
   │                         <────────┘                │
   │                         │                         │
   │ 8. SAML Assertion       │                         │
   <─────────────────────────┤                         │
   │                         │                         │
   │ 9. Present Assertion    │                         │
   ├────────────────────────────────────────────────────>
   │                         │                         │
   │                         │  10. Validate Assertion │
   │                         │    <────────────────────┤
   │                         │                         │
   │ 11. Access Granted      │                         │
   <─────────────────────────────────────────────────────
```

**SAML Configuration:**

```xml
<saml:AttributeStatement>
  <saml:Attribute Name="email">
    <saml:AttributeValue>[email protected]</saml:AttributeValue>
  </saml:Attribute>
  <saml:Attribute Name="groups">
    <saml:AttributeValue>Engineering</saml:AttributeValue>
    <saml:AttributeValue>ProductionAccess</saml:AttributeValue>
  </saml:Attribute>
  <saml:Attribute Name="device_id">
    <saml:AttributeValue>device-12345</saml:AttributeValue>
  </saml:Attribute>
  <saml:Attribute Name="device_compliant">
    <saml:AttributeValue>true</saml:AttributeValue>
  </saml:Attribute>
  <saml:Attribute Name="ip_address">
    <saml:AttributeValue>203.0.113.45</saml:AttributeValue>
  </saml:Attribute>
  <saml:Attribute Name="risk_score">
    <saml:AttributeValue>low</saml:AttributeValue>
  </saml:Attribute>
</saml:AttributeStatement>
```

### Multi-Factor Authentication (MFA)

**Adaptive MFA Policy:**

```json
{
  "policy_name": "adaptive_mfa",
  "conditions": [
    {
      "name": "always_require_mfa",
      "rule": "user.role in ['admin', 'privileged_user']",
      "action": {
        "mfa_required": true,
        "allowed_factors": ["webauthn", "totp", "push"],
        "step_up_required": true
      }
    },
    {
      "name": "risk_based_mfa",
      "rule": "risk_score > 50 OR ip_reputation == 'suspicious'",
      "action": {
        "mfa_required": true,
        "allowed_factors": ["webauthn"],
        "max_attempts": 3
      }
    },
    {
      "name": "new_device_mfa",
      "rule": "device.first_seen < 7d",
      "action": {
        "mfa_required": true,
        "allowed_factors": ["webauthn", "totp", "push"],
        "device_enrollment_required": true
      }
    },
    {
      "name": "location_based_mfa",
      "rule": "geo.country NOT IN allowed_countries",
      "action": {
        "mfa_required": true,
        "allowed_factors": ["webauthn"],
        "admin_notification": true
      }
    }
  ]
}
```

### Context-Aware Access

**Access Decision Factors:**

```python
class AccessDecisionEngine:
    """
    Evaluate access requests based on multiple context factors.
    """

    def evaluate_access(self, request):
        """
        Calculate risk score and make access decision.
        """
        risk_score = 0
        factors = []

        # Identity verification
        if not request.user.mfa_verified:
            risk_score += 50
            factors.append("MFA_NOT_VERIFIED")

        # Device posture
        if not request.device.is_compliant():
            risk_score += 30
            factors.append("DEVICE_NON_COMPLIANT")

        if not request.device.is_managed():
            risk_score += 20
            factors.append("DEVICE_UNMANAGED")

        if not request.device.has_latest_os():
            risk_score += 15
            factors.append("OS_OUTDATED")

        # Network context
        if request.ip_address.is_tor_exit_node():
            risk_score += 40
            factors.append("TOR_EXIT_NODE")

        if request.ip_address.reputation == "suspicious":
            risk_score += 35
            factors.append("SUSPICIOUS_IP")

        if request.geo_location.country not in self.allowed_countries:
            risk_score += 25
            factors.append("DISALLOWED_COUNTRY")

        # Behavioral analysis
        if self.is_impossible_travel(request):
            risk_score += 60
            factors.append("IMPOSSIBLE_TRAVEL")

        if self.is_anomalous_access_pattern(request):
            risk_score += 40
            factors.append("ANOMALOUS_PATTERN")

        # Time-based risk
        if not self.is_business_hours(request):
            risk_score += 10
            factors.append("AFTER_HOURS")

        # Resource sensitivity
        if request.resource.classification == "highly_confidential":
            risk_score += 20
            factors.append("SENSITIVE_RESOURCE")

        # Make decision
        decision = self.make_decision(risk_score, factors)

        return {
            "allowed": decision["allowed"],
            "risk_score": risk_score,
            "factors": factors,
            "action": decision["action"],
            "reason": decision["reason"]
        }

    def make_decision(self, risk_score, factors):
        """
        Determine access based on risk score.
        """
        if risk_score >= 80:
            return {
                "allowed": False,
                "action": "DENY",
                "reason": "Risk score too high"
            }
        elif risk_score >= 50:
            return {
                "allowed": True,
                "action": "ALLOW_WITH_STEP_UP",
                "reason": "Requires additional verification"
            }
        elif risk_score >= 30:
            return {
                "allowed": True,
                "action": "ALLOW_WITH_MONITORING",
                "reason": "Elevated monitoring required"
            }
        else:
            return {
                "allowed": True,
                "action": "ALLOW",
                "reason": "Normal access granted"
            }
```

## Device Posture Verification

### Device Trust Requirements

**Compliance Checks:**

```yaml
device_posture_policy:
  name: "corporate_device_policy"
  requirements:
    - check: "device_managed"
      description: "Device must be enrolled in MDM"
      severity: "critical"
      required: true

    - check: "encryption_enabled"
      description: "Full disk encryption required"
      severity: "critical"
      required: true

    - check: "os_version"
      description: "Operating system must be up to date"
      severity: "high"
      required: true
      min_versions:
        windows: "10.0.19044"
        macos: "13.0"
        ios: "16.0"
        android: "13.0"

    - check: "antivirus_running"
      description: "Antivirus must be active and updated"
      severity: "high"
      required: true
      max_definition_age: "7d"

    - check: "firewall_enabled"
      description: "Host firewall must be enabled"
      severity: "medium"
      required: true

    - check: "screen_lock"
      description: "Screen lock must be configured"
      severity: "medium"
      required: true
      max_idle_time: "10m"

    - check: "password_policy"
      description: "Strong password required"
      severity: "high"
      required: true
      min_length: 14
      complexity: true

    - check: "unauthorized_apps"
      description: "No blacklisted applications"
      severity: "high"
      required: true
      blacklist:
        - "remote_access_tools"
        - "file_sharing_apps"
        - "cryptocurrency_miners"

    - check: "certificate_valid"
      description: "Valid device certificate"
      severity: "critical"
      required: true
      max_age: "365d"
```

### Continuous Verification

```python
import asyncio
from datetime import datetime, timedelta

class DevicePostureMonitor:
    """
    Continuously monitor device posture and revoke access if non-compliant.
    """

    def __init__(self, check_interval=300):
        self.check_interval = check_interval  # 5 minutes
        self.active_sessions = {}

    async def monitor_session(self, session_id, device_id):
        """
        Monitor device posture for an active session.
        """
        while session_id in self.active_sessions:
            try:
                # Check device posture
                posture = await self.check_device_posture(device_id)

                if not posture["compliant"]:
                    # Device is no longer compliant
                    await self.revoke_session(
                        session_id,
                        reason=f"Device non-compliant: {posture['violations']}"
                    )
                    break

                # Check for certificate expiry
                cert_expiry = await self.get_certificate_expiry(device_id)
                if cert_expiry < datetime.utcnow() + timedelta(hours=1):
                    await self.notify_renewal_required(device_id)

                # Sleep until next check
                await asyncio.sleep(self.check_interval)

            except Exception as e:
                # Error checking posture - assume non-compliant
                await self.revoke_session(
                    session_id,
                    reason=f"Unable to verify device posture: {str(e)}"
                )
                break

    async def check_device_posture(self, device_id):
        """
        Query MDM/UEM for current device compliance status.
        """
        # Integration with MDM (Intune, Workspace ONE, Jamf, etc.)
        compliance_status = await self.mdm_client.get_compliance(device_id)

        violations = []
        if not compliance_status["encryption_enabled"]:
            violations.append("DISK_NOT_ENCRYPTED")
        if not compliance_status["av_running"]:
            violations.append("ANTIVIRUS_DISABLED")
        if compliance_status["os_outdated"]:
            violations.append("OS_VERSION_OUTDATED")

        return {
            "compliant": len(violations) == 0,
            "violations": violations,
            "last_checked": datetime.utcnow()
        }

    async def revoke_session(self, session_id, reason):
        """
        Immediately terminate non-compliant session.
        """
        session = self.active_sessions.pop(session_id, None)
        if session:
            # Revoke tokens
            await self.identity_provider.revoke_tokens(session["tokens"])

            # Notify user
            await self.notification_service.send(
                to=session["user_email"],
                subject="Session Terminated - Device Non-Compliant",
                body=f"Your session was terminated: {reason}"
            )

            # Log event
            await self.audit_log.record({
                "event": "SESSION_REVOKED",
                "session_id": session_id,
                "reason": reason,
                "timestamp": datetime.utcnow()
            })
```

## Microsegmentation

### Service-to-Service Communication

**Identity-Based Segmentation:**

```yaml
# Service Mesh Policy (Istio example)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
    # Only API gateway can call frontend
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/api-gateway"]
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/*"]
      when:
        - key: request.auth.claims[iss]
          values: ["https://auth.example.com"]
        - key: request.auth.claims[aud]
          values: ["frontend-service"]

---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    # Only frontend can call backend
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/frontend"]
      to:
        - operation:
            methods: ["GET", "POST", "PUT", "DELETE"]
            paths: ["/api/v1/*"]
      when:
        - key: source.namespace
          values: ["production"]

---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: database-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: database-proxy
  action: ALLOW
  rules:
    # Only backend can access database
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/backend"]
      to:
        - operation:
            ports: ["5432"]
      when:
        - key: connection.sni
          values: ["postgres.production.svc.cluster.local"]
```

### Network-Level Segmentation

**Zero Trust Network Zones:**

```
┌─────────────────────────────────────────────────────────────┐
│                     DMZ Zone                                 │
│  - WAF / Reverse Proxy                                       │
│  - DDoS Protection                                           │
│  - No direct internet access to apps                         │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        │ mTLS required
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                 Application Zone                             │
│  - Stateless services                                        │
│  - No persistent data                                        │
│  - Identity-based access only                                │
│  - East-west traffic encrypted                               │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        │ JWT + mTLS required
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                   Data Zone                                  │
│  - Databases                                                 │
│  - Object storage                                            │
│  - No ingress from internet                                  │
│  - Encryption at rest enforced                               │
└─────────────────────────────────────────────────────────────┘
```

**eBPF-Based Segmentation (Cilium):**

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/data"
              - method: "POST"
                path: "/api/data"
                headerMatches:
                  - mismatch: EQUAL
                    name: X-API-Key
                    secret:
                      name: api-keys
                      namespace: production

  egress:
    - toEndpoints:
        - matchLabels:
            app: database
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP

    - toFQDNs:
        - matchName: "api.external-service.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

## Just-In-Time Access

### Temporary Privilege Escalation

**Access Request Workflow:**

```python
from datetime import datetime, timedelta
import uuid

class JITAccessManager:
    """
    Manage just-in-time privileged access requests.
    """

    def request_access(self, user_id, resource, role, duration_hours,
                      justification, ticket_id=None):
        """
        Request temporary elevated access.
        """
        # Validate request
        if duration_hours > 8:
            raise ValueError("Maximum access duration is 8 hours")

        if not justification or len(justification) < 20:
            raise ValueError("Detailed justification required")

        # Create access request
        request_id = str(uuid.uuid4())
        request = {
            "request_id": request_id,
            "user_id": user_id,
            "resource": resource,
            "role": role,
            "duration": duration_hours,
            "justification": justification,
            "ticket_id": ticket_id,
            "requested_at": datetime.utcnow(),
            "status": "PENDING_APPROVAL",
            "approvers": self.get_required_approvers(resource, role)
        }

        # Store request
        self.db.save_request(request)

        # Notify approvers
        self.notification_service.notify_approvers(
            approvers=request["approvers"],
            request=request
        )

        return request_id

    def approve_request(self, request_id, approver_id, approval_note):
        """
        Approve access request and grant temporary permissions.
        """
        request = self.db.get_request(request_id)

        if request["status"] != "PENDING_APPROVAL":
            raise ValueError("Request is not pending approval")

        if approver_id not in request["approvers"]:
            raise ValueError("User is not authorized to approve this request")

        # Grant access
        expiry = datetime.utcnow() + timedelta(hours=request["duration"])

        access_grant = {
            "grant_id": str(uuid.uuid4()),
            "request_id": request_id,
            "user_id": request["user_id"],
            "resource": request["resource"],
            "role": request["role"],
            "granted_at": datetime.utcnow(),
            "expires_at": expiry,
            "approver_id": approver_id,
            "approval_note": approval_note
        }

        # Apply IAM policy
        self.iam_service.grant_temporary_role(
            user_id=request["user_id"],
            resource=request["resource"],
            role=request["role"],
            expiry=expiry
        )

        # Update request status
        request["status"] = "APPROVED"
        request["approved_at"] = datetime.utcnow()
        request["access_grant"] = access_grant
        self.db.update_request(request)

        # Schedule automatic revocation
        self.scheduler.schedule_revocation(
            grant_id=access_grant["grant_id"],
            revoke_at=expiry
        )

        # Notify requester
        self.notification_service.notify_user(
            user_id=request["user_id"],
            subject="Access Request Approved",
            body=f"Access granted until {expiry}"
        )

        # Audit log
        self.audit_log.record({
            "event": "JIT_ACCESS_GRANTED",
            "request_id": request_id,
            "grant_id": access_grant["grant_id"],
            "user_id": request["user_id"],
            "resource": request["resource"],
            "role": request["role"],
            "approver_id": approver_id,
            "expires_at": expiry
        })

        return access_grant

    def revoke_access(self, grant_id, reason="EXPIRED"):
        """
        Revoke temporary access.
        """
        grant = self.db.get_grant(grant_id)

        # Remove IAM permissions
        self.iam_service.revoke_role(
            user_id=grant["user_id"],
            resource=grant["resource"],
            role=grant["role"]
        )

        # Update grant status
        grant["revoked_at"] = datetime.utcnow()
        grant["revocation_reason"] = reason
        self.db.update_grant(grant)

        # Audit log
        self.audit_log.record({
            "event": "JIT_ACCESS_REVOKED",
            "grant_id": grant_id,
            "user_id": grant["user_id"],
            "reason": reason
        })
```

### Break-Glass Access

**Emergency Access Procedure:**

```yaml
break_glass_policy:
  name: "emergency_access"
  description: "Break-glass access for critical incidents"

  accounts:
    - account_id: "breakglass-001"
      stored_in: "physical_safe"
      rotation_frequency: "quarterly"
      permissions: ["full_admin"]

    - account_id: "breakglass-002"
      stored_in: "physical_safe"
      rotation_frequency: "quarterly"
      permissions: ["full_admin"]

  activation_triggers:
    - "Major security incident (P0)"
    - "Complete system failure"
    - "All admin accounts locked"
    - "Identity provider outage"

  activation_procedure:
    1: "Incident commander declares break-glass event"
    2: "Retrieve credentials from physical safe (requires two keyholders)"
    3: "Log break-glass activation in offline system"
    4: "Authenticate with break-glass account"
    5: "All actions logged to immutable audit trail"
    6: "Notify security team immediately"
    7: "Mandatory post-incident review within 24 hours"

  automatic_alerts:
    - channel: "security_team_pager"
      severity: "CRITICAL"
    - channel: "exec_team_email"
      severity: "HIGH"
    - channel: "siem"
      severity: "CRITICAL"

  post_activation:
    - "Rotate break-glass credentials within 4 hours"
    - "Review all actions taken"
    - "Root cause analysis within 48 hours"
    - "Update incident response procedures"
```

## Continuous Verification

### Session Monitoring

```python
class SessionMonitor:
    """
    Monitor active sessions for anomalous behavior.
    """

    def monitor_session(self, session_id):
        """
        Continuously analyze session for suspicious activity.
        """
        session = self.get_session(session_id)
        baseline = self.get_user_baseline(session["user_id"])

        anomalies = []

        # Check access patterns
        recent_resources = self.get_recent_resource_access(session_id)
        if self.is_unusual_resource_access(recent_resources, baseline):
            anomalies.append({
                "type": "UNUSUAL_RESOURCE_ACCESS",
                "severity": "MEDIUM",
                "details": "Accessing resources outside normal pattern"
            })

        # Check data volume
        data_transferred = self.get_data_transfer_volume(session_id)
        if data_transferred > baseline["avg_data_transfer"] * 10:
            anomalies.append({
                "type": "EXCESSIVE_DATA_TRANSFER",
                "severity": "HIGH",
                "details": f"Transferred {data_transferred}MB (baseline: {baseline['avg_data_transfer']}MB)"
            })

        # Check geographic location changes
        current_location = self.get_session_location(session_id)
        if self.is_impossible_travel(session["previous_location"], current_location):
            anomalies.append({
                "type": "IMPOSSIBLE_TRAVEL",
                "severity": "CRITICAL",
                "details": "Location changed faster than physically possible"
            })

        # Check API call patterns
        api_calls = self.get_api_call_frequency(session_id)
        if api_calls > baseline["avg_api_calls"] * 5:
            anomalies.append({
                "type": "EXCESSIVE_API_CALLS",
                "severity": "MEDIUM",
                "details": f"API call rate {api_calls}/min (baseline: {baseline['avg_api_calls']}/min)"
            })

        # Take action on anomalies
        if anomalies:
            self.handle_anomalies(session_id, anomalies)

        return anomalies

    def handle_anomalies(self, session_id, anomalies):
        """
        Respond to detected anomalies based on severity.
        """
        max_severity = max(a["severity"] for a in anomalies)

        if max_severity == "CRITICAL":
            # Immediately terminate session
            self.terminate_session(session_id, reason="Critical anomaly detected")
            self.create_security_incident(session_id, anomalies)

        elif max_severity == "HIGH":
            # Require step-up authentication
            self.require_step_up_auth(session_id)
            self.alert_security_team(session_id, anomalies)

        elif max_severity == "MEDIUM":
            # Increase monitoring frequency
            self.increase_monitoring(session_id)
            self.log_suspicious_activity(session_id, anomalies)
```

## Policy Enforcement Points

### API Gateway Enforcement

```yaml
# Kong Gateway Configuration
plugins:
  - name: oidc
    config:
      issuer: "https://auth.example.com"
      client_id: "api-gateway"
      client_secret: "${OIDC_CLIENT_SECRET}"
      scopes:
        - openid
        - profile
        - email
      bearer_only: true
      realm: "production"
      introspection_endpoint: "https://auth.example.com/introspect"

  - name: rate-limiting
    config:
      minute: 100
      hour: 1000
      policy: "cluster"
      fault_tolerant: true

  - name: request-transformer
    config:
      add:
        headers:
          - "X-User-Id:$(claims.sub)"
          - "X-User-Email:$(claims.email)"
          - "X-User-Roles:$(claims.roles)"

  - name: acl
    config:
      allow:
        - "authenticated_users"
      hide_groups_header: true

  - name: correlation-id
    config:
      header_name: "X-Correlation-ID"
      generator: "uuid"
      echo_downstream: true
```

### Reverse Proxy Enforcement (Nginx)

```nginx
server {
    listen 443 ssl http2;
    server_name api.example.com;

    # mTLS enforcement
    ssl_client_certificate /etc/nginx/ca.crt;
    ssl_verify_client on;
    ssl_verify_depth 2;

    # Extract client certificate details
    set $ssl_client_subject_dn_cn "";
    if ($ssl_client_s_dn ~ "CN=([^,]+)") {
        set $ssl_client_subject_dn_cn $1;
    }

    location /api/ {
        # Verify JWT token
        auth_jwt "API Access";
        auth_jwt_key_file /etc/nginx/jwt_public_key.pem;

        # Check required claims
        auth_jwt_claim_set $jwt_email email;
        auth_jwt_claim_set $jwt_roles roles;

        # Rate limiting
        limit_req zone=api_limit burst=20 nodelay;

        # Forward auth context
        proxy_set_header X-Client-Cert-CN $ssl_client_subject_dn_cn;
        proxy_set_header X-JWT-Email $jwt_email;
        proxy_set_header X-JWT-Roles $jwt_roles;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Backend upstream
        proxy_pass https://backend-service;
        proxy_ssl_verify on;
        proxy_ssl_trusted_certificate /etc/nginx/backend-ca.crt;
    }
}
```

## Key Security Metrics

Monitor these metrics for Zero Trust effectiveness:

1. **Authentication Metrics:**
   - MFA adoption rate (target: 100%)
   - Failed authentication attempts
   - Step-up authentication triggers
   - Break-glass account usage (target: 0)

2. **Device Metrics:**
   - Device compliance rate (target: 100%)
   - Certificate expiry warnings
   - Unauthorized device access attempts
   - Average time to remediation

3. **Access Metrics:**
   - JIT access requests per day
   - Average approval time
   - Access denials by reason
   - Excessive privilege usage

4. **Network Metrics:**
   - Microsegmentation policy violations
   - East-west traffic encryption rate (target: 100%)
   - Lateral movement attempts blocked
   - Service-to-service auth failures

5. **Behavioral Metrics:**
   - Anomaly detection alerts
   - Risk score distribution
   - Sessions terminated due to anomalies
   - False positive rate

## Implementation Checklist

- [ ] Deploy identity provider with MFA
- [ ] Integrate device management (MDM/UEM)
- [ ] Configure adaptive authentication policies
- [ ] Implement device posture verification
- [ ] Deploy service mesh for microsegmentation
- [ ] Configure identity-based network policies
- [ ] Implement JIT access system
- [ ] Create break-glass procedures
- [ ] Deploy API gateway with policy enforcement
- [ ] Configure reverse proxy with mTLS
- [ ] Enable end-to-end encryption (mTLS)
- [ ] Deploy SIEM for continuous monitoring
- [ ] Create behavioral analysis baselines
- [ ] Implement session monitoring
- [ ] Configure automated response playbooks
- [ ] Create security metrics dashboard
- [ ] Conduct user training
- [ ] Perform tabletop exercises
- [ ] Document incident response procedures
- [ ] Establish continuous improvement process

## References

- [NIST Zero Trust Architecture (SP 800-207)](https://csrc.nist.gov/publications/detail/sp/800-207/final)
- [Google BeyondCorp](https://cloud.google.com/beyondcorp)
- [Microsoft Zero Trust](https://www.microsoft.com/en-us/security/business/zero-trust)
- [CISA Zero Trust Maturity Model](https://www.cisa.gov/zero-trust-maturity-model)
- [Forrester Zero Trust eXtended (ZTX)](https://www.forrester.com/what-it-means/zero-trust/)

```

architecting-security | SkillHub