SkillHub Club · Run DevOps · Full Stack · DevOps · Security

implementing-service-mesh

Implement production-ready service mesh deployments with Istio, Linkerd, or Cilium. Configure mTLS, authorization policies, traffic routing, and progressive delivery patterns for secure, observable microservices. Use when setting up service-to-service communication, implementing zero-trust security, or enabling canary deployments.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 318
Hot score: 99
Updated: March 20, 2026
Overall rating: C (4.1)
Composite score: 4.1
Best-practice grade: B (73.6)

Install command

npx @skill-hub/cli install ancoleman-ai-design-components-implementing-service-mesh

Repository

ancoleman/ai-design-components

Skill path: skills/implementing-service-mesh

Open repository

Best for

Primary workflow: Run DevOps.

Technical facets: Full Stack, DevOps, Security.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: ancoleman.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install implementing-service-mesh into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/ancoleman/ai-design-components before adding implementing-service-mesh to shared team environments
  • Use implementing-service-mesh for development workflows

Works across

Claude Code · Codex CLI · Gemini CLI · OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: implementing-service-mesh
description: Implement production-ready service mesh deployments with Istio, Linkerd, or Cilium. Configure mTLS, authorization policies, traffic routing, and progressive delivery patterns for secure, observable microservices. Use when setting up service-to-service communication, implementing zero-trust security, or enabling canary deployments.
---

# Service Mesh Implementation

## Purpose

Configure and deploy service mesh infrastructure for Kubernetes environments. Enable secure service-to-service communication with mutual TLS, implement traffic management policies, configure authorization controls, and set up progressive delivery strategies. Abstracts network complexity while providing observability, security, and resilience for microservices.

## When to Use

Invoke this skill when:

- "Set up service mesh with mTLS"
- "Configure Istio traffic routing"
- "Implement canary deployments"
- "Secure microservices communication"
- "Add authorization policies to services"
- "Traffic splitting between versions"
- "Multi-cluster service mesh setup"
- "Configure ambient mode vs sidecar"
- "Set up circuit breaker configuration"
- "Enable distributed tracing"

## Service Mesh Selection

Choose based on requirements and constraints.

**Istio Ambient (Recommended for most):**
- 8% latency overhead with mTLS (vs 166% sidecar mode)
- Enterprise features, multi-cloud, advanced L7 routing
- Sidecar-less L4 (ztunnel) + optional L7 (waypoint)

**Linkerd (Simplicity priority):**
- 33% latency overhead (lowest sidecar)
- Rust-based micro-proxy, automatic mTLS
- Best for small-medium teams, easy adoption

**Cilium (eBPF-native):**
- 99% latency overhead, kernel-level enforcement
- Advanced networking, sidecar-less by design
- Best for eBPF infrastructure, future-proof

For detailed comparison matrix and architecture trade-offs, see `references/decision-tree.md`.

## Core Concepts

### Data Plane Architectures

**Sidecar:** Proxy per pod, fine-grained L7 control, higher overhead
**Sidecar-less:** Shared node proxies (Istio Ambient) or eBPF (Cilium), lower overhead

**Istio Ambient Components:**
- ztunnel: Per-node L4 proxy for mTLS
- waypoint: Optional per-namespace L7 proxy for HTTP routing

### Traffic Management

**Routing:** Path, header, weight-based traffic distribution
**Resilience:** Retries, timeouts, circuit breakers, fault injection
**Load Balancing:** Round robin, least connections, consistent hash

### Security Model

**mTLS:** Automatic encryption, certificate rotation, zero app changes
**Modes:** STRICT (reject plaintext), PERMISSIVE (accept both)
**Authorization:** Default-deny, identity-based (not IP), L7 policies

## Istio Configuration

Istio uses Custom Resource Definitions for traffic management and security.

### VirtualService (Routing)

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-canary
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
        subset: v1
      weight: 90
    - destination:
        host: backend
        subset: v2
      weight: 10
```
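The `v1`/`v2` subsets referenced above must be defined in a companion DestinationRule for the split to take effect; a minimal sketch, assuming pods carry `version: v1` and `version: v2` labels:

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-subsets
spec:
  host: backend
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```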

### DestinationRule (Traffic Policy)

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-circuit-breaker
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

### PeerAuthentication (mTLS)

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

### AuthorizationPolicy (Access Control)

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
```

For advanced patterns (fault injection, mirroring, gateways), see `references/istio-patterns.md`.

## Linkerd Configuration

Linkerd emphasizes simplicity with automatic mTLS.

### HTTPRoute (Traffic Splitting)

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: backend-canary
spec:
  parentRefs:
  - name: backend
    kind: Service
  rules:
  - backendRefs:
    - name: backend-v1
      port: 8080
      weight: 90
    - name: backend-v2
      port: 8080
      weight: 10
```

### ServiceProfile (Retries/Timeouts)

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: backend.production.svc.cluster.local
spec:
  routes:
  - name: GET /api/data
    condition:
      method: GET
      pathRegex: /api/data
    timeout: 3s
    isRetryable: true
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s
```

### AuthorizationPolicy

```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
spec:
  targetRef:
    kind: Server
    name: backend-api
  requiredAuthenticationRefs:
  - name: frontend-identity
    kind: MeshTLSAuthentication
```
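The `Server` and `MeshTLSAuthentication` resources this policy references are defined separately. A minimal sketch (the pod labels, port, and service-account identity here are assumptions based on the earlier examples):

```yaml
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: backend-api
spec:
  podSelector:
    matchLabels:
      app: backend
  port: 8080
  proxyProtocol: HTTP/1
---
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: frontend-identity
spec:
  identities:
  # <service-account>.<namespace>.serviceaccount.identity.linkerd.<trust-domain>
  - frontend.production.serviceaccount.identity.linkerd.cluster.local
```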

For complete patterns and mTLS verification, see `references/linkerd-patterns.md`.

## Cilium Configuration

Cilium uses eBPF for kernel-level enforcement.

### CiliumNetworkPolicy (L3/L4/L7)

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-access
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
      rules:
        http:
        - method: GET
          path: "/api/.*"
```

### DNS-Based Egress

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: external-api-access
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  - toFQDNs:
    - matchName: "api.github.com"
    toPorts:
    - ports:
      - port: "443"
```
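`toFQDNs` only works when Cilium's DNS proxy observes the pod's lookups, so a policy like the one above typically also needs an egress rule that allows DNS to kube-dns and routes it through the proxy. A sketch of that additional rule (appended to the same `egress` list):

```yaml
  # Extra egress rule: send DNS through Cilium's DNS proxy so
  # FQDN-to-IP mappings for the toFQDNs rule can be learned
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
```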

For mTLS with SPIRE and eBPF patterns, see `references/cilium-patterns.md`.

## Security Implementation

### Zero-Trust Architecture

1. Enable strict mTLS (encrypt all traffic)
2. Default-deny authorization policies
3. Explicit allow rules (least privilege)
4. Identity-based access control
5. Audit logging

**Example (Istio):**

```yaml
# Strict mTLS
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Deny all by default
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec: {}
```

### Certificate Management

- Automatic rotation (24h TTL default)
- Zero-downtime updates
- External CA integration (cert-manager)
- SPIFFE/SPIRE for workload identity

For JWT authentication and external authorization (OPA), see `references/security-patterns.md`.

## Progressive Delivery

### Canary Deployment

Gradually shift traffic with monitoring.

**Stages:**
1. Deploy v2 with 0% traffic
2. Route 10% to v2, monitor metrics
3. Increase: 25% → 50% → 75% → 100%
4. Cleanup v1 deployment

**Monitor:** Error rate, latency (P95/P99), throughput
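The staged weight shifts above can be scripted against the Istio canary VirtualService shown earlier. A hedged sketch — the VirtualService name `backend-canary`, host `backend`, and subsets `v1`/`v2` are assumptions carried over from that example, and the actual `kubectl patch` call is left commented so each step can be gated on metrics first:

```shell
# Emit the merge patch that gives the canary (v2) subset a given weight;
# the stable (v1) subset gets the remainder.
make_patch() {
  local canary=$1 stable=$((100 - $1))
  printf '{"spec":{"http":[{"route":[{"destination":{"host":"backend","subset":"v1"},"weight":%d},{"destination":{"host":"backend","subset":"v2"},"weight":%d}]}]}}' \
    "$stable" "$canary"
}

for weight in 10 25 50 75 100; do
  echo "Shifting ${weight}% of traffic to v2"
  make_patch "$weight"; echo
  # kubectl patch virtualservice backend-canary -n production \
  #   --type merge -p "$(make_patch "$weight")"
  # Check error rate and P95/P99 latency here; abort and reset to 0 on regression.
done
```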

### Blue/Green Deployment

Instant cutover with quick rollback.

**Process:**
1. Deploy green alongside blue
2. Test green with header routing
3. Instant cutover to green
4. Rollback to blue if needed
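In Istio terms, step 2 (testing green via a header) and step 3 (the cutover) can live in a single VirtualService; a sketch assuming `blue`/`green` subsets exist on host `backend`:

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-bluegreen
spec:
  hosts:
  - backend
  http:
  # Testers who set this header reach green before cutover
  - match:
    - headers:
        x-deploy-test:
          exact: "green"
    route:
    - destination:
        host: backend
        subset: green
  # Cutover: change this subset to green (rollback: change it back)
  - route:
    - destination:
        host: backend
        subset: blue
```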

### Automated Rollback (Flagger)

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: backend
spec:
  targetRef:
    kind: Deployment
    name: backend
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
```

For A/B testing and detailed patterns, see `references/progressive-delivery.md`.

## Multi-Cluster Mesh

Extend mesh across Kubernetes clusters.

**Use Cases:** HA, geo-distribution, compliance, DR

**Istio Multi-Primary:**

```bash
# Install on cluster 1
istioctl install --set values.global.meshID=mesh1 \
  --set values.global.multiCluster.clusterName=cluster1

# Exchange secrets for service discovery
istioctl x create-remote-secret --context=cluster2 | \
  kubectl apply -f - --context=cluster1
```

**Linkerd Multi-Cluster:**

```bash
# Link clusters
linkerd multicluster link --cluster-name cluster2 | \
  kubectl apply -f -

# Export service
kubectl label svc/backend mirror.linkerd.io/exported=true
```

For complete setup and cross-cluster patterns, see `references/multi-cluster.md`.

## Installation

### Istio Ambient Mode

```bash
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=ambient -y
kubectl label namespace production istio.io/dataplane-mode=ambient
```

### Linkerd

```bash
curl -sL https://run.linkerd.io/install-edge | sh
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
kubectl annotate namespace production linkerd.io/inject=enabled
```

### Cilium

```bash
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set authentication.mutual.spire.enabled=true \
  --set authentication.mutual.spire.install.enabled=true

## Troubleshooting

### mTLS Issues

```bash
# Istio: Check mTLS status (authn tls-check was removed; use describe)
istioctl x describe service frontend -n production

# Linkerd: Check edges
linkerd edges deployment/frontend -n production

# Cilium: Check auth
cilium bpf auth list
```

### Traffic Routing Issues

```bash
# Istio: Analyze config
istioctl analyze -n production

# Linkerd: Tap traffic
linkerd tap deployment/backend -n production

# Cilium: Observe flows
hubble observe --namespace production
```

For complete debugging guide and solutions, see `references/troubleshooting.md`.

## Integration with Other Skills

**kubernetes-operations:** Cluster setup, namespaces, RBAC
**security-hardening:** Container security, secret management
**infrastructure-as-code:** Terraform/Helm for mesh deployment
**building-ci-pipelines:** Automated canary, integration tests
**performance-engineering:** Latency benchmarking, optimization

## Reference Files

- `references/decision-tree.md` - Service mesh selection and comparison
- `references/istio-patterns.md` - Istio configuration examples
- `references/linkerd-patterns.md` - Linkerd patterns and best practices
- `references/cilium-patterns.md` - Cilium eBPF policies and mTLS
- `references/security-patterns.md` - Zero-trust and authorization
- `references/progressive-delivery.md` - Canary, blue/green, A/B testing
- `references/multi-cluster.md` - Multi-cluster setup and federation
- `references/troubleshooting.md` - Common issues and debugging


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/decision-tree.md

```markdown
# Service Mesh Selection Decision Tree

## Table of Contents

- [Quick Decision Matrix](#quick-decision-matrix)
- [Detailed Comparison Matrix](#detailed-comparison-matrix)
- [Use Case Recommendations](#use-case-recommendations)
- [Sidecar vs Sidecar-less](#sidecar-vs-sidecar-less)
- [Performance Comparison](#performance-comparison)
- [Migration Paths](#migration-paths)
- [Key Decision Factors](#key-decision-factors)
- [Summary Recommendations](#summary-recommendations)

## Quick Decision Matrix

```
START: Need service mesh for Kubernetes?
│
├─→ Priority: Simplicity + Low Overhead + Small Team
│   └─→ **LINKERD**
│       ✓ Lightweight Rust-based micro-proxy
│       ✓ Lowest latency overhead (33% with mTLS)
│       ✓ Automatic mTLS with zero config
│       ✓ Simple installation and operation
│       ✓ Best for: Small-medium teams, easy adoption
│
├─→ Priority: eBPF + Future-Proof + Advanced Networking
│   └─→ **CILIUM**
│       ✓ Sidecar-less by design (eBPF in kernel)
│       ✓ Advanced network policies (L3/L4/L7)
│       ✓ Integrated CNI (replaces kube-proxy)
│       ✓ Kernel-level observability (Hubble)
│       ✓ Best for: eBPF infrastructure, performance-critical
│
└─→ Priority: Enterprise Features + Multi-Cloud + Flexibility
    └─→ **ISTIO**
        ├─→ Sidecar Mode (traditional)
        │   ✓ Fine-grained L7 control per pod
        │   ✓ Mature, battle-tested
        │   ✗ 166% latency overhead with mTLS
        │   ✓ Best for: Complex L7 requirements per service
        │
        └─→ Ambient Mode (modern, recommended)
            ✓ Sidecar-less L4 (ztunnel per-node)
            ✓ Optional L7 (waypoint per-namespace)
            ✓ Only 8% latency overhead with mTLS
            ✓ Lower resource consumption
            ✓ Best for: New deployments, enterprise scale
```

## Detailed Comparison Matrix

| Criteria | Istio Sidecar | Istio Ambient | Linkerd | Cilium |
|----------|---------------|---------------|---------|--------|
| **Architecture** | Sidecar (Envoy) | ztunnel + waypoint | Sidecar (linkerd2-proxy) | eBPF + optional Envoy |
| **Latency Overhead (mTLS)** | 166% | **8%** ⭐ | **33%** ⭐ | 99% |
| **Resource Usage** | High (per-pod) | Low (per-node) | Medium (lightweight proxy) | Low (kernel-level) |
| **L7 Granularity** | Per-pod | Per-namespace (waypoint) | Per-pod | Per-namespace or cluster |
| **Installation Complexity** | Medium | Medium | **Low** ⭐ | High (CNI integration) |
| **Upgrade Complexity** | High (pod restart) | Medium (node-level) | Medium (pod restart) | Low (kernel upgrade) |
| **Multi-Cluster** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Advanced Routing** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Security (mTLS)** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Observability** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Community/Ecosystem** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Production Maturity** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |

## Use Case Recommendations

### Enterprise Multi-Cloud

**Recommended: Istio Ambient**

- Multi-cluster federation (primary requirement)
- Advanced traffic management (weighted routing, mirroring)
- Enterprise compliance requirements
- Large engineering organization
- Lower overhead than sidecar mode

**Alternative: Istio Sidecar**
- Use if per-pod L7 policies are essential
- Trade higher overhead for fine-grained control

### Startup or Small Team

**Recommended: Linkerd**

- Simplest to install and operate
- Lowest learning curve
- Automatic mTLS with zero configuration
- Best latency overhead for sidecar model
- Small resource footprint

**Alternative: Istio Ambient**
- Choose if expecting rapid growth
- Need for advanced features later

### Performance-Critical Workloads

**Recommended: Linkerd (sidecar)**

- Lowest latency overhead (33%)
- Rust-based proxy (memory-safe, fast)
- Minimal resource consumption
- Simple troubleshooting

**Alternative: Istio Ambient**
- Second-best latency (8%)
- More features if needed

### eBPF-Based Infrastructure

**Recommended: Cilium**

- Native eBPF enforcement at kernel level
- Integrated CNI (replaces kube-proxy)
- Advanced network policies
- Future-proof architecture
- Kernel-level observability

**Note:** Higher latency overhead (99%) vs other options, but unmatched networking features.

### High-Compliance Environments

**Recommended: Istio (Ambient or Sidecar)**

- Comprehensive audit logging
- Advanced authorization policies
- External authorization (OPA, ext-authz)
- Proven in regulated industries
- Enterprise support available

### Gradual Mesh Adoption

**Recommended: Istio Ambient**

- Start with L4 mTLS (ztunnel)
- Add L7 policies (waypoint) only where needed
- Namespace-level granularity
- No sidecar injection required

**Alternative: Linkerd**
- Simple namespace-level injection
- Gradual rollout with permissive mode

## Sidecar vs Sidecar-less

### Sidecar Architecture

**How it Works:**
- Proxy container injected into each pod
- Intercepts all network traffic (iptables or eBPF)
- Per-pod policy enforcement
- Independent proxy configuration

**Advantages:**
- Strong isolation (per-pod)
- Fine-grained L7 control
- Easy debugging (proxy per pod)
- Mature and battle-tested

**Disadvantages:**
- Higher resource usage (CPU/memory per pod)
- Higher latency overhead (extra hop)
- Complex upgrades (pod restarts)
- More moving parts

**When to Use:**
- Need per-pod L7 policies
- Strong isolation required
- Service-specific configurations
- Mature tooling priority

### Sidecar-less Architecture

**Istio Ambient:**
- **ztunnel (L4):** Per-node shared proxy, mTLS enforcement
- **waypoint (L7):** Per-namespace optional proxy for HTTP routing
- Progressive adoption path

**Cilium:**
- eBPF programs in kernel
- No proxies for L3/L4
- Optional Envoy for L7

**Advantages:**
- Lower latency overhead
- Reduced resource consumption
- Simpler operations (fewer containers)
- Easier upgrades (node-level)

**Disadvantages:**
- Less mature (newer technology)
- Weaker isolation (shared components)
- Coarser granularity (namespace vs pod)
- Complex debugging (shared state)

**When to Use:**
- Lower overhead priority
- Simplified operations
- Namespace-level policies acceptable
- Modern infrastructure

## Performance Comparison

### Latency Overhead (with mTLS enabled)

Based on service mesh benchmarks (2025):

| Mesh | Mode | Latency Increase | Resulting Latency |
|------|------|------------------|-------------------|
| **None** | - | 0% | 1.0ms (reference) |
| **Istio** | Ambient | **+8%** | 1.08ms |
| **Linkerd** | Sidecar | **+33%** | 1.33ms |
| **Cilium** | eBPF | **+99%** | 1.99ms |
| **Istio** | Sidecar | **+166%** | 2.66ms |

**Key Insight:** Istio Ambient offers best balance of features and performance.

### Resource Usage

**Per-Pod Overhead (Sidecar):**
- Istio Envoy: ~50MB memory, ~0.1 CPU cores
- Linkerd proxy: ~10MB memory, ~0.01 CPU cores ⭐

**Per-Node Overhead (Sidecar-less):**
- Istio ztunnel: ~100MB memory, ~0.2 CPU cores
- Cilium agent: ~150MB memory, ~0.3 CPU cores

**Calculation Example (100 pods):**
- Istio Sidecar: 5GB memory, 10 CPU cores
- Linkerd: 1GB memory, 1 CPU core
- Istio Ambient: ~100MB memory and ~0.2 CPU cores per node (shared ztunnel)
- Cilium: ~150MB memory and ~0.3 CPU cores per node (shared agent)
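Note that sidecar totals scale with pod count while sidecar-less totals scale with node count. The arithmetic can be sketched as a script using the per-unit figures above (the pod and node counts are illustrative; memory in MB, CPU in millicores):

```shell
# Rough mesh overhead totals: sidecars multiply per pod,
# node-level components multiply per node.
pods=100
nodes=10

total_mem()  { echo $(( $1 * $2 )); }   # units * MB per unit
total_mcpu() { echo $(( $1 * $2 )); }   # units * mCPU per unit

echo "Istio sidecar: $(total_mem $pods 50) MB, $(total_mcpu $pods 100) mCPU"
echo "Linkerd:       $(total_mem $pods 10) MB, $(total_mcpu $pods 10) mCPU"
echo "Istio ambient: $(total_mem $nodes 100) MB, $(total_mcpu $nodes 200) mCPU"
echo "Cilium:        $(total_mem $nodes 150) MB, $(total_mcpu $nodes 300) mCPU"
```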

## Migration Paths

### From No Mesh to Mesh

**Recommended Order:**
1. Install mesh control plane
2. Enable mTLS in PERMISSIVE mode (accept plaintext)
3. Inject mesh into non-critical namespaces first
4. Validate connectivity and observability
5. Switch to STRICT mTLS (reject plaintext)
6. Roll out to production namespaces
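In Istio, steps 2 and 5 are both a `mode` change on the same PeerAuthentication resource; a sketch for the `production` namespace:

```yaml
# Step 2: accept both mTLS and plaintext during rollout.
# Step 5: once all workloads are meshed, change mode to STRICT.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE
```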

### From Sidecar to Sidecar-less (Istio)

**Migration to Ambient:**
1. Install Istio with ambient profile
2. Keep existing sidecar namespaces running
3. New namespaces: label with `istio.io/dataplane-mode=ambient`
4. Test ambient mode in staging
5. Gradually migrate namespaces (remove sidecar injection, add ambient label)
6. Remove sidecar injection from all namespaces

### Between Different Meshes

**General Approach:**
1. Install new mesh alongside existing
2. Run both meshes in parallel (different namespaces)
3. Migrate services namespace by namespace
4. Test thoroughly at each stage
5. Remove old mesh when migration complete

**Complexity: High** - Avoid if possible; choose the right mesh up front.

## Key Decision Factors

### Team Size and Expertise

- **Small team (<10 engineers):** Linkerd (simplicity)
- **Medium team (10-50 engineers):** Istio Ambient or Linkerd
- **Large team (>50 engineers):** Istio (features, scale)

### Workload Characteristics

- **Low latency critical:** Linkerd (33% overhead)
- **High throughput:** Istio Ambient (8% overhead)
- **eBPF-based networking:** Cilium (kernel-level)

### Operational Constraints

- **Limited resources:** Sidecar-less (Ambient, Cilium)
- **Complex upgrades problematic:** Sidecar-less preferred
- **Need quick rollbacks:** Sidecar (easier isolation)

### Future Requirements

- **Expecting growth:** Istio (scales to enterprise)
- **Multi-cloud planned:** Istio (best multi-cluster)
- **eBPF investment:** Cilium (future-proof)

## Summary Recommendations

**2025 General Guidance:**

1. **Default choice:** Istio Ambient (best balance of features and performance)
2. **Simplicity priority:** Linkerd (easiest to adopt)
3. **eBPF future:** Cilium (kernel-level networking)
4. **Legacy compatibility:** Istio Sidecar (mature, proven)

**Anti-Patterns:**

- ❌ Don't choose Istio Sidecar for new deployments (use Ambient instead)
- ❌ Don't choose Cilium if team lacks eBPF expertise
- ❌ Don't choose Linkerd if advanced L7 routing is critical
- ❌ Don't migrate between meshes unless absolutely necessary

**Success Criteria:**

- ✅ mTLS working across all services
- ✅ Authorization policies enforced
- ✅ Observability dashboards operational
- ✅ Canary deployments functioning
- ✅ Team can troubleshoot common issues

```

### references/istio-patterns.md

```markdown
# Istio Configuration Patterns

## Table of Contents

- [VirtualService Patterns](#virtualservice-patterns)
- [DestinationRule Patterns](#destinationrule-patterns)
- [Gateway Patterns](#gateway-patterns)
- [ServiceEntry Patterns](#serviceentry-patterns)
- [Combined Routing Examples](#combined-routing-examples)

## VirtualService Patterns

VirtualService defines routing rules for traffic within the mesh.

### Path-Based Routing

Route traffic based on URL path.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-versioning
  namespace: production
spec:
  hosts:
  - api.example.com
  http:
  - match:
    - uri:
        prefix: /v2/
    route:
    - destination:
        host: api-v2.production.svc.cluster.local
        port:
          number: 8080
  - match:
    - uri:
        prefix: /v1/
    route:
    - destination:
        host: api-v1.production.svc.cluster.local
        port:
          number: 8080
  - route:
    - destination:
        host: api-v1.production.svc.cluster.local
        port:
          number: 8080
```

### Header-Based Routing

Route based on HTTP headers (user-agent, custom headers, cookies).

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: header-routing
spec:
  hosts:
  - backend
  http:
  # Route mobile users to mobile-optimized backend
  - match:
    - headers:
        user-agent:
          regex: ".*(Mobile|Android|iPhone).*"
    route:
    - destination:
        host: backend
        subset: mobile
  # Route beta testers to canary
  - match:
    - headers:
        x-beta-user:
          exact: "true"
    route:
    - destination:
        host: backend
        subset: canary
  # Default route
  - route:
    - destination:
        host: backend
        subset: stable
```

### Weight-Based Traffic Splitting

Distribute traffic across multiple versions.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: canary-rollout
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 75
    - destination:
        host: reviews
        subset: v2
      weight: 25
```

### Fault Injection

Inject faults for chaos engineering and testing.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: fault-injection
spec:
  hosts:
  - backend
  http:
  - match:
    - headers:
        x-test-fault:
          exact: "true"
    fault:
      delay:
        percentage:
          value: 10.0
        fixedDelay: 5s
      abort:
        percentage:
          value: 5.0
        httpStatus: 503
    route:
    - destination:
        host: backend
  - route:
    - destination:
        host: backend
```

### Traffic Mirroring

Mirror traffic to a shadow deployment for testing.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: traffic-mirror
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
        subset: v1
      weight: 100
    mirror:
      host: backend
      subset: v2-shadow
    mirrorPercentage:
      value: 10.0
```

### Retries and Timeouts

Configure resilience policies.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: resilient-routing
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
    timeout: 3s
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: 5xx,reset,connect-failure,refused-stream
```

### URL Rewrite

Rewrite URLs before routing.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: url-rewrite
spec:
  hosts:
  - api.example.com
  http:
  - match:
    - uri:
        prefix: /legacy/
    rewrite:
      uri: /api/v1/
    route:
    - destination:
        host: backend
```

### CORS Configuration

Configure Cross-Origin Resource Sharing.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: cors-enabled
spec:
  hosts:
  - api.example.com
  http:
  - corsPolicy:
      allowOrigins:
      - exact: https://example.com
      - regex: https://.*\.example\.com
      allowMethods:
      - GET
      - POST
      - PUT
      - DELETE
      allowHeaders:
      - Authorization
      - Content-Type
      maxAge: "24h"
      allowCredentials: true
    route:
    - destination:
        host: backend
```

## DestinationRule Patterns

DestinationRule configures traffic policies for destinations.

### Circuit Breaker

Fail fast when backend is unhealthy.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: circuit-breaker
spec:
  host: backend.production.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 20
```

### Load Balancing Algorithms

Configure client-side load balancing.

**Round Robin:**

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: round-robin-lb
spec:
  host: backend
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
```

**Consistent Hash (Sticky Sessions):**

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: sticky-sessions
spec:
  host: backend
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: x-user-id
```

**Least Connections:**

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: least-conn-lb
spec:
  host: backend
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
```

### Subset Configuration

Define subsets based on labels.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-subsets
spec:
  host: backend
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: canary
    labels:
      version: v2
      track: canary
```

### TLS Configuration

Configure client-side TLS for upstream connections.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: tls-origination
spec:
  host: external-service.example.com
  trafficPolicy:
    tls:
      mode: SIMPLE
      sni: external-service.example.com
```

**Mutual TLS to External Service:**

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: mtls-origination
spec:
  host: secure-external.example.com
  trafficPolicy:
    tls:
      mode: MUTUAL
      clientCertificate: /etc/certs/client-cert.pem
      privateKey: /etc/certs/client-key.pem
      caCertificates: /etc/certs/ca-cert.pem
```

### Connection Pool Settings

Control connection pooling behavior.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: connection-pool
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
        connectTimeout: 30ms
        tcpKeepalive:
          time: 7200s
          interval: 75s
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 10
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
        idleTimeout: 3600s
```

## Gateway Patterns

Gateway manages ingress and egress traffic.

### HTTPS Ingress Gateway

Terminate TLS at gateway.

```yaml
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: https-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - api.example.com
    - web.example.com
    tls:
      mode: SIMPLE
      credentialName: example-com-cert
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: gateway-routing
spec:
  hosts:
  - api.example.com
  gateways:
  - istio-system/https-gateway
  http:
  - match:
    - uri:
        prefix: /api/
    route:
    - destination:
        host: backend.production.svc.cluster.local
        port:
          number: 8080
```

### HTTP to HTTPS Redirect

Redirect all HTTP traffic to HTTPS.

```yaml
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: http-redirect
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - api.example.com
    tls:
      httpsRedirect: true
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - api.example.com
    tls:
      mode: SIMPLE
      credentialName: api-cert
```

### Mutual TLS at Gateway

Require client certificates at ingress.

```yaml
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: mtls-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https-mtls
      protocol: HTTPS
    hosts:
    - secure.example.com
    tls:
      mode: MUTUAL
      credentialName: secure-example-com-cert
      caCertificates: /etc/istio/ca-certificates/ca-cert.pem
```

### Egress Gateway

Route external traffic through egress gateway.

```yaml
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: egress-gateway
  namespace: istio-system
spec:
  selector:
    istio: egressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - api.external.com
    tls:
      mode: PASSTHROUGH
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: egress-routing
spec:
  hosts:
  - api.external.com
  gateways:
  - mesh
  - istio-system/egress-gateway
  tls:
  - match:
    - gateways:
      - mesh
      port: 443
      sniHosts:
      - api.external.com
    route:
    - destination:
        host: istio-egressgateway.istio-system.svc.cluster.local
        port:
          number: 443
  - match:
    - gateways:
      - istio-system/egress-gateway
      port: 443
      sniHosts:
      - api.external.com
    route:
    - destination:
        host: api.external.com
        port:
          number: 443
```

## ServiceEntry Patterns

ServiceEntry adds external services to mesh.

### External HTTPS Service

Add external API to service registry.

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-api
spec:
  hosts:
  - api.github.com
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  location: MESH_EXTERNAL
  resolution: DNS
```

### External Database

Add external database with static IPs.

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-postgres
spec:
  hosts:
  - postgres.example.com
  addresses:
  - 10.20.30.40
  ports:
  - number: 5432
    name: postgres
    protocol: TCP
  location: MESH_EXTERNAL
  resolution: STATIC
  endpoints:
  - address: 10.20.30.40
    ports:
      postgres: 5432
```

### Mesh Expansion (VM Integration)

Add VMs to service mesh.

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: vm-service
spec:
  hosts:
  - vm-backend.example.com
  ports:
  - number: 8080
    name: http
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: STATIC
  endpoints:
  - address: 192.168.1.100
    ports:
      http: 8080
    labels:
      app: backend
      version: v1
```
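
Newer Istio releases usually model each VM as a separate WorkloadEntry selected by the ServiceEntry, rather than inlining endpoints; a sketch under that convention (names and labels illustrative):

```yaml
apiVersion: networking.istio.io/v1
kind: WorkloadEntry
metadata:
  name: vm-backend-1
  namespace: production
spec:
  address: 192.168.1.100
  labels:
    app: backend
    version: v1
---
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: vm-service
  namespace: production
spec:
  hosts:
  - vm-backend.example.com
  ports:
  - number: 8080
    name: http
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: STATIC
  workloadSelector:   # selects matching WorkloadEntry resources
    labels:
      app: backend
```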

## Combined Routing Examples

### Canary Deployment with Monitoring

Canary deployment with header-based routing and traffic split.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-subsets
spec:
  host: backend
  subsets:
  - name: stable
    labels:
      version: v1
  - name: canary
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-canary
spec:
  hosts:
  - backend
  http:
  # Internal testers always see canary
  - match:
    - headers:
        x-canary-user:
          exact: "true"
    route:
    - destination:
        host: backend
        subset: canary
  # 10% of production traffic to canary
  - route:
    - destination:
        host: backend
        subset: stable
      weight: 90
    - destination:
        host: backend
        subset: canary
      weight: 10
```

### Multi-Region Routing with Failover

Route to nearest region with failover.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: multi-region
spec:
  host: backend
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
        - from: us-east/*
          to:
            "us-east/*": 80
            "us-west/*": 20
        - from: us-west/*
          to:
            "us-west/*": 80
            "us-east/*": 20
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

### A/B Testing

Route traffic based on user segments.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: ab-test
spec:
  hosts:
  - frontend
  http:
  # Variant A: Control group
  - match:
    - headers:
        cookie:
          regex: "^(.*?;)?(ab-test=a)(;.*)?$"
    route:
    - destination:
        host: frontend
        subset: variant-a
  # Variant B: Treatment group
  - match:
    - headers:
        cookie:
          regex: "^(.*?;)?(ab-test=b)(;.*)?$"
    route:
    - destination:
        host: frontend
        subset: variant-b
  # Default: 50/50 split (the application must set the ab-test cookie for stickiness)
  - route:
    - destination:
        host: frontend
        subset: variant-a
      weight: 50
    - destination:
        host: frontend
        subset: variant-b
      weight: 50
```
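
The `variant-a` and `variant-b` subsets referenced above must be defined in a DestinationRule; a minimal sketch, assuming the frontend deployments carry a `variant` pod label:

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: frontend-variants
spec:
  host: frontend
  subsets:
  - name: variant-a
    labels:
      variant: a   # assumed pod label
  - name: variant-b
    labels:
      variant: b
```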

### Blue/Green Deployment

Instant cutover between versions.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: blue-green
spec:
  hosts:
  - backend
  http:
  # Route all traffic to green (or blue for rollback)
  - route:
    - destination:
        host: backend
        subset: green
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-versions
spec:
  host: backend
  subsets:
  - name: blue
    labels:
      version: blue
  - name: green
    labels:
      version: green
```

## Best Practices

**VirtualService:**
- Use specific host matches (avoid wildcards in production)
- Order match conditions from most to least specific
- Always include a default route (no match conditions)
- Set reasonable timeouts (avoid infinite waits)
- Use retries judiciously (avoid retry storms)
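
Several of these guidelines can be combined in one resource; a hedged sketch (host and values illustrative, not tuned recommendations):

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-defaults
spec:
  hosts:
  - backend.production.svc.cluster.local   # specific host, no wildcard
  http:
  - route:                                 # default route, no match condition
    - destination:
        host: backend.production.svc.cluster.local
    timeout: 5s                            # bounded wait
    retries:
      attempts: 2                          # modest count to avoid retry storms
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
```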

**DestinationRule:**
- Configure circuit breakers for external dependencies
- Use outlier detection to remove unhealthy endpoints
- Set connection pool limits to prevent resource exhaustion
- Choose load balancing algorithm based on workload characteristics
- Define subsets for all versions in use

**Gateway:**
- Use TLS for all ingress traffic (HTTPS only)
- Store certificates in Kubernetes secrets
- Configure HTTP to HTTPS redirects
- Use separate gateways for different security zones
- Rotate certificates before expiration

**ServiceEntry:**
- Use DNS resolution for cloud services
- Use STATIC resolution with specific IPs for legacy systems
- Set appropriate protocols (HTTP, HTTPS, TCP, gRPC)
- Configure timeouts and retries for external services
- Monitor external service health

```

### references/linkerd-patterns.md

```markdown
# Linkerd Configuration Patterns

## Table of Contents

- [HTTPRoute Patterns](#httproute-patterns)
- [ServiceProfile Patterns](#serviceprofile-patterns)
- [Server and Policy Patterns](#server-and-policy-patterns)
- [Authorization Patterns](#authorization-patterns)
- [Observability Integration](#observability-integration)

## HTTPRoute Patterns

HTTPRoute uses Gateway API standard for traffic management.

### Basic Traffic Splitting

Canary deployment with weight-based routing.

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: backend-canary
  namespace: production
spec:
  parentRefs:
  - name: backend
    kind: Service
    group: core
    port: 8080
  rules:
  - backendRefs:
    - name: backend-v1
      port: 8080
      weight: 90
    - name: backend-v2
      port: 8080
      weight: 10
```

### Path-Based Routing

Route based on URL path prefix.

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: api-versioning
spec:
  parentRefs:
  - name: api-gateway
    kind: Service
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: api-v1
      port: 8080
  - matches:
    - path:
        type: PathPrefix
        value: /v2
    backendRefs:
    - name: api-v2
      port: 8080
  - backendRefs:
    - name: api-v1
      port: 8080
```

### Header-Based Routing

Route based on HTTP headers.

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: header-routing
spec:
  parentRefs:
  - name: backend
  rules:
  - matches:
    - headers:
      - name: x-canary-user
        value: "true"
    backendRefs:
    - name: backend-canary
      port: 8080
  - matches:
    - headers:
      - name: user-agent
        value: "mobile"
        type: Exact
    backendRefs:
    - name: backend-mobile
      port: 8080
  - backendRefs:
    - name: backend-stable
      port: 8080
```

### Request Header Modification

Add, set, or remove headers.

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: header-modification
spec:
  parentRefs:
  - name: backend
  rules:
  - filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        set:
        - name: x-forwarded-by
          value: linkerd-mesh
        add:
        - name: x-custom-header
          value: custom-value
        remove:
        - x-internal-header
    backendRefs:
    - name: backend
      port: 8080
```

### Cross-Namespace Routing

Route to services in different namespaces.

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: cross-namespace
  namespace: frontend
spec:
  parentRefs:
  - name: frontend-service
    namespace: frontend
  rules:
  - backendRefs:
    - name: backend-service
      namespace: backend
      port: 8080
```

## ServiceProfile Patterns

ServiceProfile configures per-route metrics, retries, and timeouts.

### Basic ServiceProfile

Define routes for better observability.

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: backend.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: GET /api/users
    condition:
      method: GET
      pathRegex: /api/users
  - name: POST /api/users
    condition:
      method: POST
      pathRegex: /api/users
  - name: GET /api/users/[id]
    condition:
      method: GET
      pathRegex: /api/users/[^/]+
```

### Retries Configuration

Configure automatic retries for idempotent requests.

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: backend.production.svc.cluster.local
  namespace: production
spec:
  # retryBudget is a spec-level setting shared by all retryable routes
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s
  routes:
  - name: GET /api/data
    condition:
      method: GET
      pathRegex: /api/data
    timeout: 3s
    isRetryable: true
  - name: POST /api/data
    condition:
      method: POST
      pathRegex: /api/data
    timeout: 5s
    isRetryable: false
```

### Timeout Configuration

Set per-route timeout values.

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: slow-service.production.svc.cluster.local
spec:
  routes:
  - name: GET /slow-operation
    condition:
      method: GET
      pathRegex: /slow-operation
    timeout: 30s
  - name: GET /fast-operation
    condition:
      method: GET
      pathRegex: /fast-operation
    timeout: 1s
```

### Response Class Configuration

Define custom success criteria.

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: backend.production.svc.cluster.local
spec:
  routes:
  - name: GET /api/users
    condition:
      method: GET
      pathRegex: /api/users
    responseClasses:
    - condition:
        status:
          min: 200
          max: 299
      isFailure: false
    - condition:
        status:
          min: 500
          max: 599
      isFailure: true
```

### Auto-Generated ServiceProfiles

Generate ServiceProfile from live traffic.

```bash
# Generate from OpenAPI spec
linkerd profile --open-api swagger.json backend

# Generate from Protobuf
linkerd profile --proto api.proto backend

# Generate from live traffic observation (requires the viz extension)
linkerd viz profile -n production backend --tap deploy/backend --tap-duration 60s
```

## Server and Policy Patterns

Server resource defines policy attachment points.

### Basic Server Definition

Define a server for policy targeting.

```yaml
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  name: backend-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  port: 8080
  proxyProtocol: HTTP/2
```

### Multiple Ports

Define servers for different ports.

```yaml
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  name: backend-http
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  port: 8080
  proxyProtocol: HTTP/1
---
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  name: backend-grpc
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  port: 9090
  proxyProtocol: gRPC
```

### Server with HTTP Route

Combine Server with HTTPRoute for granular control.

```yaml
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  name: api-server
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  port: 8080
---
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: api-routes
  namespace: production
spec:
  parentRefs:
  - name: api-server
    kind: Server
    group: policy.linkerd.io
  rules:
  - matches:
    - path:
        value: /admin
    backendRefs:
    - name: admin-backend
      port: 8080
  - backendRefs:
    - name: public-backend
      port: 8080
```

## Authorization Patterns

Linkerd authorization uses identity-based access control.

### Allow Specific Service

Allow one service to call another.

```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: production
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: backend-api
  requiredAuthenticationRefs:
  - name: frontend-identity
    kind: MeshTLSAuthentication
---
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: frontend-identity
  namespace: production
spec:
  identities:
  - "frontend.production.serviceaccount.identity.linkerd.cluster.local"
```

### Allow Multiple Services

Allow multiple services with one policy.

```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: allowed-clients
  namespace: production
spec:
  identities:
  - "frontend.production.serviceaccount.identity.linkerd.cluster.local"
  - "gateway.production.serviceaccount.identity.linkerd.local"
  - "*.staging.serviceaccount.identity.linkerd.cluster.local"
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: allow-clients
  namespace: production
spec:
  targetRef:
    kind: Server
    name: backend-api
  requiredAuthenticationRefs:
  - name: allowed-clients
    kind: MeshTLSAuthentication
```

### Per-Route Authorization

Apply different policies to different routes.

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: admin-routes
  namespace: production
spec:
  parentRefs:
  - name: api-server
    kind: Server
    group: policy.linkerd.io
  rules:
  - matches:
    - path:
        value: /admin
    backendRefs:
    - name: admin-backend
      port: 8080
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: admin-only
  namespace: production
spec:
  targetRef:
    kind: HTTPRoute
    name: admin-routes
  requiredAuthenticationRefs:
  - name: admin-identity
    kind: MeshTLSAuthentication
---
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: admin-identity
  namespace: production
spec:
  identities:
  - "admin-gateway.production.serviceaccount.identity.linkerd.cluster.local"
```

### Network-Based Authentication

Allow traffic from specific networks (use sparingly, prefer identity).

```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: NetworkAuthentication
metadata:
  name: internal-network
  namespace: production
spec:
  networks:
  - cidr: 10.0.0.0/8
  - cidr: 192.168.0.0/16
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: allow-internal
  namespace: production
spec:
  targetRef:
    kind: Server
    name: backend-api
  requiredAuthenticationRefs:
  - name: internal-network
    kind: NetworkAuthentication
```

### Default Deny Policy

Deny all traffic by default (zero-trust).

```yaml
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  name: backend-locked
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  port: 8080
  # No authorization policies = default deny
```
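
A namespace-wide default can complement per-Server policies: Linkerd honors the `config.linkerd.io/default-inbound-policy` annotation (a sketch; the `deny` value assumes a Linkerd version with default inbound policy support):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    config.linkerd.io/default-inbound-policy: deny
```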

## Observability Integration

### mTLS Verification

Check mTLS status between services.

```bash
# View deployment connections and mTLS status (requires the viz extension)
linkerd viz edges deployment -n production

# Output shows:
# SRC         DST           SECURED       MSG/SEC
# frontend    backend       √             10.2
```

### Live Traffic Observation

Tap live traffic for debugging.

```bash
# Tap all traffic to backend
linkerd viz tap deployment/backend -n production

# Tap specific route
linkerd viz tap deployment/backend -n production --path /api/users

# Tap with filtering
linkerd viz tap deployment/backend -n production \
  --method GET \
  --path /api/users \
  --authority backend.production.svc.cluster.local
```

### Service Statistics

View service metrics.

```bash
# Per-service metrics
linkerd viz stat deployment/backend -n production

# Per-route metrics (requires ServiceProfile)
linkerd viz routes deployment/backend -n production

# Output shows success rate, RPS, latencies (P50, P95, P99)
```

### Dashboard Access

Access Linkerd dashboard.

```bash
# Launch the Linkerd dashboard and service graph (requires the viz extension)
linkerd viz dashboard
```

### Prometheus Integration

Linkerd exports metrics to Prometheus automatically.

**Useful Queries:**

```promql
# Request rate
sum(rate(request_total[1m])) by (dst_service)

# Success rate (response_total carries the classification label)
sum(rate(response_total{classification="success"}[1m])) by (dst_service)
/ sum(rate(response_total[1m])) by (dst_service)

# P95 latency
histogram_quantile(0.95,
  sum(rate(response_latency_ms_bucket[1m])) by (le, dst_service)
)
```
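
The success-rate expression can back an alert; a sketch of a Prometheus alerting rule (threshold and duration illustrative):

```yaml
groups:
- name: linkerd-slo
  rules:
  - alert: LinkerdLowSuccessRate
    expr: |
      sum(rate(response_total{classification="success"}[5m])) by (dst_service)
        / sum(rate(response_total[5m])) by (dst_service) < 0.95
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Success rate below 95% for {{ $labels.dst_service }}"
```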

## Multi-Cluster Patterns

### Multi-Cluster Setup

Link multiple clusters.

```bash
# Clusters must share a common trust anchor for cross-cluster mTLS
# Install Linkerd on cluster 1
linkerd install --cluster-domain cluster1.local | kubectl apply -f -

# Install Linkerd on cluster 2
linkerd install --cluster-domain cluster2.local | kubectl apply -f -

# Link clusters (from cluster 1)
linkerd multicluster link --cluster-name cluster2 | \
  kubectl --context=cluster1 apply -f -

# Export service from cluster 2
kubectl --context=cluster2 label svc/backend \
  mirror.linkerd.io/exported=true -n production
```

### Cross-Cluster Traffic Routing

Route traffic to services in remote clusters.

```yaml
# Service is automatically mirrored with suffix
# backend-cluster2.production.svc.cluster1.local

apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: multi-cluster-routing
  namespace: production
spec:
  parentRefs:
  - name: frontend
  rules:
  # 80% local, 20% remote
  - backendRefs:
    - name: backend
      port: 8080
      weight: 80
    - name: backend-cluster2
      port: 8080
      weight: 20
```

## Best Practices

**HTTPRoute:**
- Use Gateway API standard resources (future-proof)
- Leverage header modification for observability
- Keep route matching simple and explicit
- Use cross-namespace routing sparingly

**ServiceProfile:**
- Auto-generate from OpenAPI/Protobuf when possible
- Set timeouts based on actual performance
- Use retry budgets to prevent retry storms
- Mark mutations (POST, PUT, DELETE) as non-retryable

**Authorization:**
- Prefer identity-based over network-based policies
- Start with default-deny, add explicit allows
- Use Server resources for fine-grained control
- Apply policies at route level for sensitive operations

**Observability:**
- Use tap for real-time debugging (not production monitoring)
- Create ServiceProfiles for per-route metrics
- Integrate with Prometheus for alerting
- Monitor edges for mTLS status

**Multi-Cluster:**
- Use consistent naming across clusters
- Monitor cross-cluster latency
- Implement circuit breakers for remote calls
- Test failover scenarios regularly

```

### references/cilium-patterns.md

```markdown
# Cilium eBPF Service Mesh Patterns

## Table of Contents

- [CiliumNetworkPolicy Patterns](#ciliumnetworkpolicy-patterns)
- [L7 HTTP Policies](#l7-http-policies)
- [DNS-Based Policies](#dns-based-policies)
- [mTLS with SPIRE](#mtls-with-spire)
- [Observability with Hubble](#observability-with-hubble)

## CiliumNetworkPolicy Patterns

Cilium enforces network policies at kernel level using eBPF.

### L3/L4 Policy (Basic)

Allow specific pods to communicate on specific ports.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  description: "Allow frontend pods to access backend on port 8080"
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
```

### Namespace-Level Policy

Allow all pods in one namespace to access another.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-from-frontend-namespace
  namespace: backend
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
```

### Egress Policy

Control outbound traffic from pods.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-egress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  # Allow to database
  - toEndpoints:
    - matchLabels:
        app: postgres
    toPorts:
    - ports:
      - port: "5432"
        protocol: TCP
  # Allow to external API (requires DNS, see below)
  - toFQDNs:
    - matchName: "api.external.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
  # Allow DNS queries
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
```

### Deny-All Policy

Default deny for zero-trust security.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - {}  # Empty rule denies all ingress
```

### Label-Based Selection

Select endpoints using Kubernetes labels.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: label-based-policy
spec:
  endpointSelector:
    matchLabels:
      app: backend
      env: production
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
        env: production
    - matchLabels:
        app: gateway
        tier: public
    toPorts:
    - ports:
      - port: "8080"
```

## L7 HTTP Policies

Cilium supports L7 (HTTP) policy enforcement.

### HTTP Method and Path

Allow specific HTTP methods and paths.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: http-api-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: GET
          path: "/api/users"
        - method: GET
          path: "/api/users/.*"
        - method: POST
          path: "/api/users"
```

### HTTP Header Matching

Match on HTTP headers.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: http-header-policy
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
      rules:
        http:
        - method: GET
          path: "/api/admin/.*"
          headers:
          - "X-Admin-Token: secret-value"
```

### HTTP Host Header

Allow requests only for an expected Host header (policies filter traffic rather than route it).

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: http-host-policy
spec:
  endpointSelector:
    matchLabels:
      app: ingress
  ingress:
  - fromEntities:
    - world
    toPorts:
    - ports:
      - port: "80"
      rules:
        http:
        - method: GET
          host: "api.example.com"
          path: "/.*"
```

### gRPC Policy

Control gRPC services and methods.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: grpc-policy
spec:
  endpointSelector:
    matchLabels:
      app: grpc-backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: grpc-client
    toPorts:
    - ports:
      - port: "9090"
      rules:
        http:
        - method: POST
          path: "/user.UserService/GetUser"
        - method: POST
          path: "/user.UserService/ListUsers"
```

## DNS-Based Policies

Control egress to external services by DNS name.

### Basic FQDN Matching

Allow egress to specific domain.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-github-api
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  - toFQDNs:
    - matchName: "api.github.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
  # Allow DNS
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
```

### Wildcard FQDN

Allow egress to domain and subdomains.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-aws-services
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  - toFQDNs:
    - matchPattern: "*.amazonaws.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
  # A DNS allow rule with dns visibility (as in the previous example)
  # is also required so Cilium can learn the IPs behind the pattern
```

### Multiple FQDNs

Allow multiple external services.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-external-apis
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  - toFQDNs:
    - matchName: "api.github.com"
    - matchName: "api.stripe.com"
    - matchPattern: "*.slack.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
  # Allow DNS
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
```

### DNS Policy with TTL

FQDN policies track the IPs returned by DNS and honor record TTLs, so entries for hosts with fast-changing records expire and refresh automatically; the minimum TTL is tunable via the agent's `--tofqdns-min-ttl` flag. Note that deny rules (`egressDeny`) do not support `toFQDNs`.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: fqdn-with-ttl
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  - toFQDNs:
    - matchName: "dynamic.example.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
```

## mTLS with SPIRE

Cilium integrates with SPIRE for mutual TLS.

### Enable mTLS (Helm Values)

Install Cilium with SPIRE authentication.

```yaml
# values.yaml for Cilium Helm chart
authentication:
  mutual:
    spire:
      enabled: true
      install:
        enabled: true
        server:
          dataStorage:
            size: 1Gi
        agent:
          image:
            tag: 1.8.5
```

### Install Command

```bash
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set authentication.mutual.spire.enabled=true \
  --set authentication.mutual.spire.install.enabled=true
```

### mTLS Required Policy

Require mTLS for specific traffic.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: mtls-required
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    authentication:
      mode: required  # Require mTLS
    toPorts:
    - ports:
      - port: "8080"
```

### Service Identity Verification

Verify SPIFFE identity in policy.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: identity-based-mtls
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.cilium.k8s.policy.serviceaccount: frontend-sa
    authentication:
      mode: required
    toPorts:
    - ports:
      - port: "8080"
```

### Check mTLS Status

```bash
# List authenticated connections
cilium bpf auth list

# Check specific endpoint
cilium endpoint list
cilium endpoint get <endpoint-id>
```

## Cluster-Wide Policies

Apply policies across all namespaces.

### CiliumClusterwideNetworkPolicy

Global default-deny policy.

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: default-deny-all
spec:
  description: "Deny all traffic by default"
  endpointSelector: {}
  ingress:
  - {}
  egress:
  # Allow DNS for all pods
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
```

### Allow Cluster Communication

Allow essential cluster traffic.

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-cluster-essentials
spec:
  endpointSelector: {}
  egress:
  # Allow to Kubernetes API server
  - toEntities:
    - kube-apiserver
  # Allow DNS
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
```

## Observability with Hubble

Hubble provides eBPF-based observability.

### Enable Hubble

```bash
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
```

### Observe Traffic

```bash
# Watch all traffic in namespace
hubble observe --namespace production

# Filter by specific pod
hubble observe --pod backend-7d8f9c5b4-xyz12

# Filter by verdict (dropped, forwarded)
hubble observe --verdict DROPPED

# Filter by protocol
hubble observe --protocol http

# Show specific ports
hubble observe --port 8080

# Show DNS queries
hubble observe --type l7 --protocol dns
```

### Hubble Metrics

Export metrics to Prometheus.

```yaml
# Enable Hubble metrics
hubble:
  metrics:
    enabled:
    - dns
    - drop
    - tcp
    - flow
    - icmp
    - http
```

**Useful Metrics:**

```promql
# HTTP request rate
rate(hubble_http_requests_total[5m])

# DNS query rate
rate(hubble_dns_queries_total[5m])

# Dropped packets
rate(hubble_drop_total[5m])

# TCP flags
rate(hubble_tcp_flags_total[5m])
```

### Hubble UI

Access visual service map.

```bash
# Port forward Hubble UI
kubectl port-forward -n kube-system svc/hubble-ui 12000:80

# Open browser
open http://localhost:12000
```

### Flow Filtering

Advanced flow filtering.

```bash
# Show flows between specific services
hubble observe --from-pod frontend --to-pod backend

# Show HTTP 500 errors
hubble observe --type l7 --http-status 500

# Show specific HTTP methods
hubble observe --type l7 --http-method POST

# Show specific paths
hubble observe --type l7 --http-path "/api/users"

# Export to JSON
hubble observe -o json --last 100 > flows.json
```

## CiliumEnvoyConfig (Advanced L7)

Use Envoy for advanced L7 routing.

### Basic Envoy Configuration

Enable Envoy for specific service.

```yaml
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: backend-envoy
  namespace: production
spec:
  services:
  - name: backend
    namespace: production
  resources:
  - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
    name: backend-listener
    filterChains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typedConfig:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          statPrefix: backend
          routeConfig:
            name: backend_route
            virtualHosts:
            - name: backend
              domains: ["*"]
              routes:
              - match:
                  prefix: "/v1"
                route:
                  cluster: backend-v1
              - match:
                  prefix: "/v2"
                route:
                  cluster: backend-v2
```

## Multi-Cluster with Cilium

Cluster Mesh enables multi-cluster connectivity.

### Enable Cluster Mesh

```bash
# Install Cilium on cluster 1
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set cluster.name=cluster1 \
  --set cluster.id=1

# Install Cilium on cluster 2
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set cluster.name=cluster2 \
  --set cluster.id=2

# Enable cluster mesh on both clusters
cilium clustermesh enable --context cluster1
cilium clustermesh enable --context cluster2

# Connect clusters
cilium clustermesh connect --context cluster1 --destination-context cluster2
```

### Cross-Cluster Policy

Allow traffic between clusters.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cross-cluster-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
        k8s:io.cilium.k8s.policy.cluster: cluster1
    - matchLabels:
        app: frontend
        k8s:io.cilium.k8s.policy.cluster: cluster2
```

## Best Practices

**Policy Design:**
- Start with cluster-wide default-deny
- Use explicit allow rules for required traffic
- Prefer identity-based policies over IP-based
- Test policies in audit mode before enforcing
- Use labels consistently across applications

**L7 Policies:**
- Apply L7 rules only where necessary (performance impact)
- Combine with L3/L4 rules for defense in depth
- Use HTTP method and path restrictions
- Monitor L7 policy performance with Hubble

**DNS Policies:**
- Always allow DNS to kube-dns
- Use specific FQDNs over wildcards when possible
- Monitor DNS queries for anomalies
- Consider DNS TTL impact on policy updates
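As a sketch of the first two points, a policy like the following pins DNS to kube-dns and limits egress to a single FQDN (the `app: backend` selector and `api.example.com` are placeholders):

```yaml
# Illustrative only: allow DNS lookups via kube-dns, then egress to one FQDN
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: dns-and-fqdn-egress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  # Always allow DNS to kube-dns (DNS visibility enables toFQDNs resolution)
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  # Specific FQDN rather than a wildcard
  - toFQDNs:
    - matchName: "api.example.com"
```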

**mTLS:**
- Enable SPIRE for production workloads
- Require mTLS for sensitive services
- Monitor authentication failures
- Rotate SPIRE credentials regularly

**Observability:**
- Enable Hubble for all clusters
- Export metrics to Prometheus
- Use Hubble UI for visual debugging
- Set up alerts for dropped packets and policy violations
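For the last point, a minimal Prometheus rule might look like the following (the alert name and thresholds are illustrative; the metric comes from the Hubble metrics shown above):

```yaml
# Sketch: alert on sustained packet drops reported by Hubble
groups:
- name: cilium
  rules:
  - alert: HubblePacketDrops
    expr: sum(rate(hubble_drop_total[5m])) by (reason) > 0
    for: 10m
    annotations:
      summary: "Hubble reports dropped packets (reason: {{ $labels.reason }})"
```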

**Multi-Cluster:**
- Use consistent cluster IDs
- Test failover scenarios
- Monitor cross-cluster latency
- Implement circuit breakers for remote calls

```

### references/security-patterns.md

```markdown
# Service Mesh Security Patterns

## Table of Contents

- [Zero-Trust Architecture](#zero-trust-architecture)
- [Mutual TLS Configuration](#mutual-tls-configuration)
- [Authorization Policies](#authorization-policies)
- [JWT Authentication](#jwt-authentication)
- [External Authorization](#external-authorization)
- [Certificate Management](#certificate-management)

## Zero-Trust Architecture

Implement the security principle: never trust, always verify.

### Core Principles

1. **Default Deny:** Block all traffic unless explicitly allowed
2. **Identity-Based:** Use workload identities, not IP addresses
3. **Least Privilege:** Grant minimum required permissions
4. **Micro-Segmentation:** Isolate services at network level
5. **Continuous Verification:** Authenticate every request

### Implementation Steps (Istio)

**Step 1: Enable Strict mTLS**

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default-strict-mtls
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

**Step 2: Default Deny All Traffic**

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec: {}
```

**Step 3: Explicit Allow Rules**

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
```

**Step 4: Namespace Isolation**

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-cross-namespace
  namespace: production
spec:
  action: DENY
  rules:
  - from:
    - source:
        notNamespaces: ["production"]
```

### Zero-Trust Checklist

- [ ] Strict mTLS enabled mesh-wide
- [ ] Default-deny authorization policies
- [ ] Explicit allow rules for all required communication
- [ ] Service accounts properly configured
- [ ] Namespace isolation enforced
- [ ] Audit logging enabled
- [ ] Regular certificate rotation
- [ ] Policy validation in CI/CD
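The last checklist item can be automated in the pipeline; a hedged sketch of a CI step (GitHub Actions syntax, directory path illustrative):

```yaml
# Sketch: fail the pipeline on mesh misconfiguration before policies merge
- name: Validate mesh configuration
  run: |
    istioctl analyze --all-namespaces --failure-threshold Warning
    kubectl apply --dry-run=server -f authorization-policies/
```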

## Mutual TLS Configuration

Automatic encryption of service-to-service traffic.

### Istio mTLS Modes

**STRICT Mode (Production):**

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT
```

Rejects all plaintext connections. Use in production after migration.

**PERMISSIVE Mode (Migration):**

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: permissive-mtls
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE
```

Accepts both mTLS and plaintext. Use during migration.

**DISABLE Mode (Legacy):**

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: disable-mtls
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-service
  mtls:
    mode: DISABLE
```

Disables mTLS for specific workloads. Use sparingly.

### Per-Port mTLS

Configure mTLS for specific ports.

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: per-port-mtls
spec:
  selector:
    matchLabels:
      app: backend
  mtls:
    mode: STRICT
  portLevelMtls:
    8080:
      mode: PERMISSIVE
    9090:
      mode: DISABLE
```

### Verify mTLS Status

**Istio:**

```bash
# Check mTLS status for a workload (istioctl authn tls-check was removed in Istio 1.5)
istioctl x describe pod <frontend-pod> -n production

# Verify proxy certificates
istioctl proxy-config secret deployment/frontend -n production
```

**Linkerd:**

```bash
# Check mTLS edges
linkerd edges deployment/frontend -n production

# Verify identity
linkerd identity list -n production
```

**Cilium:**

```bash
# List authenticated connections
cilium bpf auth list

# Check SPIRE status
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry show
```

### Migration Strategy

**Phase 1: Install Mesh with PERMISSIVE**

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: PERMISSIVE
```

**Phase 2: Monitor mTLS Adoption**

```bash
# Check PeerAuthentication coverage
kubectl get peerauthentication -A

# Inspect mTLS mode for a workload (authn tls-check is removed in modern Istio)
istioctl x describe pod <pod> -n production
```

**Phase 3: Switch to STRICT**

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

**Phase 4: Validate**

```bash
# Ensure all traffic is encrypted
istioctl analyze
kubectl logs -n production deployment/frontend -c istio-proxy
```

## Authorization Policies

Control access to services based on identity and attributes.

### Service-to-Service Authorization

Allow specific service to call another.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: frontend-to-backend
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend
```

### HTTP Method and Path Restrictions

Allow only specific operations.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: read-only-api
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend
    to:
    - operation:
        methods: ["GET", "HEAD"]
        paths: ["/api/users/*", "/api/products/*"]
```

### IP-Based Restrictions (Use Sparingly)

Allow traffic from specific IPs.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ip-allowlist
spec:
  selector:
    matchLabels:
      app: admin-api
  action: ALLOW
  rules:
  - from:
    - source:
        ipBlocks:
        - 10.0.0.0/8
        - 192.168.1.100/32
```

**Note:** Prefer identity-based over IP-based for cloud environments.

### Time-Based Access Control

Restrict access during specific hours. Note that `request.time` is not among Istio's supported condition attributes, so the policy below illustrates intent only; in practice, delegate time-based decisions to an external authorizer.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: business-hours-only
spec:
  selector:
    matchLabels:
      app: reporting-service
  action: ALLOW
  rules:
  - when:
    - key: request.time
      values: ["09:00-17:00"]
```
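Because Istio evaluates no built-in time attribute, this check is usually pushed into an external authorizer such as OPA; a minimal sketch (business hours against the server clock, UTC):

```rego
# Sketch: allow only during business hours (09:00-17:00, server clock)
package istio.authz

default allow = false

allow {
    [hour, _, _] := time.clock(time.now_ns())
    hour >= 9
    hour < 17
}
```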

### Deny Policies

Explicitly deny dangerous operations.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-delete
  namespace: production
spec:
  selector:
    matchLabels:
      app: database
  action: DENY
  rules:
  - to:
    - operation:
        methods: ["DELETE"]
```

### Namespace Isolation

Prevent cross-namespace communication.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: namespace-isolation
  namespace: production
spec:
  action: DENY
  rules:
  - from:
    - source:
        notNamespaces: ["production", "istio-system"]
```

## JWT Authentication

Validate JSON Web Tokens for API authentication.

### RequestAuthentication

Define JWT validation rules.

```yaml
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  jwtRules:
  - issuer: "https://auth.example.com"
    jwksUri: "https://auth.example.com/.well-known/jwks.json"
    audiences:
    - "api.example.com"
    - "mobile.example.com"
    forwardOriginalToken: true
```

### Require Valid JWT

Enforce JWT presence with AuthorizationPolicy.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]
```

### Claims-Based Authorization

Authorize based on JWT claims.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: admin-only
  namespace: production
spec:
  selector:
    matchLabels:
      app: admin-api
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]
    when:
    - key: request.auth.claims[role]
      values: ["admin", "superuser"]
    - key: request.auth.claims[verified]
      values: ["true"]
```

### Per-Path JWT Requirements

Different paths, different auth requirements.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: public-private-paths
spec:
  selector:
    matchLabels:
      app: api
  action: ALLOW
  rules:
  # Public endpoints: no auth required
  - to:
    - operation:
        paths: ["/health", "/metrics", "/public/*"]
  # Private endpoints: JWT required
  - from:
    - source:
        requestPrincipals: ["*"]
    to:
    - operation:
        paths: ["/api/*"]
```

### Multiple JWT Issuers

Support tokens from multiple identity providers.

```yaml
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: multi-issuer-jwt
spec:
  selector:
    matchLabels:
      app: api-gateway
  jwtRules:
  # Google OAuth
  - issuer: "https://accounts.google.com"
    jwksUri: "https://www.googleapis.com/oauth2/v3/certs"
    audiences: ["api.example.com"]
  # Auth0
  - issuer: "https://example.auth0.com/"
    jwksUri: "https://example.auth0.com/.well-known/jwks.json"
    audiences: ["https://api.example.com"]
  # Custom IDP
  - issuer: "https://auth.example.com"
    jwksUri: "https://auth.example.com/jwks"
    audiences: ["api.example.com"]
```

## External Authorization

Integrate with external authorization services.

### OPA (Open Policy Agent) Integration

**Step 1: Deploy OPA**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opa
  namespace: opa-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: opa
  template:
    metadata:
      labels:
        app: opa
    spec:
      containers:
      - name: opa
        # The -envoy image variant bundles the Envoy external authorization
        # gRPC plugin; plain OPA only serves the REST API
        image: openpolicyagent/opa:latest-envoy
        args:
        - "run"
        - "--server"
        - "--addr=0.0.0.0:8181"
        - "--set=plugins.envoy_ext_authz_grpc.addr=:9191"
        - "--set=plugins.envoy_ext_authz_grpc.path=istio/authz/allow"
        - "/policies"
        volumeMounts:
        - name: policies
          mountPath: /policies
      volumes:
      - name: policies
        configMap:
          name: opa-policies
```

**Step 2: Configure Istio Extension Provider**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
data:
  mesh: |
    extensionProviders:
    - name: opa
      envoyExtAuthzGrpc:
        service: opa.opa-system.svc.cluster.local
        port: 9191
```

**Step 3: Apply Authorization Policy**

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ext-authz-opa
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: CUSTOM
  provider:
    name: opa
  rules:
  - to:
    - operation:
        paths: ["/api/*"]
```

**Step 4: Define OPA Policy**

```rego
package istio.authz

import input.attributes.request.http as http_request

default allow = false

# Allow GET requests to /api/users for authenticated users
allow {
    http_request.method == "GET"
    startswith(http_request.path, "/api/users")
    input.attributes.source.principal != ""
}

# Allow admins to access /api/admin
allow {
    startswith(http_request.path, "/api/admin")
    contains(input.attributes.source.principal, "admin")
}
```

### Custom External Authorizer

Implement custom authorization logic.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: custom-authz
spec:
  selector:
    matchLabels:
      app: payment-api
  action: CUSTOM
  provider:
    name: custom-authz-service
  rules:
  - to:
    - operation:
        paths: ["/payment/*"]
```

## Certificate Management

Manage TLS certificates for service mesh.

### Automatic Certificate Rotation (Istio)

Configure certificate TTL and rotation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
data:
  mesh: |
    trustDomain: cluster.local
    certificates:
    - secretName: istio-ca-secret
      dnsNames:
      - istio-ca
    defaultConfig:
      proxyMetadata:
        # SECRET_TTL sets the workload certificate lifetime (istio-agent env var)
        SECRET_TTL: 24h
```

### External CA Integration (cert-manager)

Use cert-manager for certificate issuance.

**Step 1: Create CA Issuer**

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: istio-ca
spec:
  ca:
    secretName: istio-ca-secret
```

**Step 2: Configure Istio to Use External CA**

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    caCertificates:
    - certSigners:
      - clusterissuers.cert-manager.io/istio-ca
```

### Certificate Monitoring

Monitor certificate expiration.

```bash
# Check certificate details (Istio)
istioctl proxy-config secret deployment/frontend -n production -o json | \
  jq -r '.dynamicActiveSecrets[] | select(.name=="default") | .secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -text -noout

# Alert on expiration
# Prometheus query:
certmanager_certificate_expiration_timestamp_seconds - time() < 86400
```

### Custom Root CA

Use enterprise PKI root CA.

```bash
# Generate root CA (non-interactive; subject is illustrative)
openssl req -x509 -sha256 -nodes -days 3650 \
  -newkey rsa:4096 -subj "/CN=Example Root CA" \
  -keyout root-ca.key -out root-ca.crt

# Istio loads a plugged-in CA from the `cacerts` secret; with no intermediate,
# the root doubles as the CA cert and chain
kubectl create secret generic cacerts \
  -n istio-system \
  --from-file=ca-cert.pem=root-ca.crt \
  --from-file=ca-key.pem=root-ca.key \
  --from-file=root-cert.pem=root-ca.crt \
  --from-file=cert-chain.pem=root-ca.crt
```

## Security Best Practices

**mTLS:**
- Use STRICT mode in production
- Monitor mTLS adoption metrics
- Rotate certificates automatically
- Use short certificate lifetimes (24h)

**Authorization:**
- Start with default-deny policies
- Use identity-based controls (not IP-based)
- Apply least privilege principle
- Audit policy changes

**JWT:**
- Validate issuer and audience
- Use short token lifetimes
- Rotate signing keys regularly
- Validate claims in authorization policies

**External Authorization:**
- Use for complex business logic
- Monitor authorizer latency
- Implement fallback policies
- Cache authorization decisions when appropriate

**Certificate Management:**
- Automate rotation
- Monitor expiration
- Use separate CAs per environment
- Implement certificate revocation
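Rotation and per-environment CAs can both be delegated to cert-manager; a hedged sketch (the `enterprise-root-ca` issuer is assumed to exist and be backed by your PKI):

```yaml
# Sketch: a cert-manager-managed intermediate CA for one environment
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: istio-ca
  namespace: istio-system
spec:
  isCA: true
  commonName: istio-ca.production
  secretName: istio-ca-secret
  duration: 2160h    # 90 days
  renewBefore: 360h  # renew 15 days before expiry
  privateKey:
    algorithm: RSA
    size: 4096
  issuerRef:
    name: enterprise-root-ca   # assumed ClusterIssuer backed by your PKI
    kind: ClusterIssuer
```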

```

### references/progressive-delivery.md

```markdown
# Progressive Delivery Patterns

## Table of Contents

- [Canary Deployments](#canary-deployments)
- [Blue/Green Deployments](#bluegreen-deployments)
- [A/B Testing](#ab-testing)
- [Automated Rollback](#automated-rollback)
- [Flagger Integration](#flagger-integration)

## Canary Deployments

Gradually shift traffic to the new version while monitoring metrics.

### Manual Canary (Istio)

**Stage 1: Deploy v2 with 0% Traffic**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-v2
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
      version: v2
  template:
    metadata:
      labels:
        app: backend
        version: v2
    spec:
      containers:
      - name: backend
        image: backend:v2
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend
spec:
  host: backend
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-canary
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
        subset: v1
      weight: 100
    - destination:
        host: backend
        subset: v2
      weight: 0
```

**Stage 2: Route 10% to v2**

```bash
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-canary
  namespace: production
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
        subset: v1
      weight: 90
    - destination:
        host: backend
        subset: v2
      weight: 10
EOF
```

**Stage 3: Progressive Increase**

```bash
# Monitor metrics, then increase
# 10% → 25% → 50% → 75% → 100%

# 25%
kubectl patch vs backend-canary -n production --type merge -p '
{
  "spec": {
    "http": [{
      "route": [
        {"destination": {"host": "backend", "subset": "v1"}, "weight": 75},
        {"destination": {"host": "backend", "subset": "v2"}, "weight": 25}
      ]
    }]
  }
}'

# 50%
kubectl patch vs backend-canary -n production --type merge -p '
{
  "spec": {
    "http": [{
      "route": [
        {"destination": {"host": "backend", "subset": "v1"}, "weight": 50},
        {"destination": {"host": "backend", "subset": "v2"}, "weight": 50}
      ]
    }]
  }
}'

# 100%
kubectl patch vs backend-canary -n production --type merge -p '
{
  "spec": {
    "http": [{
      "route": [
        {"destination": {"host": "backend", "subset": "v2"}, "weight": 100}
      ]
    }]
  }
}'
```

**Stage 4: Cleanup**

```bash
# Delete v1 deployment
kubectl delete deployment backend-v1 -n production

# Update VirtualService to simple routing
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend
  namespace: production
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
EOF
```

### Manual Canary (Linkerd)

**Traffic Split Configuration:**

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: backend-canary
  namespace: production
spec:
  parentRefs:
  - name: backend
    kind: Service
  rules:
  - backendRefs:
    - name: backend-v1
      port: 8080
      weight: 90
    - name: backend-v2
      port: 8080
      weight: 10
```

**Update Weights:**

```bash
# Increase to 25%
kubectl patch httproute backend-canary -n production --type merge -p '
{
  "spec": {
    "rules": [{
      "backendRefs": [
        {"name": "backend-v1", "port": 8080, "weight": 75},
        {"name": "backend-v2", "port": 8080, "weight": 25}
      ]
    }]
  }
}'
```

### Canary with Header-Based Routing

Test canary with internal users first.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-canary-staged
spec:
  hosts:
  - backend
  http:
  # Internal testers always see canary
  - match:
    - headers:
        x-canary-user:
          exact: "true"
    route:
    - destination:
        host: backend
        subset: v2
  # Production: gradual rollout
  - route:
    - destination:
        host: backend
        subset: v1
      weight: 90
    - destination:
        host: backend
        subset: v2
      weight: 10
```

### Monitoring During Canary

**Key Metrics to Watch:**

```promql
# Error rate comparison
sum(rate(http_requests_total{code=~"5..", version="v2"}[5m]))
/ sum(rate(http_requests_total{version="v2"}[5m]))

# Latency P95 comparison
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket{version="v2"}[5m])) by (le)
)

# Request rate
sum(rate(http_requests_total{version="v2"}[5m]))
```

**Alert on Issues:**

```yaml
# Prometheus alert
- alert: CanaryHighErrorRate
  expr: |
    sum(rate(http_requests_total{code=~"5..", version="v2"}[5m]))
    / sum(rate(http_requests_total{version="v2"}[5m])) > 0.01
  for: 5m
  annotations:
    summary: "Canary v2 error rate above 1%"
```

## Blue/Green Deployments

Instant traffic cutover between versions.

### Blue/Green (Linkerd)

**Stage 1: Deploy Green Alongside Blue**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
      version: blue
  template:
    metadata:
      labels:
        app: backend
        version: blue
    spec:
      containers:
      - name: backend
        image: backend:blue
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
      version: green
  template:
    metadata:
      labels:
        app: backend
        version: green
    spec:
      containers:
      - name: backend
        image: backend:green
---
apiVersion: v1
kind: Service
metadata:
  name: backend-blue
spec:
  selector:
    app: backend
    version: blue
  ports:
  - port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: backend-green
spec:
  selector:
    app: backend
    version: green
  ports:
  - port: 8080
```

**Stage 2: Test Green with Subset**

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: backend-bluegreen-test
spec:
  parentRefs:
  - name: backend
    kind: Service
  rules:
  # Test traffic: route to green
  - matches:
    - headers:
      - name: x-version
        value: green
    backendRefs:
    - name: backend-green
      port: 8080
  # Production traffic: route to blue
  - backendRefs:
    - name: backend-blue
      port: 8080
```

**Stage 3: Instant Cutover to Green**

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: backend-cutover
spec:
  parentRefs:
  - name: backend
    kind: Service
  rules:
  - backendRefs:
    - name: backend-green
      port: 8080
```

**Stage 4: Rollback if Needed**

```bash
# Instant rollback to blue
kubectl apply -f - <<EOF
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: backend-rollback
  namespace: production
spec:
  parentRefs:
  - name: backend
    kind: Service
  rules:
  - backendRefs:
    - name: backend-blue
      port: 8080
EOF
```

### Blue/Green (Istio)

**Cutover Configuration:**

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-bluegreen
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend-green.production.svc.cluster.local
```

**Rollback:**

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-rollback
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend-blue.production.svc.cluster.local
```

## A/B Testing

Route traffic based on user segments.

### Cookie-Based A/B Test (Istio)

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: frontend-ab-test
spec:
  hosts:
  - frontend
  http:
  # Variant A: existing experience
  - match:
    - headers:
        cookie:
          regex: "^(.*?;)?(ab-test=a)(;.*)?$"
    route:
    - destination:
        host: frontend
        subset: variant-a
  # Variant B: new experience
  - match:
    - headers:
        cookie:
          regex: "^(.*?;)?(ab-test=b)(;.*)?$"
    route:
    - destination:
        host: frontend
        subset: variant-b
  # No cookie: 50/50 split
  - route:
    - destination:
        host: frontend
        subset: variant-a
      weight: 50
      headers:
        response:
          set:
            Set-Cookie: "ab-test=a; Max-Age=86400"
    - destination:
        host: frontend
        subset: variant-b
      weight: 50
      headers:
        response:
          set:
            Set-Cookie: "ab-test=b; Max-Age=86400"
```

### User-Agent Based Routing

Route mobile users differently.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: mobile-routing
spec:
  hosts:
  - api
  http:
  - match:
    - headers:
        user-agent:
          regex: ".*(Mobile|Android|iPhone).*"
    route:
    - destination:
        host: api
        subset: mobile-optimized
  - route:
    - destination:
        host: api
        subset: desktop
```

### Geographic Routing

Route based on user location (requires geo headers).

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: geo-routing
spec:
  hosts:
  - api
  http:
  - match:
    - headers:
        x-user-region:
          exact: "us-east"
    route:
    - destination:
        host: api-us-east.production.svc.cluster.local
  - match:
    - headers:
        x-user-region:
          exact: "eu-west"
    route:
    - destination:
        host: api-eu-west.production.svc.cluster.local
```

## Automated Rollback

Automatically revert on metric failures.

### Prometheus-Based Alerts

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: canary-alerts
data:
  alerts.yaml: |
    groups:
    - name: canary
      interval: 30s
      rules:
      # High error rate
      - alert: CanaryHighErrors
        expr: |
          sum(rate(http_requests_total{code=~"5..", version="v2"}[5m]))
          / sum(rate(http_requests_total{version="v2"}[5m])) > 0.01
        for: 2m
        annotations:
          summary: "Canary error rate > 1%"

      # High latency
      - alert: CanaryHighLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket{version="v2"}[5m])) by (le)
          ) > 0.5
        for: 2m
        annotations:
          summary: "Canary P95 latency > 500ms"
```

### Rollback Script

```bash
#!/bin/bash
# rollback-canary.sh

NAMESPACE="production"
SERVICE="backend"

echo "Rolling back canary deployment..."

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: ${SERVICE}-canary
  namespace: ${NAMESPACE}
spec:
  hosts:
  - ${SERVICE}
  http:
  - route:
    - destination:
        host: ${SERVICE}
        subset: v1
      weight: 100
EOF

echo "Rollback complete. All traffic to v1."
```

## Flagger Integration

Automated progressive delivery with Flagger.

### Install Flagger

```bash
# Add Flagger Helm repository
helm repo add flagger https://flagger.app

# Install Flagger for Istio
helm install flagger flagger/flagger \
  --namespace istio-system \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus:9090

# Install Flagger for Linkerd (Prometheus ships with the linkerd-viz extension)
helm install flagger flagger/flagger \
  --namespace linkerd \
  --set meshProvider=linkerd \
  --set metricsServer=http://prometheus.linkerd-viz:9090
```

### Canary with Flagger (Istio)

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: backend
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    # Success rate must be > 99%
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    # P95 latency must be < 500ms
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
  webhooks:
  # Pre-rollout checks
  - name: pre-rollout
    type: pre-rollout
    url: http://flagger-loadtester/
    timeout: 15s
    metadata:
      type: bash
      cmd: "curl -sd 'test' http://backend-canary:8080/healthz"
  # Load testing during rollout
  - name: load-test
    url: http://flagger-loadtester/
    timeout: 5s
    metadata:
      cmd: "hey -z 1m -q 10 -c 2 http://backend-canary.production:8080/"
```

### Canary with Flagger (Linkerd)

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: backend
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
```

### A/B Testing with Flagger

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: frontend-ab
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  service:
    port: 80
  analysis:
    interval: 1m
    iterations: 10
    match:
    - headers:
        x-user-type:
          exact: "beta"
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
```

### Blue/Green with Flagger

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: backend-bluegreen
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 10
    iterations: 2
    maxWeight: 100
    stepWeight: 100
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
```

### Monitor Flagger

```bash
# Watch canary progress
kubectl -n production get canaries --watch

# Describe canary status
kubectl -n production describe canary backend

# View Flagger events
kubectl -n production get events --sort-by='.lastTimestamp'
```

## Best Practices

**Canary Deployments:**
- Start with small traffic percentages (5-10%)
- Monitor key metrics: error rate, latency, throughput
- Increase traffic gradually (10% → 25% → 50% → 100%)
- Wait for stabilization between stages
- Set clear rollback criteria
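One way to make the rollback criterion concrete is to compare the canary directly against the baseline (metric names and labels follow the monitoring examples above):

```promql
# Sketch: trigger rollback when v2's error rate exceeds twice v1's
(
  sum(rate(http_requests_total{code=~"5..", version="v2"}[5m]))
  / sum(rate(http_requests_total{version="v2"}[5m]))
)
> 2 * (
  sum(rate(http_requests_total{code=~"5..", version="v1"}[5m]))
  / sum(rate(http_requests_total{version="v1"}[5m]))
)
```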

**Blue/Green:**
- Test green environment thoroughly before cutover
- Use header-based routing for pre-production validation
- Keep blue environment running for quick rollback
- Monitor metrics after cutover
- Automate cutover and rollback procedures

**A/B Testing:**
- Use consistent user assignment (cookies, headers)
- Define success metrics before test
- Ensure statistical significance
- Isolate test variables
- Document test results

**Automated Rollback:**
- Define clear success criteria
- Use multiple metrics (error rate, latency, throughput)
- Set appropriate thresholds
- Implement alerts for failures
- Test rollback procedures regularly
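To close the loop between the alert and the rollback script, Alertmanager can call a webhook; a sketch, where the receiver URL is hypothetical — something in-cluster would need to receive it and execute the rollback:

```yaml
# Sketch: route the canary alert to a webhook that performs the rollback
route:
  receiver: default
  routes:
  - match:
      alertname: CanaryHighErrors
    receiver: canary-rollback
receivers:
- name: default
- name: canary-rollback
  webhook_configs:
  - url: http://rollback-operator.production.svc:8080/rollback
```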

**Flagger:**
- Use load testing webhooks for realistic traffic
- Set appropriate thresholds based on SLOs
- Monitor Flagger events for debugging
- Integrate with alerting systems
- Test analysis configuration in staging first

```

### references/multi-cluster.md

```markdown
# Multi-Cluster Service Mesh

## Table of Contents

- [Istio Multi-Cluster](#istio-multi-cluster)
- [Linkerd Multi-Cluster](#linkerd-multi-cluster)
- [Cilium Cluster Mesh](#cilium-cluster-mesh)
- [Traffic Patterns](#traffic-patterns)
- [Failover and HA](#failover-and-ha)

## Istio Multi-Cluster

Connect multiple Kubernetes clusters in a single mesh.

### Architecture Models

**Primary-Remote (Single Control Plane):**
- One cluster hosts Istiod
- Remote clusters use primary's control plane
- Best for: Small deployments, cost optimization

**Multi-Primary (Multiple Control Planes):**
- Each cluster has its own Istiod
- Meshes communicate peer-to-peer
- Best for: High availability, isolation

### Single Network Multi-Primary

Clusters share the same network (pod IPs are routable between clusters).

**Install on Cluster 1:**

```bash
# Set context
export CTX_CLUSTER1=cluster1
export CTX_CLUSTER2=cluster2

# Configure mesh ID and network
cat <<EOF > cluster1.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
EOF

# Install
istioctl install --context="${CTX_CLUSTER1}" -f cluster1.yaml
```

**Install on Cluster 2:**

```bash
cat <<EOF > cluster2.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster2
      network: network1
EOF

istioctl install --context="${CTX_CLUSTER2}" -f cluster2.yaml
```

**Enable Cross-Cluster Service Discovery:**

```bash
# Create remote secret for cluster2 on cluster1
istioctl x create-remote-secret \
  --context="${CTX_CLUSTER2}" \
  --name=cluster2 | \
  kubectl apply -f - --context="${CTX_CLUSTER1}"

# Create remote secret for cluster1 on cluster2
istioctl x create-remote-secret \
  --context="${CTX_CLUSTER1}" \
  --name=cluster1 | \
  kubectl apply -f - --context="${CTX_CLUSTER2}"
```

### Multi-Network Multi-Primary

Clusters are on different networks (cross-cluster traffic requires east-west gateways).

**Install with East-West Gateway:**

```bash
# Cluster 1
cat <<EOF > cluster1-multinetwork.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
EOF

istioctl install --context="${CTX_CLUSTER1}" -f cluster1-multinetwork.yaml

# Install east-west gateway
samples/multicluster/gen-eastwest-gateway.sh \
  --mesh mesh1 --cluster cluster1 --network network1 | \
  istioctl --context="${CTX_CLUSTER1}" install -y -f -

# Expose services via east-west gateway
kubectl --context="${CTX_CLUSTER1}" apply -n istio-system -f \
  samples/multicluster/expose-services.yaml
```

**Cluster 2 (Same Process):**

```bash
# Install Istio
cat <<EOF > cluster2-multinetwork.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster2
      network: network2
EOF

istioctl install --context="${CTX_CLUSTER2}" -f cluster2-multinetwork.yaml

# Install east-west gateway
samples/multicluster/gen-eastwest-gateway.sh \
  --mesh mesh1 --cluster cluster2 --network network2 | \
  istioctl --context="${CTX_CLUSTER2}" install -y -f -

# Expose services
kubectl --context="${CTX_CLUSTER2}" apply -n istio-system -f \
  samples/multicluster/expose-services.yaml
```

**Exchange Secrets:**

```bash
# Cluster2 secret on cluster1
istioctl x create-remote-secret \
  --context="${CTX_CLUSTER2}" \
  --name=cluster2 | \
  kubectl apply -f - --context="${CTX_CLUSTER1}"

# Cluster1 secret on cluster2
istioctl x create-remote-secret \
  --context="${CTX_CLUSTER1}" \
  --name=cluster1 | \
  kubectl apply -f - --context="${CTX_CLUSTER2}"
```

### Primary-Remote Setup

Remote cluster uses primary's control plane.

**Primary Cluster:**

```bash
cat <<EOF > cluster1-primary.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
EOF

istioctl install --context="${CTX_CLUSTER1}" -f cluster1-primary.yaml
```

**Remote Cluster:**

```bash
# Generate remote configuration
istioctl x create-remote-secret \
  --context="${CTX_CLUSTER2}" \
  --name=cluster2 | \
  kubectl apply -f - --context="${CTX_CLUSTER1}"

# Install remote components
cat <<EOF > cluster2-remote.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: remote
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster2
      network: network2
      remotePilotAddress: <CLUSTER1_INGRESS_GATEWAY_IP>
EOF

istioctl install --context="${CTX_CLUSTER2}" -f cluster2-remote.yaml
```

### Verify Multi-Cluster

```bash
# Check remote secrets
kubectl get secrets -n istio-system | grep istio-remote-secret

# Verify endpoints
istioctl proxy-config endpoints <POD_NAME> -n production | grep cluster

# Test cross-cluster connectivity
kubectl exec -n production <POD> -- curl http://service.namespace.svc.cluster.local
```

## Linkerd Multi-Cluster

Link multiple Linkerd clusters.

### Setup Linkerd Multi-Cluster

**Cluster 1:**

```bash
# Install Linkerd (all clusters must share a trust anchor for cross-cluster mTLS)
linkerd install --cluster-domain cluster1.local | \
  kubectl --context=cluster1 apply -f -

# Install multicluster components
linkerd multicluster install --cluster-domain cluster1.local | \
  kubectl --context=cluster1 apply -f -

# Check installation
linkerd --context=cluster1 check
linkerd --context=cluster1 multicluster check
```

**Cluster 2:**

```bash
# Install Linkerd
linkerd install --cluster-domain cluster2.local | \
  kubectl --context=cluster2 apply -f -

# Install multicluster components
linkerd multicluster install --cluster-domain cluster2.local | \
  kubectl --context=cluster2 apply -f -

# Check
linkerd --context=cluster2 check
linkerd --context=cluster2 multicluster check
```

### Link Clusters

**From Cluster 1 to Cluster 2:**

```bash
# Generate link
linkerd --context=cluster2 multicluster link --cluster-name cluster2 | \
  kubectl --context=cluster1 apply -f -

# Verify link
linkerd --context=cluster1 multicluster check

# View linked clusters
linkerd --context=cluster1 multicluster gateways
```

**From Cluster 2 to Cluster 1:**

```bash
linkerd --context=cluster1 multicluster link --cluster-name cluster1 | \
  kubectl --context=cluster2 apply -f -
```

### Export Services

**Export Service from Cluster 2:**

```bash
# Label service for export
kubectl --context=cluster2 label svc/backend \
  -n production \
  mirror.linkerd.io/exported=true

# Service automatically appears in cluster1 as:
# backend-cluster2.production.svc.cluster1.local
```

**Verify Mirrored Service:**

```bash
# Check mirrored service in cluster1
kubectl --context=cluster1 get svc -n production | grep cluster2

# Test connectivity
kubectl --context=cluster1 exec -n production <POD> -- \
  curl http://backend-cluster2.production:8080
```

### Unlink Clusters

```bash
# Remove link
linkerd --context=cluster1 multicluster unlink --cluster-name cluster2 | \
  kubectl --context=cluster1 delete -f -
```

## Cilium Cluster Mesh

Connect Cilium clusters at the network layer.

### Enable Cluster Mesh

**Cluster 1:**

```bash
# Install Cilium with unique cluster ID
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set cluster.name=cluster1 \
  --set cluster.id=1 \
  --set ipam.mode=kubernetes

# Enable cluster mesh
cilium clustermesh enable --context cluster1
```

**Cluster 2:**

```bash
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set cluster.name=cluster2 \
  --set cluster.id=2 \
  --set ipam.mode=kubernetes

cilium clustermesh enable --context cluster2
```

### Connect Clusters

```bash
# Connect cluster1 to cluster2
cilium clustermesh connect \
  --context cluster1 \
  --destination-context cluster2

# Verify connection
cilium clustermesh status --context cluster1
```

### Global Services

**Create Global Service:**

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: production
  annotations:
    io.cilium/global-service: "true"
spec:
  type: ClusterIP
  ports:
  - port: 8080
  selector:
    app: backend
```

**Verify Global Service:**

```bash
# Check service endpoints across clusters
cilium service list | grep backend
```

### Cross-Cluster Policy

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cross-cluster-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
        io.cilium.k8s.policy.cluster: cluster1
    - matchLabels:
        app: frontend
        io.cilium.k8s.policy.cluster: cluster2
```

## Traffic Patterns

### Locality-Based Routing (Istio)

Prefer the local cluster; fail over to remote endpoints.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-locality
spec:
  host: backend.production.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
        - from: us-east/us-east-1/*
          to:
            "us-east/us-east-1/*": 80
            "us-west/us-west-1/*": 20
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

### Cross-Cluster Load Balancing (Linkerd)

**80/20 split between clusters:**

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: backend-multi-cluster
  namespace: production
spec:
  parentRefs:
  - name: backend
  rules:
  - backendRefs:
    - name: backend          # Local cluster
      port: 8080
      weight: 80
    - name: backend-cluster2 # Remote cluster
      port: 8080
      weight: 20
```

### Active-Active Deployment

Deploy to both clusters with equal traffic.

```yaml
# Istio
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-active-active
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend.production.svc.cluster.local
      weight: 50
    - destination:
        host: backend.production.svc.cluster2.global
      weight: 50
```

## Failover and HA

### Automatic Failover (Istio)

Outlier detection for automatic failover.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-failover
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 20
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
        - from: us-east
          to: us-west
```

### Health-Based Routing

Route based on endpoint health.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  # replaces the deprecated tolerate-unready-endpoints annotation
  publishNotReadyAddresses: false
  selector:
    app: backend
  ports:
  - port: 8080
---
apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  containers:
  - name: backend
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
```

### Circuit Breaking for Remote Clusters

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: remote-circuit-breaker
spec:
  host: backend.production.svc.cluster2.global
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 50
      http:
        http1MaxPendingRequests: 5
        http2MaxRequests: 50
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
```

### Disaster Recovery

**Cross-Region Failover:**

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: dr-failover
spec:
  host: backend
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
        - from: us-east
          to: eu-west
        - from: eu-west
          to: us-east
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
```

## Best Practices

**Architecture:**
- Use multi-primary for production (HA)
- Use primary-remote for cost optimization
- Ensure network connectivity between clusters
- Use unique cluster IDs and names

**Service Discovery:**
- Test cross-cluster DNS resolution
- Monitor remote secret sync
- Use explicit service FQDNs when needed
- Document service naming conventions

**Security:**
- Enable mTLS across clusters
- Use same root CA for trust
- Rotate remote secrets regularly
- Monitor cross-cluster auth failures
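For Istio, a shared root of trust is configured by creating a `cacerts` secret in each cluster before installation, holding a per-cluster intermediate signed by the same root (file contents below are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cacerts
  namespace: istio-system
type: Opaque
stringData:
  ca-cert.pem: |      # this cluster's intermediate CA certificate
    <intermediate cert>
  ca-key.pem: |       # this cluster's intermediate CA private key
    <intermediate key>
  root-cert.pem: |    # root certificate shared by all clusters
    <root cert>
  cert-chain.pem: |   # intermediate followed by root
    <cert chain>
```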

**Performance:**
- Prefer local endpoints (locality-based routing)
- Set connection limits for remote calls
- Use circuit breakers for failover
- Monitor cross-cluster latency

**Resilience:**
- Configure outlier detection
- Set appropriate failover priorities
- Test failover scenarios regularly
- Monitor endpoint health

**Operations:**
- Automate cluster linking
- Monitor multi-cluster metrics
- Set up cross-cluster alerts
- Document runbooks for failures

```

### references/troubleshooting.md

```markdown
# Service Mesh Troubleshooting Guide

## Table of Contents

- [Common Issues and Solutions](#common-issues-and-solutions)
- [Debug Commands](#debug-commands)
- [Performance Tuning](#performance-tuning)
- [Monitoring and Alerts](#monitoring-and-alerts)
- [Recovery Procedures](#recovery-procedures)
- [Best Practices](#best-practices)

## Common Issues and Solutions

### mTLS Not Working

**Symptoms:**
- Services cannot communicate
- Connection refused errors
- "503 Service Unavailable" responses

**Diagnosis (Istio):**

```bash
# Check mTLS status for a workload (istioctl authn tls-check was removed in Istio 1.5)
istioctl x describe pod <POD_NAME> -n production

# Check peer authentication policies
kubectl get peerauthentication -A

# Verify certificates
istioctl proxy-config secret deployment/frontend -n production

# Check proxy logs
kubectl logs -n production deployment/frontend -c istio-proxy
```

**Diagnosis (Linkerd):**

```bash
# Check mTLS edges (requires the viz extension)
linkerd viz edges deployment -n production

# Verify proxy identity and certificate validity
linkerd check --proxy

# Inspect the certificate issued to a pod
linkerd identity -n production <POD_NAME>
```

**Solutions:**

```yaml
# Set PERMISSIVE mode temporarily
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: temp-permissive
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE
```

```bash
# Restart pods to refresh certificates
kubectl rollout restart deployment/frontend -n production
```

### Traffic Not Routing Correctly

**Symptoms:**
- Traffic always goes to one version
- VirtualService rules not applied
- 404 errors on valid paths

**Diagnosis (Istio):**

```bash
# Analyze configuration
istioctl analyze -n production

# Check VirtualService
kubectl get virtualservice -n production
kubectl describe virtualservice backend-routing -n production

# Verify DestinationRule subsets
kubectl get destinationrule -n production
kubectl describe destinationrule backend -n production

# Check endpoints
istioctl proxy-config endpoints deployment/frontend -n production | grep backend
```

**Common Mistakes:**

```yaml
# WRONG: Missing subset definition
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
        subset: v2  # Subset not defined in DestinationRule
```

**Solution:**

```yaml
# CORRECT: Define subset in DestinationRule
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend
spec:
  host: backend
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
        subset: v2
```

### Authorization Policies Blocking Traffic

**Symptoms:**
- "RBAC: access denied" errors
- 403 Forbidden responses
- Previously working services now fail

**Diagnosis:**

```bash
# Check authorization policies
kubectl get authorizationpolicy -n production

# Describe specific policy
kubectl describe authorizationpolicy allow-frontend -n production

# Check proxy logs for denials
kubectl logs -n production deployment/backend -c istio-proxy | grep RBAC

# Test with policy temporarily disabled
kubectl delete authorizationpolicy deny-all -n production
```

**Debug with Audit Mode:**

```yaml
# Set to AUDIT instead of ENFORCE
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: test-policy
  namespace: production
spec:
  action: AUDIT  # Logs denials but allows traffic
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend
```

**Common Issues:**

```yaml
# WRONG: Typo in principal
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
spec:
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/fronted  # Typo: "fronted" not "frontend"
```
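The corrected policy, with the principal spelled to match the actual service account:

```yaml
# CORRECT: principal matches the frontend service account
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: production
spec:
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend
```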

### Gateway Not Accessible

**Symptoms:**
- Cannot access ingress gateway from outside
- LoadBalancer IP not assigned
- TLS certificate errors

**Diagnosis:**

```bash
# Check gateway configuration
kubectl get gateway -n production
kubectl describe gateway https-gateway -n production

# Check ingress gateway service
kubectl get svc -n istio-system istio-ingressgateway

# Check gateway pods
kubectl get pods -n istio-system -l app=istio-ingressgateway

# View logs
kubectl logs -n istio-system -l app=istio-ingressgateway
```

**Check TLS Configuration:**

```bash
# Verify secret exists
kubectl get secret -n istio-system api-cert

# Check certificate details
kubectl get secret -n istio-system api-cert -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -text -noout
```

**Common Issues:**

```yaml
# WRONG: Gateway and VirtualService in different namespaces without proper reference
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: https-gateway
  namespace: istio-system
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-routing
  namespace: production
spec:
  gateways:
  - https-gateway  # WRONG: Should be "istio-system/https-gateway"
```

**Solution:**

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-routing
  namespace: production
spec:
  gateways:
  - istio-system/https-gateway  # CORRECT: namespace/gateway
```

### High Latency or Timeouts

**Symptoms:**
- Requests timing out
- High P95/P99 latencies
- Intermittent failures

**Diagnosis:**

```bash
# Check proxy stats
istioctl proxy-config clusters deployment/frontend -n production

# View connection pool stats
istioctl proxy-config endpoints deployment/frontend -n production

# Check for circuit breaker tripping
kubectl logs -n production deployment/frontend -c istio-proxy | grep "overflow"
```

**Adjust Timeouts:**

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
    timeout: 30s  # Increase from default 15s
```

**Adjust Circuit Breaker:**

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 200  # Increase from 100
      http:
        http1MaxPendingRequests: 50  # Increase from 10
```

### Certificate Expiration Issues

**Symptoms:**
- "certificate has expired" errors
- mTLS failures after some time
- Unable to establish secure connections

**Diagnosis:**

```bash
# Check certificate expiry (Istio)
istioctl proxy-config secret deployment/frontend -n production -o json | \
  jq '.dynamicActiveSecrets[] | select(.name=="default") | .secret.tlsCertificate.certificateChain.inlineBytes' | \
  base64 -d | openssl x509 -text -noout | grep "Not After"

# Check Linkerd certificate expiry
linkerd check --proxy

# View certificate details
kubectl get secret istio-ca-secret -n istio-system -o yaml
```

**Solutions:**

```bash
# Restart pods to get new certificates
kubectl rollout restart deployment/frontend -n production

# Force certificate rotation (Istio)
kubectl delete secret istio-ca-secret -n istio-system
kubectl rollout restart deployment -n istio-system istiod
```

### Sidecar Injection Not Working

**Symptoms:**
- Pods don't have istio-proxy container
- Mesh features not working
- Service not showing in mesh

**Diagnosis:**

```bash
# Check namespace label
kubectl get namespace production --show-labels

# Check injection webhook
kubectl get mutatingwebhookconfigurations | grep istio

# Check pod for sidecar
kubectl get pod -n production <POD_NAME> -o jsonpath='{.spec.containers[*].name}'

# View injection status
kubectl get pod -n production <POD_NAME> -o jsonpath='{.metadata.annotations.sidecar\.istio\.io/status}'
```

**Solutions:**

```bash
# Label namespace for injection
kubectl label namespace production istio-injection=enabled

# Restart deployments
kubectl rollout restart deployment -n production

# For individual pod, add annotation
kubectl patch deployment frontend -n production -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/inject":"true"}}}}}'
```

## Debug Commands

### Istio

**Configuration Analysis:**

```bash
# Analyze all namespaces
istioctl analyze -A

# Analyze specific namespace
istioctl analyze -n production

# Check installation
istioctl verify-install

# View mesh configuration
kubectl get configmap istio -n istio-system -o yaml
```

**Proxy Configuration:**

```bash
# View all proxy config
istioctl proxy-config all deployment/frontend -n production

# View listeners
istioctl proxy-config listeners deployment/frontend -n production

# View routes
istioctl proxy-config routes deployment/frontend -n production

# View clusters
istioctl proxy-config clusters deployment/frontend -n production

# View endpoints
istioctl proxy-config endpoints deployment/frontend -n production

# View secrets (certificates)
istioctl proxy-config secrets deployment/frontend -n production
```

**Logs:**

```bash
# View proxy logs
kubectl logs -n production deployment/frontend -c istio-proxy

# Follow logs
kubectl logs -n production deployment/frontend -c istio-proxy -f

# View control plane logs
kubectl logs -n istio-system deployment/istiod
```

**Traffic Debugging:**

```bash
# Enable debug logging
istioctl proxy-config log deployment/frontend -n production --level debug

# Disable debug logging
istioctl proxy-config log deployment/frontend -n production --level info
```

### Linkerd

**Health Checks:**

```bash
# Check overall health
linkerd check

# Check data plane
linkerd check --proxy

# Check multi-cluster
linkerd multicluster check
```

**Traffic Inspection:**

```bash
# Tap live traffic (requires the viz extension)
linkerd viz tap deployment/frontend -n production

# Tap specific route
linkerd viz tap deployment/frontend -n production --path /api/users

# Tap with filtering
linkerd viz tap deployment/frontend -n production \
  --method GET \
  --authority backend.production.svc.cluster.local
```

**Statistics:**

```bash
# Service stats
linkerd viz stat deployment/frontend -n production

# Route stats
linkerd viz routes deployment/frontend -n production

# Edge stats (mTLS)
linkerd viz edges deployment -n production
```

**Dashboard:**

```bash
# Launch the dashboard (requires the viz extension)
linkerd viz dashboard
```

### Cilium

**Status Checks:**

```bash
# Overall status
cilium status

# Connectivity test
cilium connectivity test

# Check specific endpoints (endpoint commands run inside the Cilium agent pod,
# e.g. kubectl -n kube-system exec <CILIUM_POD> -- cilium endpoint list)
cilium endpoint list
cilium endpoint get <ENDPOINT_ID>
```

**Policy Debugging:**

```bash
# View policies
cilium policy get

# Validate policy
cilium policy validate <POLICY_FILE>

# Trace policy decisions (inside the agent pod)
cilium monitor --type policy-verdict
```

**Hubble Observability:**

```bash
# Observe all traffic
hubble observe

# Observe specific namespace
hubble observe --namespace production

# Observe dropped packets
hubble observe --verdict DROPPED

# Observe HTTP traffic
hubble observe --protocol http

# Observe specific pod
hubble observe --pod backend-7d8f9c5b4-xyz12
```

**Network Troubleshooting:**

```bash
# View BPF maps (run inside the Cilium agent pod)
cilium bpf endpoint list
cilium bpf ct list global
cilium bpf nat list

# Check service endpoints
cilium service list

# Monitor events
cilium monitor
```

## Performance Tuning

### Reduce Latency Overhead

**Use Ambient Mode (Istio):**

```bash
# Install ambient profile
istioctl install --set profile=ambient -y

# Add namespace to ambient
kubectl label namespace production istio.io/dataplane-mode=ambient
```

**Optimize Envoy:**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
data:
  mesh: |
    defaultConfig:
      concurrency: 2
      drainDuration: 5s
      parentShutdownDuration: 10s
```

**Skip Injection for Performance-Critical Workloads:**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-sidecar-injector
  namespace: istio-system
data:
  values: |
    sidecarInjectorWebhook:
      neverInjectSelector:
      - matchLabels:
          app: high-perf-service
```

### Reduce Resource Usage

**Limit Proxy CPU/Memory:**

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      proxy:
        resources:
          requests:
            cpu: 10m
            memory: 40Mi
          limits:
            cpu: 100m
            memory: 128Mi
```

**Use Sidecar Resource:**

```yaml
apiVersion: networking.istio.io/v1
kind: Sidecar
metadata:
  name: frontend-sidecar
  namespace: production
spec:
  workloadSelector:
    labels:
      app: frontend
  egress:
  - hosts:
    - "production/*"
    - "istio-system/*"
```

## Monitoring and Alerts

### Key Metrics to Monitor

**Control Plane:**
```promql
# Istiod memory usage
container_memory_working_set_bytes{pod=~"istiod.*"}

# Istiod CPU usage
rate(container_cpu_usage_seconds_total{pod=~"istiod.*"}[5m])

# Configuration push time
pilot_proxy_convergence_time_bucket
```

**Data Plane:**
```promql
# Envoy memory
envoy_server_memory_allocated

# Connection pool overflow
envoy_cluster_upstream_rq_pending_overflow

# Circuit breaker tripped
envoy_cluster_circuit_breakers_default_cx_open
```

**mTLS:**
```promql
# mTLS success rate
sum(rate(istio_requests_total{connection_security_policy="mutual_tls"}[5m]))
/ sum(rate(istio_requests_total[5m]))
```

### Alert Examples

```yaml
groups:
- name: implementing-service-mesh
  rules:
  - alert: IstiodDown
    expr: up{job="istiod"} == 0
    for: 1m
    annotations:
      summary: "Istio control plane down"

  - alert: HighProxyMemory
    expr: container_memory_working_set_bytes{container="istio-proxy"} > 500000000
    for: 5m
    annotations:
      summary: "Proxy using >500MB memory"

  - alert: mTLSDisabled
    expr: |
      sum(rate(istio_requests_total{connection_security_policy!="mutual_tls"}[5m]))
      / sum(rate(istio_requests_total[5m])) > 0.01
    for: 5m
    annotations:
      summary: "More than 1% traffic without mTLS"
```

## Recovery Procedures

### Rollback Istio Upgrade

```bash
# Reinstall the previous revision (requires the matching istioctl binary)
istioctl install --set revision=1-19-0

# Update workloads to use old revision
kubectl label namespace production istio.io/rev=1-19-0 --overwrite

# Restart workloads
kubectl rollout restart deployment -n production
```

### Emergency Mesh Disable

```bash
# Remove sidecar injection
kubectl label namespace production istio-injection-

# Restart pods
kubectl rollout restart deployment -n production
```

### Certificate Recovery

```bash
# Force istiod to regenerate its self-signed CA
kubectl delete secret istio-ca-secret -n istio-system

# Restart control plane
kubectl rollout restart deployment -n istio-system istiod

# Restart data plane
kubectl rollout restart deployment -n production
```

## Best Practices

**Debugging:**
- Enable debug logs temporarily only
- Use analyze before applying configs
- Check control plane health first
- Verify certificates regularly
- Monitor proxy resource usage

**Performance:**
- Use ambient mode for lower overhead
- Tune connection pools appropriately
- Set reasonable timeouts
- Monitor latency metrics
- Load test before production
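Timeouts pair naturally with bounded retries; a sketch of an Istio route combining both (values are illustrative):

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-retries
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
    timeout: 10s            # overall budget including retries
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
```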

**Operations:**
- Automate health checks
- Set up proper monitoring
- Document runbooks
- Test recovery procedures
- Keep mesh components updated

```
