managing-dns
Manage DNS records, TTL strategies, and DNS-as-code automation for infrastructure. Use when configuring domain resolution, automating DNS from Kubernetes with external-dns, setting up DNS-based load balancing, or troubleshooting propagation issues across cloud providers (Route53, Cloud DNS, Azure DNS, Cloudflare).
Install command
npx @skill-hub/cli install ancoleman-ai-design-components-managing-dns
Repository
Skill path: skills/managing-dns
Best for
Primary workflow: Run DevOps.
Technical facets: Full Stack, DevOps.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: ancoleman.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install managing-dns into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/ancoleman/ai-design-components before adding managing-dns to shared team environments
- Use managing-dns for development workflows
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: managing-dns
description: Manage DNS records, TTL strategies, and DNS-as-code automation for infrastructure. Use when configuring domain resolution, automating DNS from Kubernetes with external-dns, setting up DNS-based load balancing, or troubleshooting propagation issues across cloud providers (Route53, Cloud DNS, Azure DNS, Cloudflare).
---
# DNS Management
Configure and automate DNS records with proper TTL strategies, DNS-as-code patterns, and troubleshooting techniques.
## Purpose
Guide DNS configuration for applications, infrastructure, and services with focus on:
- Record type selection (A, AAAA, CNAME, MX, TXT, SRV, CAA)
- TTL strategies for propagation and caching
- DNS-as-code automation (external-dns, OctoDNS, DNSControl)
- Cloud DNS services comparison and selection
- DNS-based load balancing patterns
- Troubleshooting tools and techniques
## When to Use This Skill
Apply DNS management patterns when:
- Setting up DNS for new applications or services
- Automating DNS updates from Kubernetes workloads
- Configuring DNS-based failover or load balancing
- Troubleshooting DNS propagation or resolution issues
- Migrating DNS between providers
- Planning DNS changes with minimal downtime
- Implementing GeoDNS for global users
## Record Type Selection
### Quick Reference
**Address Resolution:**
- **A Record**: Map hostname to IPv4 address (example.com → 192.0.2.1)
- **AAAA Record**: Map hostname to IPv6 address (example.com → 2001:db8::1)
- **CNAME Record**: Alias to another domain (www.example.com → example.com)
  - Cannot use at zone apex (@)
  - Cannot coexist with other records at same name
**Email Configuration:**
- **MX Record**: Direct email to mail servers with priority
- **TXT Record**: Email authentication (SPF, DKIM, DMARC) and verification
**Service Discovery:**
- **SRV Record**: Specify service location (protocol, priority, weight, port, target)
**Delegation and Security:**
- **NS Record**: Delegate subdomain to different nameservers
- **CAA Record**: Restrict which Certificate Authorities can issue certificates
**Cloud-Specific:**
- **ALIAS Record**: CNAME-like behavior that works at the zone apex (Route53 ALIAS, Cloudflare CNAME flattening, ANAME at some providers)
### Decision Tree
```
Need to point domain to:
├─ IPv4 Address? → A record
├─ IPv6 Address? → AAAA record
├─ Another Domain?
│ ├─ Zone apex (@) → ALIAS/ANAME or A record
│ └─ Subdomain → CNAME
├─ Mail Server? → MX record (with priority)
├─ Email Authentication? → TXT record (SPF/DKIM/DMARC)
├─ Service Discovery? → SRV record
├─ Domain Verification? → TXT record
├─ Certificate Control? → CAA record
└─ Subdomain Delegation? → NS record
```
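The address branches of this tree translate directly into code. Below is a hypothetical helper (`choose_record_type` is not part of any tool) built on the standard-library `ipaddress` module. It assumes the caller knows whether the name is the zone apex, and it returns `ALIAS` as shorthand for whatever apex-alias feature the provider offers.

```python
import ipaddress


def choose_record_type(target: str, at_apex: bool) -> str:
    """Pick a record type for pointing a name at `target` (hypothetical helper).

    IP literals map to A/AAAA. Hostnames map to CNAME, except at the zone
    apex, where CNAME is not allowed and an ALIAS/ANAME-style record is used.
    """
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal, so treat it as another domain name.
        return "ALIAS" if at_apex else "CNAME"
    return "A" if ip.version == 4 else "AAAA"


print(choose_record_type("192.0.2.1", at_apex=True))       # A
print(choose_record_type("2001:db8::1", at_apex=False))    # AAAA
print(choose_record_type("example.com", at_apex=False))    # CNAME
print(choose_record_type("lb.example.net", at_apex=True))  # ALIAS
```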
For detailed record type examples and patterns, see `references/record-types.md`.
## TTL Strategy
### Standard TTL Values
**By Change Frequency:**
- **Stable records**: 3600-86400s (1-24 hours) - NS, stable A/AAAA
- **Normal operation**: 3600s (1 hour) - Standard websites, MX
- **Moderate changes**: 300-1800s (5-30 min) - Development, A/B testing
- **Failover scenarios**: 60-300s (1-5 min) - Critical records needing fast updates
**Key Principle:** Lower TTL = faster propagation but higher DNS query load
### Pre-Change Process
When planning DNS changes:
```
T-48h: Lower TTL to 300s
T-24h: Verify TTL propagated globally
T-0h: Make DNS change
T+1h: Verify new records propagating
T+6h: Confirm global propagation
T+24h: Raise TTL back to normal (3600s)
```
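To turn this timeline into calendar times for a change window, a small hypothetical helper can apply the offsets above to the planned change time. The step names are copied from the timeline. The function is only an illustration and does not talk to any DNS provider.

```python
from datetime import datetime, timedelta

# Offsets in hours relative to the planned change, taken from the timeline.
STEPS = [
    (-48, "Lower TTL to 300s"),
    (-24, "Verify TTL propagated globally"),
    (0, "Make DNS change"),
    (1, "Verify new records propagating"),
    (6, "Confirm global propagation"),
    (24, "Raise TTL back to normal (3600s)"),
]


def change_schedule(change_at: datetime) -> list[tuple[datetime, str]]:
    """Return (timestamp, action) pairs for a change planned at `change_at`."""
    return [(change_at + timedelta(hours=h), action) for h, action in STEPS]


if __name__ == "__main__":
    for when, action in change_schedule(datetime(2024, 12, 10, 14, 0)):
        print(f"{when:%Y-%m-%d %H:%M}  {action}")
```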
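**Propagation Formula:** `Max Time ≈ Old TTL + authoritative update delay`
Example: After changing a record that had a 3600s TTL, a resolver that cached the old answer just before the change can keep serving it for up to about 1 hour. The new TTL only controls how quickly the *next* change propagates.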
### TTL by Use Case
| Use Case | TTL | Rationale |
|----------|-----|-----------|
| Production (stable) | 3600s | Balance speed and load |
| Before planned change | 300s | Fast propagation |
| Development/staging | 300-600s | Frequent changes |
| DNS-based failover | 60-300s | Fast recovery |
| Mail servers | 3600s | Rarely change |
| NS records | 86400s | Very stable |
For detailed TTL scenarios and calculations, see `references/ttl-strategies.md`.
## DNS-as-Code Tools
### Tool Selection by Use Case
**Kubernetes DNS Automation → external-dns**
- Annotation-based configuration on Services/Ingresses
- Automatic sync to DNS providers (20+ supported)
- No manual DNS updates required
- See `examples/external-dns/`
**Multi-Provider DNS Management → OctoDNS or DNSControl**
- Version control for DNS records
- Sync configuration across multiple providers
- Preview changes before applying
- OctoDNS (Python/YAML) - See `examples/octodns/`
- DNSControl (JavaScript) - See `examples/dnscontrol/`
**Infrastructure-as-Code → Terraform**
- Manage DNS alongside cloud resources
- Provider-specific resources (aws_route53_record, etc.)
- See `examples/terraform/`
### Tool Comparison
| Tool | Language | Best For | Kubernetes | Multi-Provider |
|------|----------|----------|------------|----------------|
| external-dns | Go | K8s automation | ★★★★★ | ★★★★ |
| OctoDNS | Python/YAML | Version control | ★★★ | ★★★★★ |
| DNSControl | JavaScript | Complex logic | ★★ | ★★★★★ |
| Terraform | HCL | IaC integration | ★★★ | ★★★★ |
### Quick Start: external-dns
```yaml
# Kubernetes Service with DNS annotation
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  ports:
    - port: 80
```
Deploy the external-dns controller once per cluster. It then watches annotated Services and Ingresses and creates the matching DNS records automatically.
For complete examples, see `examples/external-dns/` and `references/dns-as-code-comparison.md`.
## Cloud DNS Provider Selection
### Provider Characteristics
**AWS Route53**
- Best for AWS-heavy infrastructure
- Advanced routing policies (weighted, latency, geolocation, failover)
- Health checks with automatic failover
- ALIAS records for AWS resources (ELB, CloudFront, S3)
- Pricing: $0.50/month per zone + $0.40 per million queries
**Google Cloud DNS**
- Best for GCP-native applications
- Strong DNSSEC support with automatic key rotation
- Private zones for VPC internal DNS
- Split-horizon DNS (different internal/external records)
- Pricing: $0.20/month per zone + $0.40 per million queries
**Azure DNS**
- Best for Azure-native applications
- Integration with Azure Traffic Manager
- Azure Private DNS zones
- Azure RBAC for access control
- Pricing: $0.50/month per zone + $0.40 per million queries
**Cloudflare**
- Best for multi-cloud or cloud-agnostic
- Consistently among the fastest authoritative DNS response times in public benchmarks (e.g., DNSPerf)
- Built-in DDoS protection
- Free tier with unlimited queries
- CDN integration
- Pricing: Free tier, $20/month Pro, $200/month Business
### Selection Decision Tree
```
Choose based on:
├─ AWS-heavy? → Route53
├─ GCP-native? → Cloud DNS
├─ Azure-native? → Azure DNS
├─ Multi-cloud? → Cloudflare or OctoDNS/DNSControl
├─ Need fastest global DNS? → Cloudflare
├─ Need DDoS protection? → Cloudflare
└─ Budget-conscious? → Cloudflare (free tier) or Cloud DNS (lowest zone cost)
```
For detailed provider comparisons and examples, see `references/cloud-providers.md`.
## DNS-Based Load Balancing
### GeoDNS (Geographic Routing)
Return different IP addresses based on client location to:
- Reduce latency (route to nearest data center)
- Comply with data residency requirements
- Distribute load across regions
**Example Pattern:**
```
Client Location → DNS Response
├─ North America → 192.0.2.1 (US data center)
├─ Europe → 192.0.2.10 (EU data center)
└─ Default → CloudFront edge (global CDN)
```
### Weighted Routing
Distribute traffic by percentage for:
- Blue-green deployments
- Canary releases (10% to new version)
- A/B testing
**Example Pattern:**
```
DNS Responses:
├─ 90% → 192.0.2.1 (stable version)
└─ 10% → 192.0.2.2 (canary version)
```
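The following sketch shows what a weighted policy does on the authoritative side, using the 90/10 weights above as hypothetical input and Python's `random.choices`. Each resolver caches whichever answer it received for the TTL, so the split clients actually see is per resolver cache rather than per request and is only approximate at low query volumes.

```python
import random
from collections import Counter

# Hypothetical weighted record set mirroring the 90/10 pattern above.
RECORDS = {"192.0.2.1": 90, "192.0.2.2": 10}


def answer(rng: random.Random) -> str:
    """Pick one IP in proportion to its weight, as a weighted policy would."""
    ips = list(RECORDS)
    return rng.choices(ips, weights=[RECORDS[ip] for ip in ips], k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
counts = Counter(answer(rng) for _ in range(10_000))
print(counts)  # roughly 9,000 stable vs 1,000 canary answers
```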
### Health Check-Based Failover
Automatically route traffic away from unhealthy endpoints.
**Pattern:**
```
Primary: 192.0.2.1 (health checked every 30s)
├─ Healthy → Return primary IP
└─ Unhealthy → Return secondary IP (192.0.2.2)
Failover time: ~2-3 minutes
= Health check failures (90s) + TTL expiration (60s)
```
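The failover estimate above is detection time plus one TTL. Here is a minimal sketch with hypothetical helpers, plugging in the values from Pattern 3 under Common Patterns (3 failures at a 30s interval, 60s TTL). It ignores health-checker aggregation delays and resolvers that hold answers longer than the TTL.

```python
def failover_seconds(failure_threshold: int, interval_s: int, ttl_s: int) -> int:
    """Worst-case time until clients stop receiving the failed primary's IP.

    Detection needs `failure_threshold` consecutive failed checks spaced
    `interval_s` apart; resolvers may then serve the cached primary answer
    for up to one TTL.
    """
    return failure_threshold * interval_s + ttl_s


def resolve(primary_healthy: bool, primary: str = "192.0.2.1",
            secondary: str = "192.0.2.2") -> str:
    """Answer returned by the authoritative server under a failover policy."""
    return primary if primary_healthy else secondary


print(failover_seconds(3, 30, 60))  # 150 (90s detection + 60s TTL)
print(resolve(False))               # 192.0.2.2
```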
For complete load balancing examples, see `examples/load-balancing/`.
## Troubleshooting
### Essential Commands
**Check DNS Resolution:**
```bash
# Basic query
dig example.com
# Clean output (just IP)
dig example.com +short
# Query specific DNS server
dig @8.8.8.8 example.com
dig @1.1.1.1 example.com
# Trace resolution path
dig +trace example.com
```
**Check TTL:**
```bash
dig example.com | grep -A1 "ANSWER SECTION"
# Look for TTL value (number before IN A)
```
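For scripted checks, `dig +noall +answer example.com` prints only the answer lines. The hypothetical helper below splits one such line into fields. A recursive resolver reports the *remaining* cache time, which counts down between queries; query an authoritative nameserver directly (`dig @<nameserver> example.com`) to see the configured TTL.

```python
def parse_answer_line(line: str) -> dict:
    """Split one resource-record line from dig's answer section into fields.

    Expected layout: <name> <ttl> <class> <type> <rdata...>
    """
    name, ttl, rr_class, rr_type, *rdata = line.split()
    return {
        "name": name,
        "ttl": int(ttl),  # seconds left in this resolver's cache
        "class": rr_class,
        "type": rr_type,
        "data": " ".join(rdata),
    }


sample = "example.com.\t\t300\tIN\tA\t192.0.2.1"
print(parse_answer_line(sample)["ttl"])  # 300
```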
**Check Propagation:**
```bash
# Multiple resolvers
dig @8.8.8.8 example.com +short # Google
dig @1.1.1.1 example.com +short # Cloudflare
dig @208.67.222.222 example.com +short # OpenDNS
```
**Flush Local DNS Cache:**
```bash
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Windows
ipconfig /flushdns
# Linux (systemd-resolved)
sudo resolvectl flush-caches  # older releases: sudo systemd-resolve --flush-caches
```
### Common Problems
**Slow Propagation:**
- Check current TTL (old TTL must expire first)
- Lower TTL 24-48 hours before changes
- Use propagation checkers: whatsmydns.net, dnschecker.org
**CNAME at Zone Apex:**
- Error: Cannot use CNAME at @ (zone apex)
- Solution: Use ALIAS record (Route53, Cloudflare) or A record
**external-dns Not Creating Records:**
- Verify annotation spelling: `external-dns.alpha.kubernetes.io/hostname`
- Check domain filter matches: `--domain-filter=example.com`
- Review external-dns logs for errors
- Confirm provider credentials configured
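Here is a hypothetical pre-flight check for the first two items, assuming the Service manifest has already been loaded into a Python dict (for example with a YAML parser). It treats `--domain-filter` as a plain suffix match, which simplifies external-dns's actual filtering, and it cannot replace reading the controller logs.

```python
HOSTNAME_KEY = "external-dns.alpha.kubernetes.io/hostname"


def check_service(manifest: dict, domain_filter: str) -> list[str]:
    """Return problems that would keep external-dns from creating records.

    Hypothetical helper: checks the annotation key and a suffix-style domain
    filter only; it does not look at credentials or controller logs.
    """
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    value = annotations.get(HOSTNAME_KEY)
    if not value:
        # Surface near-miss keys such as a misspelled annotation.
        near = [key for key in annotations if "external-dns" in key]
        hint = f" (found {near})" if near else ""
        return [f"missing {HOSTNAME_KEY}{hint}"]
    suffix = domain_filter.rstrip(".")
    problems = []
    # The annotation may list several comma-separated hostnames.
    for host in (h.strip().rstrip(".") for h in value.split(",")):
        if host != suffix and not host.endswith("." + suffix):
            problems.append(f"{host} is outside --domain-filter={domain_filter}")
    return problems


svc = {"metadata": {"annotations": {
    "external-dns.alpha.kubernetes.io/hostnme": "api.example.com"}}}
print(check_service(svc, "example.com"))
```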
For detailed troubleshooting, see `references/troubleshooting.md`.
## Common Patterns
### Pattern 1: Kubernetes DNS Automation
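```bash
# Deploy external-dns (once per cluster)
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
helm install external-dns external-dns/external-dns \
  --set provider=aws \
  --set 'domainFilters[0]=example.com' \
  --set policy=sync

# Then annotate Services (applied here via a heredoc)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  selector:
    app: api   # assumed pod label
  ports:
    - port: 80
EOF
```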
### Pattern 2: Multi-Provider Sync with OctoDNS
```yaml
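# octodns-config.yaml
providers:
  config:
    class: octodns.provider.yaml.YamlProvider
    directory: ./config
  route53:
    class: octodns_route53.Route53Provider
    # Credentials are read from environment variables
    access_key_id: env/AWS_ACCESS_KEY_ID
    secret_access_key: env/AWS_SECRET_ACCESS_KEY
  cloudflare:
    class: octodns_cloudflare.CloudflareProvider
    token: env/CLOUDFLARE_TOKEN
zones:
  example.com.:
    sources: [config]
    targets: [route53, cloudflare]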
```
### Pattern 3: DNS-Based Failover
```hcl
# Route53 with health checks
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "primary"

  failover_routing_policy {
    type = "PRIMARY"
  }

  health_check_id = aws_route53_health_check.primary.id
  records         = ["192.0.2.1"]
}

resource "aws_route53_record" "secondary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }

  records = ["192.0.2.2"]
}
```
## Integration with Other Skills
**infrastructure-as-code:**
- Manage DNS via Terraform/Pulumi alongside other resources
- Zone configuration in IaC repositories
**kubernetes-operations:**
- external-dns automates DNS for Kubernetes workloads
- Ingress controller integration for automatic DNS
**load-balancing-patterns:**
- DNS-based load balancing (GeoDNS, weighted routing)
- Health checks and failover configurations
**security-hardening:**
- DNSSEC for DNS integrity
- CAA records for certificate authority control
- DNS-based DDoS mitigation
**secret-management:**
- Store DNS provider API credentials in vaults
- Secure DDNS update mechanisms
## Additional Resources
**Reference Documentation:**
- `references/record-types.md` - Detailed record type guide with examples
- `references/ttl-strategies.md` - TTL scenarios and propagation calculations
- `references/cloud-providers.md` - Provider comparison and detailed features
- `references/troubleshooting.md` - Common problems and solutions
- `references/dns-as-code-comparison.md` - Tool comparison matrix
**Examples:**
- `examples/external-dns/` - Kubernetes DNS automation
- `examples/octodns/` - Multi-provider sync with YAML
- `examples/dnscontrol/` - Multi-provider with JavaScript DSL
- `examples/terraform/` - Cloud provider configurations
- `examples/load-balancing/` - GeoDNS and failover patterns
**Scripts:**
- `scripts/check-dns-propagation.sh` - Verify propagation across resolvers
- `scripts/validate-dns-config.py` - Validate DNS configuration
- `scripts/export-dns-records.sh` - Export existing DNS records
- `scripts/calculate-ttl-propagation.py` - Calculate propagation time
## Quick Reference
### Record Types Cheat Sheet
| Record | Purpose | Example |
|--------|---------|---------|
| A | IPv4 address | example.com → 192.0.2.1 |
| AAAA | IPv6 address | example.com → 2001:db8::1 |
| CNAME | Alias to domain | www → example.com |
| MX | Mail server | 10 mail.example.com |
| TXT | Text/verification | "v=spf1 include:_spf.google.com ~all" |
| SRV | Service location | 10 60 5060 sip.example.com |
| NS | Nameserver delegation | ns1.provider.com |
| CAA | CA authorization | 0 issue "letsencrypt.org" |
### TTL Cheat Sheet
| Scenario | TTL | Why |
|----------|-----|-----|
| Stable production | 3600s | Balance speed/load |
| Before change | 300s | Fast propagation |
| Failover | 60-300s | Fast recovery |
| NS records | 86400s | Very stable |
### Provider Cheat Sheet
| Provider | Best For | Key Feature |
|----------|----------|-------------|
| Route53 | AWS | Advanced routing, health checks |
| Cloud DNS | GCP | DNSSEC, private zones |
| Azure DNS | Azure | Traffic Manager integration |
| Cloudflare | Multi-cloud | Fastest, DDoS protection, free tier |
### Tool Cheat Sheet
| Tool | Use When |
|------|----------|
| external-dns | Kubernetes DNS automation |
| OctoDNS | Multi-provider, Python shop |
| DNSControl | Multi-provider, JavaScript preference |
| Terraform | Managing DNS with other infrastructure |
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/record-types.md
```markdown
# DNS Record Types - Detailed Reference
Complete guide to DNS record types with examples, use cases, and best practices.
## Table of Contents
1. [Address Records](#address-records)
2. [Mail Records](#mail-records)
3. [Service Discovery Records](#service-discovery-records)
4. [Delegation and Authority Records](#delegation-and-authority-records)
5. [Security Records](#security-records)
6. [Cloud-Specific Records](#cloud-specific-records)
7. [Record Type Selection Decision Tree](#record-type-selection-decision-tree)
---
## Address Records
### A Record (IPv4 Address)
**Purpose:** Map hostname to IPv4 address
**Format:**
```
hostname TTL IN A ipv4-address
```
**Examples:**
```
# Basic A record
example.com. 3600 IN A 192.0.2.1
www.example.com. 3600 IN A 192.0.2.1
# Multiple A records (round-robin)
example.com. 300 IN A 192.0.2.1
example.com. 300 IN A 192.0.2.2
example.com. 300 IN A 192.0.2.3
```
**When to Use:**
- Point domain to server IPv4 address
- Load balancing with round-robin DNS
- Zone apex (@) records
**TTL Recommendations:**
- Stable servers: 3600s (1 hour)
- Load balanced: 300s (5 minutes)
- Before changes: 300s
**Common Mistakes:**
- Using CNAME instead of A at zone apex (not allowed)
- Setting TTL too high before planned changes
- Forgetting to add both @ and www records
---
### AAAA Record (IPv6 Address)
**Purpose:** Map hostname to IPv6 address
**Format:**
```
hostname TTL IN AAAA ipv6-address
```
**Examples:**
```
example.com. 3600 IN AAAA 2001:db8::1
www.example.com. 3600 IN AAAA 2001:db8::1
```
**When to Use:**
- IPv6-enabled servers
- Dual-stack deployments (A + AAAA)
- Future-proofing infrastructure
**Best Practices:**
- Always include A record alongside AAAA (dual-stack)
- Use same TTL for A and AAAA records
- Test IPv6 connectivity before adding AAAA
---
### CNAME Record (Canonical Name)
**Purpose:** Alias one domain to another
**Format:**
```
alias TTL IN CNAME target.
```
**Examples:**
```
# Basic CNAME
www.example.com. 3600 IN CNAME example.com.
blog.example.com. 3600 IN CNAME example.com.
# CNAME to external service
shop.example.com. 3600 IN CNAME shops.myshopify.com.
docs.example.com. 3600 IN CNAME cname.vercel-dns.com.
```
**When to Use:**
- Alias subdomains to main domain
- Point to external services (CDN, hosting)
- Create friendly names for complex hostnames
**Restrictions:**
- **Cannot** use at zone apex (@)
- **Cannot** coexist with other records at same name
- In zone files, write the target as an FQDN with a trailing dot; without it, the name is treated as relative to the zone origin
**Common Mistakes:**
```
# ❌ WRONG - CNAME at zone apex
example.com. 3600 IN CNAME target.example.com.
# ✅ CORRECT - Use A or ALIAS at zone apex
example.com. 3600 IN A 192.0.2.1
# ❌ WRONG - CNAME with MX record
mail.example.com. 3600 IN CNAME example.com.
mail.example.com. 3600 IN MX 10 mail.example.com.
# ✅ CORRECT - Use A record instead
mail.example.com. 3600 IN A 192.0.2.1
example.com. 3600 IN MX 10 mail.example.com.
```
**TTL Recommendations:**
- Stable CNAMEs: 3600-86400s (1-24 hours)
- CDN/hosting: 3600s (1 hour)
---
## Mail Records
### MX Record (Mail Exchange)
**Purpose:** Direct email to mail servers
**Format:**
```
domain TTL IN MX priority mail-server.
```
**Examples:**
**Google Workspace:**
```
example.com. 3600 IN MX 1 aspmx.l.google.com.
example.com. 3600 IN MX 5 alt1.aspmx.l.google.com.
example.com. 3600 IN MX 5 alt2.aspmx.l.google.com.
example.com. 3600 IN MX 10 alt3.aspmx.l.google.com.
example.com. 3600 IN MX 10 alt4.aspmx.l.google.com.
```
**Microsoft 365:**
```
example.com. 3600 IN MX 0 example-com.mail.protection.outlook.com.
```
**Self-Hosted:**
```
example.com. 3600 IN MX 10 mail1.example.com.
example.com. 3600 IN MX 20 mail2.example.com.
```
**Priority Values:**
- Lower number = higher priority
- Sending servers try lowest priority first
- Same priority = load balancing (random selection)
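Sending servers sort MX hosts by preference and choose randomly among hosts with equal preference (RFC 5321, section 5.1). The following is a small sketch of that ordering using the Google Workspace records above as input. `mx_delivery_order` is a hypothetical helper, and a seeded `random.Random` keeps runs repeatable.

```python
import random


def mx_delivery_order(records: list[tuple[int, str]], rng: random.Random) -> list[str]:
    """Order MX hosts the way a sending server would try them.

    Lower preference values come first; hosts sharing a preference are
    shuffled so that load spreads across them.
    """
    by_pref: dict[int, list[str]] = {}
    for pref, host in records:
        by_pref.setdefault(pref, []).append(host)
    order: list[str] = []
    for pref in sorted(by_pref):
        group = by_pref[pref][:]  # copy so the input is not mutated
        rng.shuffle(group)
        order.extend(group)
    return order


google = [
    (1, "aspmx.l.google.com."),
    (5, "alt1.aspmx.l.google.com."),
    (5, "alt2.aspmx.l.google.com."),
    (10, "alt3.aspmx.l.google.com."),
    (10, "alt4.aspmx.l.google.com."),
]
print(mx_delivery_order(google, random.Random(0)))
```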
**Best Practices:**
- Always include multiple MX records for redundancy
- Use priority 10, 20, 30 (leave room for future additions)
- Ensure mail servers have A/AAAA records
- Add SPF/DKIM/DMARC TXT records for email authentication
**TTL Recommendations:**
- Standard: 3600-86400s (1-24 hours)
- Mail servers rarely change
---
### TXT Record (Text)
**Purpose:** Store arbitrary text data, commonly used for:
- SPF (Sender Policy Framework)
- DKIM (DomainKeys Identified Mail)
- DMARC (Domain-based Message Authentication)
- Domain verification
- Other metadata
**Format:**
```
hostname TTL IN TXT "text-content"
```
**SPF Examples:**
```
# Google Workspace
example.com. 3600 IN TXT "v=spf1 include:_spf.google.com ~all"
# Microsoft 365
example.com. 3600 IN TXT "v=spf1 include:spf.protection.outlook.com ~all"
# Multiple mail sources
example.com. 3600 IN TXT "v=spf1 ip4:192.0.2.0/24 include:_spf.google.com mx ~all"
# Strict SPF (reject all others)
example.com. 3600 IN TXT "v=spf1 include:_spf.google.com -all"
```
**SPF Qualifiers:**
- `+all` - Pass all (not recommended)
- `~all` - Soft fail (most common)
- `-all` - Hard fail (strict)
- `?all` - Neutral
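To audit which `all` qualifier a published SPF record uses, a short helper is enough. This hypothetical helper only tokenizes on whitespace and returns the result of the first `all` term. It does not follow `include:` or evaluate other mechanisms the way a full RFC 7208 implementation would.

```python
from typing import Optional

# SPF qualifier prefixes and the result each one produces (RFC 7208).
QUALIFIERS = {"+": "pass", "~": "softfail", "-": "fail", "?": "neutral"}


def spf_all_result(record: str) -> Optional[str]:
    """Return the result assigned by the record's `all` mechanism, or None."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    for term in terms[1:]:
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        else:
            qualifier, mechanism = "+", term  # no prefix means "+" (pass)
        if mechanism.lower() == "all":
            return QUALIFIERS[qualifier]
    # No `all` term: absent a redirect= modifier, unmatched senders get "neutral".
    return None


print(spf_all_result("v=spf1 include:_spf.google.com ~all"))  # softfail
```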
**DKIM Examples:**
```
# DKIM selector record
default._domainkey.example.com. 3600 IN TXT "v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA..."
# Google DKIM
google._domainkey.example.com. 3600 IN TXT "v=DKIM1; k=rsa; p=..."
```
**DMARC Examples:**
```
# Quarantine policy with reporting
_dmarc.example.com. 3600 IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"
# Reject policy
_dmarc.example.com. 3600 IN TXT "v=DMARC1; p=reject; rua=mailto:dmarc@example.com; ruf=mailto:dmarc@example.com"
# Monitor mode (no action)
_dmarc.example.com. 3600 IN TXT "v=DMARC1; p=none; rua=mailto:dmarc@example.com"
```
**Domain Verification Examples:**
```
# Google site verification
example.com. 3600 IN TXT "google-site-verification=abc123def456..."
# Facebook domain verification
example.com. 3600 IN TXT "facebook-domain-verification=abc123def456..."
# General verification token
example.com. 3600 IN TXT "verification-token=xyz789..."
```
**Best Practices:**
- Keep each quoted string in a TXT record to 255 characters or fewer (a DNS protocol limit)
- Split longer values (e.g., 2048-bit DKIM keys) into multiple quoted strings within the same record
- Use descriptive prefixes (_dmarc, _domainkey)
- Document purpose of each TXT record
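The 255-byte cap applies to each character-string. A TXT record may carry several strings, which SPF and DKIM verifiers concatenate without separators. The following is a minimal sketch that renders a long value as multiple quoted strings for a zone file, assuming ASCII content. `split_txt` is a hypothetical name, and many provider UIs and APIs do this splitting for you.

```python
def split_txt(value: str, limit: int = 255) -> str:
    """Render `value` as space-separated quoted strings of at most `limit` chars.

    Assumes ASCII input (one character = one byte). Chunking happens before
    escaping because the limit applies to the stored bytes, not the escapes.
    """
    chunks = [value[i:i + limit] for i in range(0, len(value), limit)]
    escaped = (c.replace("\\", "\\\\").replace('"', '\\"') for c in chunks)
    return " ".join(f'"{c}"' for c in escaped)


dkim = "v=DKIM1; k=rsa; p=" + "A" * 400  # placeholder key material
print(f"default._domainkey.example.com. 3600 IN TXT {split_txt(dkim)}")
```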
**TTL Recommendations:**
- Verification records: 3600s (remove after verification)
- SPF/DKIM/DMARC: 3600-86400s (1-24 hours)
---
## Service Discovery Records
### SRV Record (Service Locator)
**Purpose:** Specify location of services
**Format:**
```
_service._protocol.domain TTL IN SRV priority weight port target.
```
**Components:**
- **priority**: Lower = higher priority (like MX)
- **weight**: Relative share of selections among records with the same priority (0 when no weighting is needed)
- **port**: Service port number
- **target**: Hostname providing the service
**Examples:**
**SIP (VoIP):**
```
_sip._tcp.example.com. 3600 IN SRV 10 60 5060 sipserver.example.com.
_sip._udp.example.com. 3600 IN SRV 10 60 5060 sipserver.example.com.
```
**XMPP/Jabber:**
```
_xmpp-client._tcp.example.com. 3600 IN SRV 5 0 5222 xmpp.example.com.
_xmpp-server._tcp.example.com. 3600 IN SRV 5 0 5269 xmpp.example.com.
```
**LDAP:**
```
_ldap._tcp.example.com. 3600 IN SRV 0 0 389 ldap.example.com.
```
**Minecraft Server:**
```
_minecraft._tcp.example.com. 3600 IN SRV 0 5 25565 mc.example.com.
```
**When to Use:**
- Service discovery (VoIP, messaging, game servers)
- Multiple servers with priority/weight
- Port-specific service routing
**Best Practices:**
- Ensure target hostname has A/AAAA record
- Use priority for failover
- Use weight for load balancing among same priority
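Clients interpret priority and weight using the selection procedure in RFC 2782. They try the lowest priority first, and within a priority they pick randomly in proportion to weight. Below is a sketch of that procedure; the `SRV` tuple and `srv_order` helper are illustrative names, not a library API.

```python
import random
from typing import NamedTuple


class SRV(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_order(records: list[SRV], rng: random.Random) -> list[SRV]:
    """Return SRV records in the order a client should try them (RFC 2782)."""
    ordered: list[SRV] = []
    for priority in sorted({r.priority for r in records}):
        # Zero-weight records go first, so they keep a small chance of selection.
        pool = sorted((r for r in records if r.priority == priority),
                      key=lambda r: r.weight != 0)
        while pool:
            total = sum(r.weight for r in pool)
            pick = rng.randint(0, total)  # uniform, both ends inclusive
            running = 0
            for i, r in enumerate(pool):
                running += r.weight
                if running >= pick:  # first running sum >= pick wins
                    ordered.append(pool.pop(i))
                    break
    return ordered


sip = [
    SRV(10, 60, 5060, "sip1.example.com."),
    SRV(10, 20, 5060, "sip2.example.com."),
    SRV(20, 0, 5060, "sip-backup.example.com."),
]
print([r.target for r in srv_order(sip, random.Random(1))])
```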
---
## Delegation and Authority Records
### NS Record (Name Server)
**Purpose:** Delegate subdomain to different nameservers
**Format:**
```
subdomain TTL IN NS nameserver.
```
**Examples:**
**Zone delegation:**
```
# Delegate subdomain.example.com to different nameservers
subdomain.example.com. 86400 IN NS ns1.provider.com.
subdomain.example.com. 86400 IN NS ns2.provider.com.
```
**Zone apex NS records:**
```
example.com. 86400 IN NS ns1.example.com.
example.com. 86400 IN NS ns2.example.com.
```
**When to Use:**
- Delegate subdomain to different DNS provider
- Separate management of different subdomains
- Multi-team DNS management
**Best Practices:**
- Always specify multiple NS records (minimum 2)
- Use high TTL (86400-172800s / 1-2 days)
- Ensure glue records exist for in-zone nameservers
**TTL Recommendations:**
- 86400s (24 hours) - NS records rarely change
---
### SOA Record (Start of Authority)
**Purpose:** Define zone metadata
**Format:**
```
domain TTL IN SOA primary-ns admin-email (
serial ; Serial number
refresh ; Refresh interval
retry ; Retry interval
expire ; Expiration time
minimum ; Negative-caching TTL (RFC 2308)
)
```
**Example:**
```
example.com. 3600 IN SOA ns1.example.com. admin.example.com. (
2024120401 ; Serial (YYYYMMDDnn)
7200 ; Refresh (2 hours)
3600 ; Retry (1 hour)
1209600 ; Expire (2 weeks)
3600 ; Negative-caching TTL (1 hour)
)
```
**Best Practices:**
- Automatically managed by DNS providers (rarely edit manually)
- Increment serial number when making zone changes
- Use date-based serial: YYYYMMDDnn format
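It is easy to pick the wrong next YYYYMMDDnn serial by hand, and secondaries only pull a new copy of the zone when the serial increases. Below is a hypothetical helper that never lets the serial go backwards.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Next SOA serial in YYYYMMDDnn form.

    Starts at nn=00 on a new day; otherwise increments. If the current serial
    is already at or ahead of today's base (including nn=99), it simply
    increments, so the serial never goes backwards.
    """
    base = int(today.strftime("%Y%m%d")) * 100
    return base if current < base else current + 1


print(next_serial(2024120401, date(2024, 12, 4)))  # 2024120402
print(next_serial(2024120401, date(2024, 12, 5)))  # 2024120500
```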
---
## Security Records
### CAA Record (Certificate Authority Authorization)
**Purpose:** Restrict which Certificate Authorities can issue certificates
**Format:**
```
domain TTL IN CAA flags tag "value"
```
**Tags:**
- `issue`: Authorize CA for domain and subdomains
- `issuewild`: Authorize CA for wildcard certificates
- `iodef`: Email for certificate issue violations
**Examples:**
**Let's Encrypt only:**
```
example.com. 3600 IN CAA 0 issue "letsencrypt.org"
example.com. 3600 IN CAA 0 issuewild "letsencrypt.org"
```
**Multiple CAs:**
```
example.com. 3600 IN CAA 0 issue "letsencrypt.org"
example.com. 3600 IN CAA 0 issue "digicert.com"
```
**No certificates allowed:**
```
example.com. 3600 IN CAA 0 issue ";"
```
**With notification:**
```
example.com. 3600 IN CAA 0 issue "letsencrypt.org"
example.com. 3600 IN CAA 0 iodef "mailto:security@example.com"
```
**Best Practices:**
- Always add CAA records for security
- Include both `issue` and `issuewild` tags
- Add `iodef` for violation notifications
- Test with CA before enforcing strict policy
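The following sketches how a CA reads these records. It is a simplified take on RFC 8659 that evaluates a single CAA record set. It skips climbing to parent domains, the critical flag, and issuer parameters, and `ca_permitted` is a hypothetical name.

```python
def ca_permitted(caa: list[tuple[int, str, str]], ca_domain: str,
                 wildcard: bool = False) -> bool:
    """Decide whether `ca_domain` may issue for a name with this CAA RRset.

    `caa` holds (flags, tag, value) tuples. Simplified: ignores tree climbing
    to parent domains, the critical flag, and issuer parameters after ';'.
    """
    issue = [value for _, tag, value in caa if tag.lower() == "issue"]
    issuewild = [value for _, tag, value in caa if tag.lower() == "issuewild"]
    # For wildcard names, issuewild (when present) takes precedence over issue.
    values = issuewild if (wildcard and issuewild) else issue
    if not values:
        return True  # no relevant property tags: any CA may issue
    # The issuer is the domain before any ';' parameters; an empty issuer
    # (e.g. the value ";") authorizes no CA.
    issuers = {value.split(";", 1)[0].strip().lower() for value in values}
    return ca_domain.lower() in issuers


le_only = [(0, "issue", "letsencrypt.org"), (0, "issuewild", "letsencrypt.org")]
print(ca_permitted(le_only, "letsencrypt.org"))  # True
print(ca_permitted(le_only, "digicert.com"))     # False
```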
**TTL Recommendations:**
- 3600-86400s (1-24 hours)
---
### DNSSEC Records
**Purpose:** Cryptographic signatures for DNS data integrity
**Record Types:**
- **DNSKEY**: Public signing key
- **RRSIG**: Signature for record set
- **DS**: Delegation signer (at parent zone)
- **NSEC/NSEC3**: Authenticated denial of existence
**When to Use:**
- Prevent DNS cache poisoning
- Authenticate DNS responses
- Required for high-security environments
**Best Practices:**
- Use provider-managed DNSSEC (complex to manage manually)
- Enable at both registrar and DNS provider
- Monitor for key rotation
- Test thoroughly before enabling
**Note:** DNSSEC is typically managed by DNS providers automatically. Manual configuration is complex and error-prone.
---
## Cloud-Specific Records
### ALIAS Record (Route53, Cloudflare, DNS Made Easy)
**Purpose:** CNAME-like record that works at zone apex
**Provider Support:**
- AWS Route53: ALIAS record
- Cloudflare: CNAME flattening
- DNS Made Easy: ANAME record
- NS1: ALIAS record
**Example (Route53):**
```hcl
resource "aws_route53_record" "apex" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}
```
**When to Use:**
- Point zone apex to CDN (CloudFront, Cloudflare)
- Point zone apex to load balancer
- Alternative to A record when IP may change
**Benefits:**
- Works at zone apex (unlike CNAME)
- Automatically updated when target changes
- Route53 does not charge for alias queries to AWS resources such as ELB, CloudFront, and S3
---
## Record Type Selection Decision Tree
### Complete Decision Flow
```
What are you trying to configure?
1. Point domain to server IP
├─ IPv4 address → A record
├─ IPv6 address → AAAA record
└─ Both → A + AAAA records (dual-stack)
2. Point domain to another domain
├─ Subdomain (www, blog, api)
│ └─ CNAME record
└─ Zone apex (@, example.com)
├─ Provider supports ALIAS → ALIAS record
└─ Provider doesn't support ALIAS → A record (use IP)
3. Configure email
├─ Mail servers → MX record
├─ Sender authentication
│ ├─ SPF → TXT record at @
│ ├─ DKIM → TXT record at selector._domainkey
│ └─ DMARC → TXT record at _dmarc
└─ Email forwarding → MX + A records
4. Service discovery
└─ Service location (SIP, XMPP, LDAP) → SRV record
5. Domain verification
└─ Verification token → TXT record
6. Certificate management
└─ Restrict certificate issuance → CAA record
7. Subdomain delegation
└─ Different nameservers → NS record
8. Security
├─ DNS integrity → DNSSEC (DNSKEY, RRSIG, DS)
└─ Certificate control → CAA record
```
### Common Use Cases
**Static Website:**
```
example.com. 3600 IN A 192.0.2.1
www.example.com. 3600 IN CNAME example.com.
```
**Website + Email (Google Workspace):**
```
example.com. 3600 IN A 192.0.2.1
www.example.com. 3600 IN CNAME example.com.
example.com. 3600 IN MX 1 aspmx.l.google.com.
example.com. 3600 IN TXT "v=spf1 include:_spf.google.com ~all"
```
**CDN (CloudFront):**
```
# ALIAS is a provider-specific pseudo-record, not valid in a standard BIND zone file
example.com. 3600 IN ALIAS d111111abcdef8.cloudfront.net.
www.example.com. 3600 IN CNAME d111111abcdef8.cloudfront.net.
```
**Load Balanced Application:**
```
example.com. 300 IN A 192.0.2.1
example.com. 300 IN A 192.0.2.2
example.com. 300 IN A 192.0.2.3
```
**Subdomain Delegation:**
```
api.example.com. 86400 IN NS ns1.apihost.com.
api.example.com. 86400 IN NS ns2.apihost.com.
```
---
## Quick Reference Table
| Record Type | Zone Apex | Subdomain | Multiple | TTL Recommendation |
|-------------|-----------|-----------|----------|-------------------|
| A | ✅ | ✅ | ✅ | 3600s |
| AAAA | ✅ | ✅ | ✅ | 3600s |
| CNAME | ❌ | ✅ | ❌ | 3600-86400s |
| ALIAS | ✅ | ✅ | ❌ | Auto |
| MX | ✅ | ✅ | ✅ | 3600-86400s |
| TXT | ✅ | ✅ | ✅ | 3600s |
| SRV | ❌ | ✅ | ✅ | 3600-86400s |
| NS | ✅ | ✅ | ✅ | 86400s |
| CAA | ✅ | ✅ | ✅ | 3600-86400s |
**Legend:**
- ✅ Allowed
- ❌ Not allowed
- Auto: Managed by provider
```
### references/ttl-strategies.md
```markdown
# TTL Strategies - Detailed Reference
Complete guide to Time-To-Live (TTL) strategies for DNS records with scenarios, calculations, and best practices.
## Table of Contents
1. [TTL Fundamentals](#ttl-fundamentals)
2. [TTL by Scenario](#ttl-by-scenario)
3. [Propagation Calculations](#propagation-calculations)
4. [Change Management Strategies](#change-management-strategies)
5. [TTL by Record Type](#ttl-by-record-type)
6. [Common Mistakes](#common-mistakes)
---
## TTL Fundamentals
### What is TTL?
Time-To-Live (TTL) is the duration (in seconds) that DNS resolvers cache a DNS record before querying the authoritative nameserver again.
**Key Concepts:**
- **Lower TTL** = Faster propagation but more DNS queries (higher load)
- **Higher TTL** = Slower propagation but fewer DNS queries (lower load, faster responses)
- **Old TTL matters** = When changing a record, the old TTL must expire first
### TTL Trade-offs
| Aspect | Low TTL (60-300s) | High TTL (3600-86400s) |
|--------|-------------------|------------------------|
| **Propagation Speed** | Fast (minutes) | Slow (hours) |
| **DNS Query Load** | High | Low |
| **DNS Costs** | Higher (more queries) | Lower |
| **Resolution Speed** | Slower (more lookups) | Faster (cached) |
| **Flexibility** | High (quick changes) | Low (slow changes) |
| **Use Case** | Dynamic, failover | Stable, production |
### TTL in DNS Resolution
```
User Request → Local Resolver
├─ Cache Hit (within TTL) → Return cached result
└─ Cache Miss (expired TTL) → Query authoritative server
├─ Get new record + TTL
└─ Cache for TTL duration
```
**Example Timeline:**
```
T+0s: Record created with TTL 3600s
T+100s: First query → Cached for 3600s
T+200s: Second query → Returns cached result (3500s remaining)
T+3700s: Third query → TTL expired, queries authoritative again
```
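This timeline can be replayed with a toy resolver cache. The sketch below uses a hypothetical `CachingResolver` with times in seconds. It also changes the record at T+300s to show the "old TTL matters" rule: the resolver keeps returning the cached value until T+3700s.

```python
class CachingResolver:
    """Toy resolver cache: an answer is reused until its TTL runs out."""

    def __init__(self, authoritative: dict[str, tuple[str, int]]):
        self.authoritative = authoritative  # name -> (value, TTL in seconds)
        self.cache: dict[str, tuple[str, float]] = {}  # name -> (value, expiry)

    def query(self, name: str, now: float) -> tuple[str, float, bool]:
        """Return (value, remaining TTL, cache hit?) for `name` at time `now`."""
        if name in self.cache:
            value, expires_at = self.cache[name]
            if now < expires_at:
                return value, float(expires_at - now), True
        value, ttl = self.authoritative[name]  # cache miss: ask the authority
        self.cache[name] = (value, float(now + ttl))
        return value, float(ttl), False


zone = {"example.com.": ("192.0.2.1", 3600)}
resolver = CachingResolver(zone)
print(resolver.query("example.com.", 100))   # ('192.0.2.1', 3600.0, False)
print(resolver.query("example.com.", 200))   # ('192.0.2.1', 3500.0, True)
zone["example.com."] = ("192.0.2.2", 300)     # record changed at T+300s
print(resolver.query("example.com.", 3699))  # ('192.0.2.1', 1.0, True)
print(resolver.query("example.com.", 3700))  # ('192.0.2.2', 300.0, False)
```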
---
## TTL by Scenario
### Scenario 1: Normal Operation (Stable Infrastructure)
**Goal:** Minimize DNS queries, optimize performance
**Recommended TTL Values:**
```
A/AAAA records: 3600s (1 hour)
CNAME records: 3600-86400s (1-24 hours)
MX records: 3600-86400s (1-24 hours)
TXT records: 3600-86400s (1-24 hours)
NS records: 86400s (24 hours)
CAA records: 3600-86400s (1-24 hours)
```
**Rationale:**
- Servers are stable and IP addresses rarely change
- Reduces load on authoritative DNS servers
- Improves resolution time for cached queries
- Balances flexibility with efficiency
**Example Configuration:**
```
; Zone file (BIND syntax: comments start with a semicolon)
example.com. 3600 IN A 192.0.2.1
www.example.com. 3600 IN CNAME example.com.
example.com. 86400 IN MX 10 mail.example.com.
example.com. 3600 IN TXT "v=spf1 include:_spf.google.com ~all"
example.com. 86400 IN NS ns1.example.com.
```
---
### Scenario 2: Pre-Change Preparation
**Goal:** Enable fast propagation for upcoming DNS changes
**Timeline:**
```
T-48h: Lower TTL to 300s (5 minutes)
T-24h: Verify TTL has propagated globally
└─ Check: dig +noall +answer example.com (TTL column should be 300 or less)
T-0h: Make the DNS change
T+1h: Verify new records propagating
└─ Check multiple resolvers
T+6h: Verify global propagation
└─ Use whatsmydns.net
T+24h: Raise TTL back to normal (3600s)
```
**Commands:**
```bash
# T-48h: Lower TTL
# (Update via DNS provider or IaC)
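# T-24h: Verify the lowered TTL (TTL is the second column)
dig +noall +answer example.com
# Expect: example.com. 300 IN A 192.0.2.1 (cached answers show the remaining TTL, so any value up to 300)
dig @ns1.example.com example.com +noall +answer # Authoritative view: exactly 300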
# T+1h: Check propagation
dig @8.8.8.8 example.com +short # Google DNS
dig @1.1.1.1 example.com +short # Cloudflare DNS
dig @208.67.222.222 example.com +short # OpenDNS
# T+6h: Global check
# Visit: https://www.whatsmydns.net/#A/example.com
```
**Why 48 Hours?**
- The hard minimum is one full old TTL (1 hour for 3600s, 24 hours for 86400s)
- 48 hours covers even a 24-hour old TTL with a safety margin
- Allows time to verify the new TTL is active before making the change
---
### Scenario 3: Blue-Green Deployment
**Goal:** Switch traffic quickly between environments
**Strategy:**
```
Phase 1: Preparation (T-48h)
├─ Current: Blue environment (192.0.2.1)
├─ Action: Lower TTL to 300s
└─ Verify: TTL propagated
Phase 2: Deployment (T-0h)
├─ Deploy: Green environment (192.0.2.2)
├─ Test: Validate green environment ready
└─ Monitor: Keep blue running
Phase 3: Cutover (T+0h)
├─ Update: Change DNS to green (192.0.2.2)
├─ Monitor: Traffic shifting to green
└─ Timeline: Full cutover in ~10 minutes (300s TTL + buffer)
Phase 4: Verification (T+30m)
├─ Verify: All traffic on green
├─ Monitor: Error rates, performance
└─ Keep: Blue environment running (rollback ready)
Phase 5: Stabilization (T+24h)
├─ Decommission: Blue environment (if stable)
├─ Raise TTL: Back to 1800s (30 min) for 24h
└─ Final raise: Back to 3600s (1 hour) after 48h
```
**Example:**
```
; T-48h: Lower TTL on the current blue record
example.com. 300 IN A 192.0.2.1
; T-0h: Switch to green
example.com. 300 IN A 192.0.2.2
; T+24h: Gradual TTL increase
example.com. 1800 IN A 192.0.2.2
; T+48h: Back to normal
example.com. 3600 IN A 192.0.2.2
```
```bash
# Verify propagation after the switch
dig @8.8.8.8 example.com +short # Should show 192.0.2.2
dig @1.1.1.1 example.com +short # Should show 192.0.2.2
```
**Rollback Process:**
```
; If issues are detected within 30 minutes, point DNS back at blue
example.com. 300 IN A 192.0.2.1
; Traffic shifts back within ~5 minutes (one 300s TTL)
```
---
### Scenario 4: DNS-Based Failover
**Goal:** Fastest possible failover to backup systems
**Configuration:**
```
Primary: 192.0.2.1 (health checked every 30s)
Secondary: 192.0.2.2 (standby)
TTL: 60-300s (1-5 minutes)
```
**Recommended TTL:**
- **Active-active**: 60-120s (fast failover, both active)
- **Active-passive**: 120-300s (moderate failover, standby ready)
- **Health check interval**: 30-60s
- **Failure threshold**: 2-3 consecutive failures
**Example (Route53):**
```hcl
# Health check configuration
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3  # 3 failures = unhealthy
  request_interval  = 30 # Check every 30 seconds
}
# Primary record with health check
resource "aws_route53_record" "primary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60 # Fast failover
  set_identifier = "primary"
  failover_routing_policy {
    type = "PRIMARY"
  }
  health_check_id = aws_route53_health_check.primary.id
  records         = ["192.0.2.1"]
}
# Secondary record (no health check needed)
resource "aws_route53_record" "secondary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "secondary"
  failover_routing_policy {
    type = "SECONDARY"
  }
  records = ["192.0.2.2"]
}
```
**Failover Timeline:**
```
T+0s: Primary server fails
T+30s: First health check failure
T+60s: Second health check failure
T+90s: Third health check failure → Route53 marks unhealthy
T+90s: New DNS queries return secondary IP
T+150s: All clients using secondary (90s detection + 60s TTL)
Total failover time: ~2.5 minutes
```
**Optimizing Failover Time:**
- **Lower TTL**: 60s instead of 300s (saves 4 minutes)
- **Faster health checks**: 10s interval (Route53 fast interval, billed at a higher rate; saves 60s with a threshold of 3)
- **Lower failure threshold**: 2 failures instead of 3 (saves 30s)
**Trade-off:**
- Faster failover = More DNS queries + higher costs
- Recommended minimum: 60s TTL for production
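The failover timeline follows a simple model: detection time (check interval times failure threshold) plus one record TTL. The sketch below encodes that model under the same assumptions as the timeline above; Route53 actually aggregates results from health checkers in several regions, so treat the output as an estimate, not a guarantee.

```python
def failover_seconds(request_interval: int, failure_threshold: int, ttl: int) -> int:
    """Estimated failover time in seconds for health-checked DNS failover.

    Assumes each failed check takes a full interval (as in the timeline
    above) and that clients honor the record TTL exactly.
    """
    detection = request_interval * failure_threshold  # time to mark unhealthy
    return detection + ttl  # plus time for cached answers to expire


scenarios = {
    "default (30s x3, TTL 60)": failover_seconds(30, 3, 60),
    "fast checks (10s x3, TTL 60)": failover_seconds(10, 3, 60),
    "lower threshold (30s x2, TTL 60)": failover_seconds(30, 2, 60),
    "high TTL (30s x3, TTL 300)": failover_seconds(30, 3, 300),
}
for label, seconds in scenarios.items():
    print(f"{label}: {seconds}s ({seconds / 60:.1f} min)")
```

The default case gives the ~2.5 minutes from the timeline; comparing rows shows where each optimization's savings come from.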
---
### Scenario 5: Canary/Weighted Routing
**Goal:** Gradually shift traffic to new version
**Strategy:**
```
Phase 1: Initial canary (10%)
├─ Old version: Weight 90, TTL 60s
└─ New version: Weight 10, TTL 60s
Phase 2: Expand canary (30%)
├─ Old version: Weight 70, TTL 60s
└─ New version: Weight 30, TTL 60s
Phase 3: Majority canary (70%)
├─ Old version: Weight 30, TTL 60s
└─ New version: Weight 70, TTL 60s
Phase 4: Full cutover (100%)
├─ Remove: Old version record
└─ New version: Weight 100, TTL 300s
Phase 5: Stabilize
└─ New version: Standard TTL 3600s
```
**Example (Route53):**
```hcl
# 90% to stable version
resource "aws_route53_record" "stable" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "stable"
  weighted_routing_policy {
    weight = 90
  }
  records = ["192.0.2.1"]
}
# 10% to canary version
resource "aws_route53_record" "canary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "canary"
  weighted_routing_policy {
    weight = 10
  }
  records = ["192.0.2.2"]
}
```
**TTL Considerations:**
- **During canary**: 60-120s (adjust weights quickly)
- **After full cutover**: 300s (moderate)
- **Stable state**: 3600s (normal)
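Route53 answers each query in a weighted set with a record chosen with probability weight / sum of weights. A small sketch of the expected split for each rollout phase, assuming that rule; actual per-client traffic can skew because a large resolver caches one chosen answer for the whole TTL and hands it to all its clients.

```python
def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Expected fraction of DNS answers per record under weighted routing.

    Each answer picks a record with probability weight / sum(weights).
    If every weight is 0, Route53 spreads answers evenly, mirrored here.
    """
    total = sum(weights.values())
    if total == 0:
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


rollout = [
    {"stable": 90, "canary": 10},
    {"stable": 70, "canary": 30},
    {"stable": 30, "canary": 70},
    {"canary": 100},
]
for phase, weights in enumerate(rollout, start=1):
    shares = traffic_shares(weights)
    print(f"Phase {phase}: " + ", ".join(f"{k}={v:.0%}" for k, v in shares.items()))
```

Because only the ratio matters, weights of 9 and 1 behave the same as 90 and 10.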
---
### Scenario 6: Dynamic DNS (DDNS)
**Goal:** Keep DNS updated with changing IP addresses
**Recommended TTL:** 30-60 minutes (about half the DHCP lease time)
**Configuration:**
```
DHCP Lease: 60 minutes
DNS TTL: 30 minutes
Update Frequency: Every 15-30 minutes
```
**Rationale:**
- TTL shorter than DHCP lease prevents stale records
- Update frequency ensures IP change captured
- Not too short to avoid excessive DNS queries
**Example (ddclient configuration):**
```
# /etc/ddclient.conf
daemon=1800 # Update every 30 minutes
protocol=cloudflare
zone=example.com
ttl=1800 # 30 minutes
login=token
password=your-api-token
example.com
```
**Common DDNS Services:**
- Cloudflare: Supports DDNS via API
- Route53: Supported via route53-ddns
- No-IP, DynDNS: Dedicated DDNS services
- Router built-in: Many routers support DDNS
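The update cycle a DDNS client runs boils down to "check the public IP, push it only if it changed". A minimal sketch, assuming hypothetical `get_public_ip` and `update_record` callables (not a real ddclient or provider API); it keeps the last published IP in memory, so the first run always pushes an update.

```python
from typing import Callable, Optional


class DdnsUpdater:
    """Minimal DDNS loop body: publish the public IP only when it changes.

    get_public_ip and update_record are hypothetical callables; back them
    with an IP-echo service and your DNS provider's API.
    """

    def __init__(self, get_public_ip: Callable[[], str],
                 update_record: Callable[[str, int], None], ttl: int = 1800):
        self.get_public_ip = get_public_ip
        self.update_record = update_record
        self.ttl = ttl
        self.last_published: Optional[str] = None  # unknown until first run

    def run_once(self) -> bool:
        """Check the current IP; return True if an update was sent."""
        current = self.get_public_ip()
        if current == self.last_published:
            return False  # unchanged: skip the API call
        self.update_record(current, self.ttl)
        self.last_published = current
        return True


def ddns_ttl_for_lease(lease_seconds: int) -> int:
    """Rule of thumb from the text: TTL of about half the DHCP lease."""
    return lease_seconds // 2
```

A daemon would call `run_once()` every 15-30 minutes (for example with `time.sleep(1800)` between calls), matching the update frequency above.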
---
### Scenario 7: GeoDNS / Latency-Based Routing
**Goal:** Route users to nearest/fastest endpoint
**Recommended TTL:** 300-900s (5-15 minutes)
**Rationale:**
- Moderate TTL balances performance and flexibility
- Allows rebalancing if endpoint performance changes
- Not too low (users don't move that fast)
- Not too high (allows for outage response)
**Example (Route53):**
```hcl
# US endpoint
resource "aws_route53_record" "us" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 300
  set_identifier = "us-east-1"
  latency_routing_policy {
    region = "us-east-1"
  }
  records = ["192.0.2.1"]
}
# EU endpoint
resource "aws_route53_record" "eu" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 300
  set_identifier = "eu-west-1"
  latency_routing_policy {
    region = "eu-west-1"
  }
  records = ["192.0.2.10"]
}
```
---
## Propagation Calculations
### Formula
```
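Conservative Propagation Estimate = Old TTL + New TTL + DNS Query Time
```
**Components:**
- **Old TTL**: The hard worst case: a resolver that cached the record just before the change serves the old value for up to the full old TTL
- **New TTL**: Planning margin: TTL-compliant resolvers fetch the new value as soon as their old copy expires, so this term is a conservative buffer rather than a strict requirement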
- **Query Time**: Usually negligible (<1s) but add buffer
### Examples
**Example 1: Changing stable record**
```
Old TTL: 3600s (1 hour)
New TTL: 3600s (1 hour)
Query: ~5s
────────────────────────────
Max Time: ~2 hours 5 seconds
Timeline:
T+0: Change made
T+3600: Old TTL expires; TTL-compliant resolvers return the new record
T+7200: Planning bound reached (old TTL + new TTL)
```
**Example 2: Pre-lowered TTL change**
```
Old TTL: 300s (5 minutes)
New TTL: 300s (5 minutes)
Query: ~5s
────────────────────────────
Max Time: ~10 minutes 5 seconds
Timeline:
T+0: Change made
T+300: Old TTL expires; TTL-compliant resolvers return the new record
T+600: Planning bound reached (old TTL + new TTL)
```
**Example 3: Emergency change (high TTL)**
```
Old TTL: 86400s (24 hours)
New TTL: 300s (5 minutes)
Query: ~5s
────────────────────────────
Max Time: ~24 hours 5 minutes
Timeline:
T+0: Emergency change made
T+86400: Old TTL expires (worst case); TTL-compliant resolvers return the new record
T+86700: Planning bound reached (old TTL + new TTL)
```
### Propagation Tables
| Old TTL | New TTL | Max Propagation | Typical Scenario |
|---------|---------|----------------|------------------|
| 60s | 60s | ~2 min | Failover (fast) |
| 300s | 300s | ~10 min | Pre-lowered change |
| 3600s | 300s | ~1h 5min | Emergency with high TTL |
| 3600s | 3600s | ~2h | Normal change |
| 86400s | 300s | ~24h 5min | Emergency without prep |
**Key Insight:** Old TTL matters most! Always plan ahead by lowering TTL before changes.
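The table can be reproduced with a short helper. A sketch that reports both the hard worst case (a resolver that cached the record just before the change holds it for the full old TTL) and the conservative planning figure (old TTL + new TTL + query buffer); `query_buffer` is set to 0 here so the output lines up with the table rows.

```python
def propagation_estimate(old_ttl: int, new_ttl: int, query_buffer: int = 5) -> dict[str, int]:
    """Return the hard worst case and the conservative planning estimate (seconds)."""
    return {
        # Old cached copies can survive for the full old TTL
        "hard_worst_case": old_ttl + query_buffer,
        # Planning formula: add the new TTL as extra margin
        "conservative": old_ttl + new_ttl + query_buffer,
    }


def fmt(seconds: int) -> str:
    """Format seconds as 'Xh Ymin' or 'Ymin'."""
    hours, rem = divmod(seconds, 3600)
    minutes = rem // 60
    return f"{hours}h {minutes}min" if hours else f"{minutes}min"


for old, new in [(60, 60), (300, 300), (3600, 300), (3600, 3600), (86400, 300)]:
    est = propagation_estimate(old, new, query_buffer=0)
    print(f"old={old:>5}s new={new:>4}s -> worst {fmt(est['hard_worst_case'])}, "
          f"plan for {fmt(est['conservative'])}")
```

The gap between the two columns is exactly the new TTL, which is why lowering the TTL ahead of time (shrinking the *old* TTL at change time) is what really shortens propagation.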
---
## Change Management Strategies
### Strategy 1: Planned Change (Recommended)
```
Step 1 (T-48h): Lower TTL to 300s
├─ Update DNS records with TTL 300s
├─ Verify change applied
└─ Wait at least one full old TTL (the 48h lead gives ample margin)
Step 2 (T-24h): Verify low TTL active
├─ Check: dig +noall +answer example.com (TTL is the second column)
├─ Expected: 300s or less
└─ Proceed if verified
Step 3 (T-0h): Make DNS change
├─ Update A/CNAME/etc records
├─ Keep TTL at 300s
└─ Monitor propagation
Step 4 (T+6h): Verify global propagation
├─ Check multiple resolvers
├─ Use whatsmydns.net
└─ Confirm 100% propagation
Step 5 (T+24h): Raise TTL (gradual)
├─ Raise to 1800s (30 min)
└─ Monitor for 24h
Step 6 (T+48h): Restore normal TTL
├─ Raise to 3600s (1 hour)
└─ Normal operation resumed
```
**Pros:**
- Predictable propagation (~10 minutes)
- Low risk of extended outages
- Can plan during maintenance window
**Cons:**
- Requires 48h+ planning
- Increased DNS queries during low TTL period
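The planned-change steps translate directly into a calendar. A minimal sketch, assuming the fixed offsets of Strategy 1; it refuses to plan when the old TTL would not expire before the T-24h verification step, since caches could still hold the record under the old TTL at that point. The example date is arbitrary.

```python
from datetime import datetime, timedelta

# (offset from the change, action) following Strategy 1's steps
PLANNED_STEPS = [
    (timedelta(hours=-48), "Lower TTL to 300s"),
    (timedelta(hours=-24), "Verify low TTL is active"),
    (timedelta(0), "Make the DNS change (keep TTL at 300s)"),
    (timedelta(hours=6), "Verify global propagation"),
    (timedelta(hours=24), "Raise TTL to 1800s"),
    (timedelta(hours=48), "Restore normal TTL (3600s)"),
]


def planned_change_schedule(change_at: datetime, old_ttl: int) -> list[tuple[datetime, str]]:
    """Return absolute times for each step around a planned change."""
    # Time between lowering the TTL and verifying it (24h with these steps)
    verify_window = PLANNED_STEPS[1][0] - PLANNED_STEPS[0][0]
    if timedelta(seconds=old_ttl) > verify_window:
        raise ValueError("old TTL would not expire before the T-24h check; lower the TTL earlier")
    return [(change_at + offset, action) for offset, action in PLANNED_STEPS]


for when, action in planned_change_schedule(datetime(2024, 6, 15, 14, 0), old_ttl=3600):
    print(when.isoformat(timespec="minutes"), action)
```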
---
### Strategy 2: Emergency Change (Unplanned)
```
Step 1 (T-0h): Make change immediately
├─ Update DNS with lowest possible TTL (60-300s)
├─ Accept propagation will take old TTL duration
└─ Monitor closely
Step 2 (T+0h to T+old_TTL): Monitor propagation
├─ Check multiple resolvers
├─ Communicate expected propagation time
└─ Wait for old TTL to expire
Step 3 (T+old_TTL): Verify completion
├─ Confirm global propagation
└─ Service restored
Step 4 (T+24h): Normalize TTL
├─ Gradually raise TTL back to normal
└─ Document incident
```
**Example Timeline (Old TTL = 3600s):**
```
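T+0: Emergency change made; resolvers without a cached copy return the new record immediately
T+1h: Old TTL expires; TTL-compliant resolvers return the new record
T+2h: Most stragglers migrated (long-lived connections, application-level DNS caches)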
T+3h: Virtually all users migrated
T+24h: Raise TTL back to normal
```
**Pros:**
- Immediate action possible
- No pre-planning required
**Cons:**
- Extended propagation time
- Unpredictable user experience during transition
---
### Strategy 3: Hybrid (Best of Both)
```
Default State:
├─ Critical records: TTL 300-600s (always low)
├─ Standard records: TTL 3600s (normal)
└─ Stable records: TTL 86400s (very stable)
Change Process:
├─ Critical records: Change immediately (~10 min propagation)
├─ Standard records: Use planned strategy (48h prep)
└─ Stable records: Plan weeks in advance
```
**Example Classification:**
```
Critical (TTL 300s):
- API endpoints
- Load balancer records
- Failover targets
Standard (TTL 3600s):
- Website A records
- CDN CNAMEs
- Mail server A records
Stable (TTL 86400s):
- NS records
- Static subdomains
- Long-term CNAMEs
```
---
## TTL by Record Type
### Recommended TTL Values
| Record Type | Normal Operation | Before Change | Rationale |
|-------------|------------------|---------------|-----------|
| **A/AAAA** | 3600s (1h) | 300s (5min) | Balance flexibility and performance |
| **CNAME** | 3600-86400s | 300s | Usually stable, point to stable targets |
| **MX** | 3600-86400s | 300s | Mail servers rarely change |
| **TXT (SPF/DKIM)** | 3600-86400s | 3600s | Email auth rarely changes |
| **TXT (verification)** | 3600s | N/A | Can remove after verification |
| **SRV** | 3600-86400s | 300s | Services relatively stable |
| **NS** | 86400-172800s | 86400s | Very stable, rarely change |
| **CAA** | 3600-86400s | 3600s | Certificate policy stable |
| **PTR (reverse)** | 86400s | 86400s | Rarely changes |
### Special Considerations
**MX Records:**
- Higher TTL acceptable (3600-86400s)
- Email delivery retries automatically
- Rarely need emergency changes
**NS Records:**
- Highest TTL (86400-172800s = 1-2 days)
- Changes are very rare
- Plan changes weeks in advance
- Update both parent and child zone
**TXT Records:**
- Verification records: 3600s (can remove after verification)
- SPF/DKIM/DMARC: 3600-86400s (rarely change)
- API keys/tokens: 3600s (moderate flexibility)
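The normal-operation column of the table above can drive a simple TTL check in CI. A sketch under stated assumptions: the ranges come from the table, the 60-600s window for records marked critical combines the failover and hybrid-strategy guidance, and the fallback range for unlisted types is a hypothetical default.

```python
# Normal-operation TTL ranges (seconds) from the table above
NORMAL_TTL_RANGES = {
    "A": (3600, 3600),
    "AAAA": (3600, 3600),
    "CNAME": (3600, 86400),
    "MX": (3600, 86400),
    "TXT": (3600, 86400),
    "SRV": (3600, 86400),
    "NS": (86400, 172800),
    "CAA": (3600, 86400),
    "PTR": (86400, 86400),
}
FALLBACK_RANGE = (300, 86400)  # hypothetical default for unlisted types


def lint_ttl(name: str, rtype: str, ttl: int, critical: bool = False) -> list[str]:
    """Return warnings for one record's TTL (empty list means OK)."""
    if ttl <= 0:
        return [f"{name} {rtype}: TTL {ttl} - never use 0 (minimum 60s)"]
    if critical:
        # Failover/API records are allowed to stay low on purpose
        if not 60 <= ttl <= 600:
            return [f"{name} {rtype}: critical record TTL {ttl} outside 60-600s"]
        return []
    low, high = NORMAL_TTL_RANGES.get(rtype.upper(), FALLBACK_RANGE)
    if ttl < low:
        return [f"{name} {rtype}: TTL {ttl} below normal {low}s (left low after a change?)"]
    if ttl > high:
        return [f"{name} {rtype}: TTL {ttl} above normal {high}s"]
    return []
```

Flagging TTLs below the normal floor is one way to catch records left low after a change.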
---
## Common Mistakes
### Mistake 1: TTL Too High Before Changes
**Problem:**
```bash
# Day 1: Record with 24-hour TTL
example.com. 86400 IN A 192.0.2.1
# Day 2: Emergency - need to change IP
# Problem: Must wait 24 hours for full propagation!
```
**Solution:**
- Maintain moderate TTL (3600s) for production records
- Lower TTL 48h before planned changes
- Keep critical records at lower TTL (300s) always
---
### Mistake 2: Setting TTL to 0
**Problem:**
```bash
# ❌ WRONG - TTL of 0
example.com. 0 IN A 192.0.2.1
```
**Why It's Bad:**
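- TTL 0 is legal (RFC 1035: the record is used for the current transaction only and not cached), so every lookup goes to the authoritative servers
- Some resolvers clamp very low TTLs to a configured minimum, so 0 does not reliably mean "no caching"
- Causes excessive queries to authoritative servers
- No practical benefit over a short TTL such as 60s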
**Solution:**
- Practical minimum: 60s (failover-critical records)
- Recommended minimum for everything else: 300s (5 minutes)
---
### Mistake 3: Not Waiting for Old TTL
**Problem:**
```bash
# T+0h: Change DNS
# T+5min: "Why isn't it working?!"
# Old TTL was 3600s - must wait 1 hour!
```
**Solution:**
- Check current TTL before making changes
- Communicate expected propagation time
- Use propagation formula: Old TTL + New TTL
---
### Mistake 4: Forgetting to Raise TTL After Change
**Problem:**
```bash
# After emergency change, TTL left at 60s
# Result: Excessive DNS queries until someone notices and raises it
# Higher costs, slower performance
```
**Solution:**
- Schedule TTL normalization 24-48h after change
- Document TTL changes in change management
- Use infrastructure-as-code to enforce standards
---
### Mistake 5: Inconsistent TTL Across Records
**Problem:**
```bash
example.com. 3600 IN A 192.0.2.1
www.example.com. 60 IN CNAME example.com.
# Resolvers cache the CNAME and the A record separately: the 60s CNAME TTL adds queries,
# but an IP change on example.com still waits out its 1-hour TTL
```
**Solution:**
- Keep related records at similar TTL
- A record and CNAME should have similar TTL
- Coordinate TTL changes across record sets
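A check for this mistake can run over a record set before it is applied. A minimal sketch, assuming records are given as `name -> (type, value, ttl)` and using a hypothetical threshold of 4x as "too far apart"; only CNAMEs whose target is in the same record set are checked.

```python
def cname_ttl_mismatches(records: dict[str, tuple[str, str, int]],
                         max_ratio: float = 4.0) -> list[str]:
    """Flag CNAMEs whose TTL differs from their target's by more than max_ratio."""
    findings = []
    for name, (rtype, target, ttl) in records.items():
        if rtype != "CNAME" or target not in records:
            continue  # only CNAMEs whose target is in this record set
        target_ttl = records[target][2]
        if ttl <= 0 or target_ttl <= 0:
            continue  # zero TTLs are out of scope here; avoid dividing by zero
        ratio = max(ttl, target_ttl) / min(ttl, target_ttl)
        if ratio > max_ratio:
            findings.append(f"{name} (TTL {ttl}) -> {target} (TTL {target_ttl}): {ratio:.0f}x apart")
    return findings


zone = {
    "example.com.": ("A", "192.0.2.1", 3600),
    "www.example.com.": ("CNAME", "example.com.", 60),
}
print(cname_ttl_mismatches(zone))
```

Run against the problem example above, it reports the 60s CNAME pointing at a 3600s A record (60x apart).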
---
## TTL Best Practices Summary
### Golden Rules
1. **Default to 3600s (1 hour)** for most production records
2. **Lower to 300s (5 min)** 48 hours before planned changes
3. **Keep NS records high** (86400s / 24 hours)
4. **Never use 0** (minimum 60s, recommended 300s)
5. **Raise TTL after changes** (don't leave at low values)
6. **Monitor DNS query costs** (lower TTL = higher costs)
7. **Document TTL strategy** in runbooks
### Quick Decision Matrix
```
How often does this change?
├─ Never → 86400s (24h)
├─ Rarely (years) → 86400s (24h)
├─ Occasionally (months) → 3600s (1h)
├─ Regularly (weeks) → 1800s (30min)
├─ Frequently (days) → 300-600s (5-10min)
└─ Very frequently (hours) → 60-300s (1-5min)
Is this critical for failover?
├─ Yes → 60-300s (1-5min)
└─ No → Use frequency-based rule above
Are DNS costs a concern?
├─ Yes → Use higher TTLs (3600-86400s)
└─ No → Optimize for flexibility (300-3600s)
```
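The decision matrix can be encoded as a function for runbooks or IaC defaults. A sketch: the frequency labels are hypothetical names for the matrix rows, and how the three questions combine is an assumption (failover-critical overrides everything, cost sensitivity raises the floor to 3600s).

```python
# (min, max) TTL in seconds per change frequency, from the matrix above
FREQUENCY_TTL = {
    "never": (86400, 86400),
    "rarely": (86400, 86400),        # years
    "occasionally": (3600, 3600),    # months
    "regularly": (1800, 1800),       # weeks
    "frequently": (300, 600),        # days
    "very_frequently": (60, 300),    # hours
}


def recommend_ttl(change_frequency: str, failover_critical: bool = False,
                  cost_sensitive: bool = False) -> tuple[int, int]:
    """Return a (min, max) TTL range in seconds following the matrix above."""
    if failover_critical:
        return (60, 300)
    low, high = FREQUENCY_TTL[change_frequency]
    if cost_sensitive:
        # The matrix points cost-sensitive zones at 3600-86400s
        return (max(low, 3600), max(high, 3600))
    return (low, high)
```

Keep in mind that the cost rule fights with very frequent changes; in that case, pre-lowering TTLs before planned changes may be the better trade.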
### Monitoring TTL Effectiveness
**Metrics to Track:**
- DNS query volume (lower TTL = higher volume)
- DNS resolution time (higher TTL = faster, more cached)
- Propagation time during changes (lower TTL = faster)
- DNS costs (provider-specific)
**Adjust TTL if:**
- Query volume too high → Raise TTL
- Changes take too long → Lower TTL (or pre-plan)
- Costs too high → Raise TTL
- Failover too slow → Lower TTL for critical records
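The link between TTL and query volume can be bounded with simple arithmetic: a resolver with continuous demand re-fetches a record at most once per TTL, so it makes at most 86400 / TTL authoritative queries per day. A sketch assuming a hypothetical count of active resolvers; real volume is lower because idle resolvers do not refresh.

```python
def max_authoritative_queries_per_day(ttl: int, active_resolvers: int) -> int:
    """Upper bound on daily authoritative queries for one record."""
    if ttl <= 0:
        raise ValueError("TTL must be positive")
    refreshes_per_resolver = 86400 / ttl  # at most one refresh per TTL
    return round(refreshes_per_resolver * active_resolvers)


for ttl in (60, 300, 3600, 86400):
    print(f"TTL {ttl:>5}s: <= {max_authoritative_queries_per_day(ttl, 1000):,} queries/day")
```

With 1,000 busy resolvers, dropping from 3600s to 300s raises the bound from 24,000 to 288,000 queries per day (12x); multiply by your provider's per-query price to estimate cost.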
```
### references/dns-as-code-comparison.md
```markdown
# DNS-as-Code Tools Comparison
Comprehensive comparison of DNS automation tools with recommendations, examples, and decision frameworks.
## Table of Contents
1. [Tool Overview](#tool-overview)
2. [ExternalDNS](#externaldns)
3. [OctoDNS](#octodns)
4. [DNSControl](#dnscontrol)
5. [Terraform](#terraform)
6. [Comparison Matrix](#comparison-matrix)
7. [Selection Guide](#selection-guide)
---
## Tool Overview
### What is DNS-as-Code?
DNS-as-Code treats DNS records as declarative configuration files, enabling:
- Version control for DNS changes (Git)
- Code review for DNS updates
- Automated deployment pipelines
- Consistency across environments
- Rollback capabilities
### Tool Categories
**1. Kubernetes-Native:** external-dns
- Watches Kubernetes resources
- Automatically syncs to DNS providers
- Annotation-based configuration
**2. Multi-Provider Sync:** OctoDNS, DNSControl
- Define DNS in configuration files
- Sync to multiple providers simultaneously
- Provider-agnostic abstraction
**3. Infrastructure-as-Code:** Terraform, Pulumi
- Manage DNS alongside other infrastructure
- Provider-specific resources
- State management
---
## ExternalDNS
### Overview
Kubernetes controller that synchronizes Service and Ingress resources with DNS providers.
**Repository:** `/kubernetes-sigs/external-dns`
**Language:** Go
**License:** Apache 2.0
**Maturity:** Production-ready (Kubernetes SIG Network subproject)
**Trust Indicators:**
- Context7 Code Snippets: 671+
- GitHub Stars: 7k+
- Active maintenance: Kubernetes SIG project
- Production use: Thousands of clusters
### Key Features
**Automatic Sync:**
- Watches Kubernetes Services (LoadBalancer, NodePort)
- Watches Ingress resources
- Creates/updates/deletes DNS records automatically
- No manual DNS updates required
**Annotation-Based:**
```yaml
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
```
**Provider Support (20+):**
- AWS Route53
- Google Cloud DNS
- Azure DNS
- Cloudflare
- DigitalOcean
- Linode
- OVH
- RFC2136 (BIND, PowerDNS)
- Many more...
### Strengths
✅ **Kubernetes-native** - Seamless integration
✅ **Automatic** - No manual DNS updates
✅ **Annotation-based** - Simple configuration
✅ **Multi-provider** - 20+ supported providers
✅ **Production-ready** - Widely used, stable
### Limitations
❌ **Kubernetes-only** - Cannot manage non-K8s DNS
❌ **Limited logic** - Cannot express complex rules
❌ **Single cluster** - Each cluster needs own external-dns
❌ **No review step** - Changes apply continuously; `--dry-run` only logs the changes it would make
### When to Use
Use external-dns when:
- Running Kubernetes workloads
- Want automatic DNS for Services/Ingresses
- Need zero manual DNS management
- GitOps workflow (ArgoCD, Flux)
Skip external-dns when:
- Not using Kubernetes
- Need complex DNS logic
- Managing DNS across multiple non-K8s systems
### Configuration Example
**Helm Installation:**
```bash
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
helm repo update
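# kubernetes-sigs chart values; chart 1.14+ sets the provider via provider.name
helm install external-dns external-dns/external-dns \
--namespace external-dns \
--create-namespace \
--set provider.name=aws \
--set 'env[0].name=AWS_DEFAULT_REGION' \
--set 'env[0].value=us-east-1' \
--set txtOwnerId=my-k8s-cluster \
--set 'domainFilters[0]=example.com' \
--set policy=sync \
--set registry=txt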
```
**Service with Annotation:**
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    external-dns.alpha.kubernetes.io/hostname: nginx.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
```
**Result:**
```
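# DNS record automatically created (IP-based load balancer):
nginx.example.com. 300 IN A <LoadBalancer-IP>
# plus a TXT ownership record (TXT registry); on AWS, ELB hostnames become Route53 alias records instead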
```
### Best Practices
1. **Use TXT registry** - Tracks ownership
2. **Set domain filter** - Prevent accidental changes to other domains
3. **Use policy=sync** - Allows deletions when resources removed
4. **One external-dns per cluster** - Avoid conflicts
5. **Monitor logs** - Watch for errors/rate limits
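The ownership rule behind the TXT registry (and the reason each cluster needs a unique `txtOwnerId`) can be modeled in a few lines. This is a conceptual sketch, not external-dns source: it assumes the companion TXT record sits under the same name and uses the `heritage=external-dns,external-dns/owner=<txtOwnerId>` value format; real deployments may place it under a prefixed name (for example via `--txt-prefix`), depending on version and flags.

```python
def parse_ownership_txt(txt: str) -> dict[str, str]:
    """Parse an external-dns-style ownership TXT value into key/value pairs."""
    pairs = {}
    for item in txt.strip('"').split(","):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def may_manage(record_name: str, txt_records: dict[str, str], owner_id: str) -> bool:
    """Return True if this controller instance owns `record_name`.

    Records without a matching ownership TXT (created by hand, by another
    tool, or by another cluster) are left alone.
    """
    txt = txt_records.get(record_name)
    if txt is None:
        return False
    meta = parse_ownership_txt(txt)
    return (meta.get("heritage") == "external-dns"
            and meta.get("external-dns/owner") == owner_id)
```

Two clusters sharing one owner ID would each believe they own the other's records, which is how conflicting deletes happen under `policy=sync`.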
### Common Issues
**Issue: Records not created**
```bash
# Check logs
kubectl logs -n external-dns deployment/external-dns
# Verify annotation
kubectl get service nginx -o yaml | grep external-dns
# Check domain filter
kubectl describe deployment -n external-dns external-dns | grep domain-filter
```
**Issue: Permission denied**
```bash
# AWS: Check IAM policy
aws iam get-policy-version \
--policy-arn arn:aws:iam::123456789012:policy/external-dns \
--version-id v1
# GCP: Check service account
gcloud projects get-iam-policy PROJECT_ID \
--flatten="bindings[].members" \
--filter="bindings.members:external-dns@"
```
---
## OctoDNS
### Overview
Python-based DNS-as-code tool that syncs DNS records from YAML configuration files to multiple providers.
**Repository:** `/octodns/octodns`
**Language:** Python
**License:** MIT
**Maturity:** Production-ready
**Trust Indicators:**
- Context7 Code Snippets: 128+
- Context7 Benchmark Score: 88.2/100
- Source Reputation: High
- Used by: GitHub (internally)
### Key Features
**YAML Configuration:**
- Define DNS zones in YAML files
- Version control with Git
- Human-readable format
**Multi-Provider Sync:**
- Sync same zone to multiple providers
- Provider-agnostic abstraction
- Diff-based plans: computes creates, updates, and deletes per target provider
**Preview Mode:**
```bash
# Dry run - see what would change
octodns-sync --config-file=config.yaml
# Apply changes
octodns-sync --config-file=config.yaml --doit
```
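A dry run reports a plan by comparing the desired records from the YAML source with what each target currently serves. A conceptual sketch of that diff; the `(name, type)` keys and plain dicts are a hypothetical simplification, not OctoDNS's internal classes. By default, real OctoDNS also refuses plans whose share of updates or deletes crosses its safety thresholds unless `--force` is given.

```python
Record = tuple[str, str]  # (name, type), e.g. ("www", "CNAME")


def plan_changes(desired: dict[Record, dict], existing: dict[Record, dict]) -> dict[str, list[Record]]:
    """Compute create/update/delete lists the way a dry-run plan reports them."""
    creates = sorted(k for k in desired if k not in existing)
    deletes = sorted(k for k in existing if k not in desired)  # zone is fully managed
    updates = sorted(k for k in desired if k in existing and desired[k] != existing[k])
    return {"create": creates, "update": updates, "delete": deletes}


desired = {
    ("www", "CNAME"): {"ttl": 3600, "value": "example.com."},
    ("api", "A"): {"ttl": 300, "values": ["192.0.2.10"]},
}
existing = {
    ("www", "CNAME"): {"ttl": 300, "value": "example.com."},
    ("old", "A"): {"ttl": 300, "values": ["192.0.2.99"]},
}
print(plan_changes(desired, existing))
```

Note the delete: a record added by hand in the provider shows up as a delete in the plan, which is why reviewing the dry run matters.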
**Provider Support (15+):**
- AWS Route53
- Google Cloud DNS
- Azure DNS
- Cloudflare
- DigitalOcean
- NS1
- Dyn
- RFC2136 (BIND)
- And more...
### Strengths
✅ **Multi-provider** - Sync to multiple providers simultaneously
✅ **YAML-based** - Easy to read and write
✅ **Git-friendly** - Perfect for version control
✅ **Preview mode** - See changes before applying
✅ **Provider abstraction** - Write once, deploy anywhere
### Limitations
❌ **Python dependency** - Requires Python 3.7+
❌ **Manual sync** - Must run octodns-sync command
❌ **Complex setup** - More setup than external-dns
❌ **Learning curve** - YAML schema to learn
### When to Use
Use OctoDNS when:
- Managing DNS across multiple providers
- Need version control for DNS records
- Want preview mode before applying changes
- Migrating between DNS providers
- Multi-environment DNS (dev, staging, prod)
Skip OctoDNS when:
- Only using one provider (Terraform may be simpler)
- Prefer JavaScript over Python/YAML
- Need Kubernetes automation (use external-dns)
### Configuration Example
**Main Config (`octodns-config.yaml`):**
```yaml
---
providers:
  config:
    class: octodns.provider.yaml.YamlProvider
    directory: ./config
    default_ttl: 3600
  route53:
    class: octodns_route53.Route53Provider
    access_key_id: env/AWS_ACCESS_KEY_ID
    secret_access_key: env/AWS_SECRET_ACCESS_KEY
  cloudflare:
    class: octodns_cloudflare.CloudflareProvider
    token: env/CLOUDFLARE_TOKEN
zones:
  example.com.:
    sources:
      - config
    targets:
      - route53
      - cloudflare
```
**Zone Records (`config/example.com.yaml`):**
```yaml
---
# YamlProvider expects record names in sorted order by default
# Root domain
'':
  - type: A
    ttl: 300
    values:
      - 192.0.2.1
      - 192.0.2.2
  - type: MX
    ttl: 3600
    values:
      - exchange: mail1.example.com.
        preference: 10
      - exchange: mail2.example.com.
        preference: 20
# Subdomains
'_dmarc':
  type: TXT
  ttl: 3600
  # OctoDNS rejects unescaped semicolons in TXT values, so escape them as \;
  value: 'v=DMARC1\; p=quarantine\; rua=mailto:[email protected]'
api:
  type: A
  ttl: 300
  values:
    - 192.0.2.10
www:
  type: CNAME
  ttl: 3600
  value: example.com.
```
**Usage:**
```bash
# Install
pip install octodns octodns-route53 octodns-cloudflare
# Validate configuration
octodns-validate --config-file=octodns-config.yaml
# Preview changes (dry run)
octodns-sync --config-file=octodns-config.yaml
# Apply changes
octodns-sync --config-file=octodns-config.yaml --doit
# Sync specific zone
octodns-sync --config-file=octodns-config.yaml --doit example.com.
```
### Best Practices
1. **Version control** - Commit config to Git
2. **CI/CD integration** - Automate sync on merge
3. **Preview first** - Always dry-run before --doit
4. **Environment variables** - Store credentials securely
5. **Separate environments** - Different configs for dev/staging/prod
### Common Issues
**Issue: Provider authentication failed**
```bash
# Check environment variables
echo $AWS_ACCESS_KEY_ID
echo $CLOUDFLARE_TOKEN
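# Validate configuration and record syntax
octodns-validate --config-file=octodns-config.yaml
# Test provider access by dumping the live zone from a target
octodns-dump --config-file=octodns-config.yaml --output-dir=./tmp-dump example.com. route53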
```
**Issue: Sync conflicts**
```bash
# OctoDNS plans to revert records changed by hand in the provider
# Option 1: dump the live zone, then merge the manual changes into config
octodns-dump --config-file=octodns-config.yaml --output-dir=./backup example.com. route53
# Option 2: overwrite the provider (--force is required when the plan exceeds safety thresholds)
octodns-sync --config-file=octodns-config.yaml --doit --force
```
---
## DNSControl
### Overview
JavaScript-based DNS-as-code tool with expressive DSL and multi-provider support.
**Repository:** `/stackexchange/dnscontrol`
**Language:** Go (runtime), JavaScript (config)
**License:** MIT
**Maturity:** Production-ready
**Trust Indicators:**
- Context7 Code Snippets: 649+
- Source Reputation: High
- Used by: StackExchange (at scale)
- GitHub Stars: 3k+
### Key Features
**JavaScript DSL:**
- Expressive configuration language
- Variables, functions, loops
- Reusable code patterns
**Preview Mode:**
```bash
# See what would change
dnscontrol preview
# Apply changes
dnscontrol push
```
**Provider Support (30+):**
- Largest provider support among DNS tools
- AWS Route53, Cloud DNS, Azure DNS, Cloudflare
- Many niche providers
- New providers are implemented in Go within the project
### Strengths
✅ **JavaScript DSL** - Familiar syntax, powerful logic
✅ **Most providers** - 30+ supported
✅ **Preview mode** - Safe dry-run
✅ **Helper functions** - DRY configuration
✅ **Large community** - Active development
### Limitations
❌ **Separate binary** - Must install the dnscontrol binary (no Go toolchain needed)
❌ **JavaScript only** - Not YAML (preference)
❌ **Learning curve** - DSL to learn
❌ **Less Kubernetes-friendly** - Manual integration needed
### When to Use
Use DNSControl when:
- Comfortable with JavaScript
- Need complex DNS logic (functions, variables)
- Managing many similar domains
- Want most provider options
- Prefer expressive DSL over YAML
Skip DNSControl when:
- Prefer YAML configuration (use OctoDNS)
- Don't want to write configuration in JavaScript
- Need Kubernetes automation (use external-dns)
- Want simpler setup (use Terraform)
### Configuration Example
**Main Config (`dnsconfig.js`):**
```javascript
var REG_NONE = NewRegistrar("none");
var DNS_CLOUDFLARE = NewDnsProvider("cloudflare");
var DNS_ROUTE53 = NewDnsProvider("route53");
// Helper function for standard web setup
function StandardWeb(domain, ip) {
return [
A("@", ip, TTL(300)),
A("www", ip, TTL(300)),
CNAME("blog", domain + "."),
];
}
// Helper function for Google Workspace email
function GoogleMail(domain) {
return [
MX("@", 1, "aspmx.l.google.com.", TTL(3600)),
MX("@", 5, "alt1.aspmx.l.google.com."),
MX("@", 5, "alt2.aspmx.l.google.com."),
TXT("@", "v=spf1 include:_spf.google.com ~all"),
];
}
// Main domain - synced to both providers
D("example.com", REG_NONE,
DnsProvider(DNS_CLOUDFLARE),
DnsProvider(DNS_ROUTE53),
// Use helper functions
StandardWeb("example.com", "192.0.2.1"),
GoogleMail("example.com"),
// API endpoint
A("api", "192.0.2.10", TTL(300)),
// Restrict which certificate authorities may issue certificates (CAA)
CAA("@", "issue", "letsencrypt.org"),
CAA("@", "iodef", "mailto:[email protected]")
);
// Staging environment
D("staging.example.com", REG_NONE,
DnsProvider(DNS_CLOUDFLARE),
A("@", "192.0.2.100", TTL(300)),
A("*", "192.0.2.100", TTL(300)) // Wildcard
);
```
**Credentials (`creds.json`):**
```json
{
"cloudflare": {
"TYPE": "CLOUDFLAREAPI",
"accountid": "your-account-id",
"apitoken": "your-api-token"
},
"route53": {
"TYPE": "ROUTE53",
"KeyId": "your-key-id",
"SecretKey": "your-secret-key"
}
}
```
**Usage:**
```bash
# Install
# macOS
brew install dnscontrol
# Linux
# (release asset names vary by version; check the GitHub releases page)
curl -L https://github.com/StackExchange/dnscontrol/releases/download/v3.x.x/dnscontrol-Linux -o dnscontrol
chmod +x dnscontrol
# Validate configuration
dnscontrol check
# Preview changes
dnscontrol preview
# Apply changes
dnscontrol push
# Push to specific provider
dnscontrol push --providers cloudflare
```
### Best Practices
1. **Use helper functions** - DRY configuration
2. **Variables for IPs** - Easy environment switching
3. **Git version control** - Track changes
4. **CI/CD integration** - Automate deployments
5. **Separate credentials** - Don't commit creds.json
### Advanced Examples
**Using Variables:**
```javascript
var IP_PROD = "192.0.2.1";
var IP_STAGING = "192.0.2.100";
D("example.com", REG_NONE, DnsProvider(DNS_CLOUDFLARE),
A("@", IP_PROD),
A("www", IP_PROD)
);
D("staging.example.com", REG_NONE, DnsProvider(DNS_CLOUDFLARE),
A("@", IP_STAGING),
A("www", IP_STAGING)
);
```
**Loop for Subdomains:**
```javascript
var subdomains = ["app1", "app2", "app3"];
D("example.com", REG_NONE, DnsProvider(DNS_CLOUDFLARE),
// ES5 syntax (no arrow functions or spread); D() flattens the returned array
subdomains.map(function (sub) { return A(sub, "192.0.2.1", TTL(300)); })
);
```
---
## Terraform
### Overview
General-purpose infrastructure-as-code tool with DNS provider support.
**Benefits:**
- Manage DNS alongside other infrastructure
- State management (track changes)
- Plan/apply workflow
- Strong typing
**Providers:**
- aws (Route53)
- google (Cloud DNS)
- azurerm (Azure DNS)
- cloudflare (Cloudflare)
- 100+ other providers
### When to Use Terraform
Use Terraform when:
- Already using Terraform for infrastructure
- Want state management
- Need to manage DNS + compute + storage together
- Prefer HCL over YAML/JavaScript
Skip Terraform when:
- Only managing DNS (OctoDNS/DNSControl lighter)
- Need Kubernetes automation (use external-dns)
- Don't want state file management
### Example
```hcl
# Route53 zone
resource "aws_route53_zone" "main" {
  name = "example.com"
}
# A record
resource "aws_route53_record" "www" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "www.example.com"
  type    = "A"
  ttl     = 300
  records = ["192.0.2.1"]
}
# MX records
resource "aws_route53_record" "mx" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "MX"
  ttl     = 3600
  records = [
    "10 mail1.example.com",
    "20 mail2.example.com",
  ]
}
```
---
## Comparison Matrix
### Feature Comparison
| Feature | external-dns | OctoDNS | DNSControl | Terraform |
|---------|--------------|---------|------------|-----------|
| **Implementation** | Go | Python | Go | Go |
| **Config Format** | K8s annotations | YAML | JavaScript | HCL |
| **Preview Mode** | ⚠️ `--dry-run` (logs only) | ✅ | ✅ | ✅ (plan) |
| **Multi-Provider** | ✅ (separate) | ✅ (native) | ✅ (native) | ✅ (modules) |
| **Kubernetes** | ✅ Native | ❌ | ❌ | ⚠️ Possible |
| **Learning Curve** | Low | Medium | Medium | Medium-High |
| **Provider Count** | 20+ | 15+ | 30+ | 100+ |
| **State Management** | K8s objects | No | No | ✅ State file |
| **Version Control** | K8s manifests | YAML files | JS files | HCL files |
| **Automation** | Automatic | Manual sync | Manual push | Manual apply |
### Strengths and Weaknesses
| Tool | Best For | Avoid If |
|------|----------|----------|
| **external-dns** | Kubernetes DNS automation | Not using Kubernetes |
| **OctoDNS** | Multi-provider sync, YAML preference | Prefer JavaScript, complex logic |
| **DNSControl** | Complex logic, many providers | Prefer YAML, simpler setup |
| **Terraform** | Infrastructure + DNS together | Only managing DNS |
---
## Selection Guide
### Decision Flow
```
Are you using Kubernetes?
├─ Yes → external-dns (if only K8s DNS needs)
│ or Terraform (if managing infrastructure too)
└─ No → Continue...
Managing multiple DNS providers?
├─ Yes → OctoDNS or DNSControl
└─ No → Continue...
Already using Terraform?
├─ Yes → Terraform (manage DNS with infrastructure)
└─ No → Continue...
Prefer YAML or JavaScript?
├─ YAML → OctoDNS
└─ JavaScript → DNSControl
```
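The decision flow can be written as a function, which makes the order of the questions explicit. A sketch; the keyword names are hypothetical, and breaking the "OctoDNS or DNSControl" tie for multi-provider setups on config-format preference is an assumption the flow itself leaves open.

```python
def choose_dns_tool(*, uses_kubernetes: bool, kubernetes_only: bool,
                    multi_provider: bool, uses_terraform: bool,
                    prefers_yaml: bool) -> str:
    """Walk the decision flow above and return a suggested starting tool."""
    if uses_kubernetes:
        # K8s-only DNS needs -> external-dns; infrastructure too -> Terraform
        return "external-dns" if kubernetes_only else "Terraform"
    if multi_provider:
        # The flow allows either tool here; break the tie on config preference
        return "OctoDNS" if prefers_yaml else "DNSControl"
    if uses_terraform:
        return "Terraform"
    return "OctoDNS" if prefers_yaml else "DNSControl"
```

The result is a starting point; the combination strategies below pair tools when one answer does not cover every record.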
### By Use Case
**Use Case 1: Kubernetes Services**
```
Best: external-dns
Why: Automatic, annotation-based, Kubernetes-native
Alternative: Terraform + external-dns (for non-K8s DNS)
```
**Use Case 2: Multi-Provider Redundancy**
```
Best: OctoDNS or DNSControl
Why: Native multi-provider sync
Example: Sync same zone to Route53 + Cloudflare
```
**Use Case 3: Infrastructure + DNS**
```
Best: Terraform
Why: Manage compute, network, DNS together
Example: Create EC2 + Route53 record in one apply
```
**Use Case 4: Complex DNS Logic**
```
Best: DNSControl
Why: JavaScript functions, variables, loops
Example: Generate 100 subdomains programmatically
```
**Use Case 5: DNS Migration**
```
Best: OctoDNS
Why: Export from old provider, sync to new
Example: Migrate from GoDaddy to Cloudflare
```
### Combination Strategies
**Strategy 1: Kubernetes + Non-Kubernetes**
```
external-dns: Automatic K8s Service/Ingress DNS
OctoDNS: Static DNS records (MX, TXT, etc.)
```
**Strategy 2: Infrastructure + Dynamic DNS**
```
Terraform: Zones, static records, infrastructure
external-dns: Dynamic K8s workload DNS
```
**Strategy 3: Multi-Environment**
```
DNSControl: Dev and staging zones
Terraform: Production zone (with infrastructure)
```
---
## Summary
### Quick Recommendations
**Choose external-dns if:**
- Running Kubernetes
- Want automatic DNS for Services/Ingresses
- Prefer zero manual management
**Choose OctoDNS if:**
- Managing multiple DNS providers
- Prefer YAML configuration
- Need version control and preview mode
**Choose DNSControl if:**
- Comfortable with JavaScript
- Need complex DNS logic
- Want the broadest provider support among dedicated DNS-as-code tools
**Choose Terraform if:**
- Already using Terraform
- Managing infrastructure + DNS together
- Want state management
### Can't Decide?
**Start simple:**
1. Kubernetes → external-dns
2. Static DNS → OctoDNS (YAML) or DNSControl (JavaScript)
3. Infrastructure → Terraform
**Grow as needed:**
- Combine tools for different use cases
- Migrate between tools with export/import
- All support Git version control
```
### references/cloud-providers.md
```markdown
# Cloud DNS Providers - Detailed Reference
Complete comparison of major cloud DNS providers with features, pricing, and configuration examples.
## Table of Contents
1. [AWS Route53](#aws-route53)
2. [Google Cloud DNS](#google-cloud-dns)
3. [Azure DNS](#azure-dns)
4. [Cloudflare](#cloudflare)
5. [Provider Comparison Matrix](#provider-comparison-matrix)
6. [Selection Guide](#selection-guide)
---
## AWS Route53
### Overview
AWS Route53 is Amazon's highly available and scalable DNS service with advanced routing policies and tight AWS integration.
**Key Strengths:**
- Advanced routing policies (8 types)
- Health checks with automatic failover
- ALIAS records for AWS resources (free queries)
- Traffic Flow visual policy editor
- Tight integration with AWS services
**Best For:**
- AWS-heavy infrastructure
- Complex routing requirements
- Need for health checks and failover
- Organizations using AWS ecosystem
### Features
**Routing Policies:**
1. **Simple**: Standard DNS routing
2. **Weighted**: Percentage-based traffic distribution
3. **Latency-based**: Route to lowest latency endpoint
4. **Geolocation**: Route based on user location
5. **Geoproximity**: Route based on resource and user location
6. **Failover**: Active-passive failover with health checks
7. **Multivalue**: Return multiple IPs with health checks
8. **IP-based**: Route based on the client's IP address range (CIDR collections)
**Health Checks:**
- HTTP/HTTPS/TCP endpoint monitoring
- Calculated health checks (combine multiple checks)
- CloudWatch alarm integration
- String matching on response
- Latency measurements
**ALIAS Records:**
- CNAME-like functionality at zone apex
- Free queries (no charge for ALIAS to AWS resources)
- Automatic IP update when target changes
- Supported targets: ELB, CloudFront, S3, API Gateway, VPC endpoints
**DNSSEC:**
- Supported for both domain registration and hosted zones
- Key signing key (KSK) management
- Integration with domain registrars
### Pricing (2025)
**Hosted Zones:**
- $0.50/month per hosted zone for the first 25 zones
- $0.10/month for each additional hosted zone
**Queries:**
- Standard queries: $0.40 per million (first 1 billion)
- Latency-based: $0.60 per million
- Geo/Geoproximity: $0.70 per million
- ALIAS queries to AWS resources: Free
**Health Checks:**
- Basic (HTTP/HTTPS/TCP): $0.50/month per endpoint
- Calculated: $1.00/month per health check
- Optional features (HTTPS, string matching, fast interval): Additional cost
**Traffic Flow:**
- $50/month per policy record
- Includes unlimited policy configurations
### Configuration Examples
**Basic A Record:**
```hcl
resource "aws_route53_zone" "main" {
name = "example.com"
}
resource "aws_route53_record" "www" {
zone_id = aws_route53_zone.main.zone_id
name = "www.example.com"
type = "A"
ttl = 300
records = ["192.0.2.1"]
}
```
**ALIAS Record (CloudFront):**
```hcl
resource "aws_route53_record" "apex" {
zone_id = aws_route53_zone.main.zone_id
name = "example.com"
type = "A"
alias {
name = aws_cloudfront_distribution.main.domain_name
zone_id = aws_cloudfront_distribution.main.hosted_zone_id
evaluate_target_health = false
}
}
```
**Weighted Routing (Canary):**
```hcl
# 90% to stable
resource "aws_route53_record" "stable" {
zone_id = aws_route53_zone.main.zone_id
name = "api.example.com"
type = "A"
ttl = 60
set_identifier = "stable"
weighted_routing_policy {
weight = 90
}
records = ["192.0.2.1"]
}
# 10% to canary
resource "aws_route53_record" "canary" {
zone_id = aws_route53_zone.main.zone_id
name = "api.example.com"
type = "A"
ttl = 60
set_identifier = "canary"
weighted_routing_policy {
weight = 10
}
records = ["192.0.2.2"]
}
```
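Route53 gives each weighted record a share of responses equal to its weight divided by the sum of all weights in the set. So 90 and 10 produce a 90/10 split, and the weights do not need to add up to 100. If every weight is 0, traffic is split evenly. Here is a minimal sketch of that arithmetic. `weighted_shares` is a hypothetical helper, not part of any AWS tool.

```bash
# Expected share of DNS answers for each record in a weighted set:
# weight / sum of weights. Args: "set_identifier:weight" pairs.
weighted_shares() {
  printf '%s\n' "$@" | awk -F: '
    { id[NR] = $1; w[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++) {
        # Route53 splits traffic evenly when every weight is 0
        share = (total > 0) ? 100 * w[i] / total : 100 / NR
        printf "%s %.1f%%\n", id[i], share
      }
    }'
}

weighted_shares stable:90 canary:10
# stable 90.0%
# canary 10.0%
```

These shares apply to DNS answers, not to requests. Resolvers cache each answer for the 60-second TTL, so on low-traffic services a few large resolvers can skew the observed split.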
**Geolocation Routing:**
```hcl
# North America users
resource "aws_route53_record" "na" {
zone_id = aws_route53_zone.main.zone_id
name = "app.example.com"
type = "A"
ttl = 300
set_identifier = "north-america"
geolocation_routing_policy {
continent = "NA"
}
records = ["192.0.2.1"]
}
# Europe users
resource "aws_route53_record" "eu" {
zone_id = aws_route53_zone.main.zone_id
name = "app.example.com"
type = "A"
ttl = 300
set_identifier = "europe"
geolocation_routing_policy {
continent = "EU"
}
records = ["192.0.2.10"]
}
# Default fallback
resource "aws_route53_record" "default" {
zone_id = aws_route53_zone.main.zone_id
name = "app.example.com"
type = "A"
ttl = 300
set_identifier = "default"
geolocation_routing_policy {
country = "*" # Route53's default location for clients that match no other record
}
records = ["192.0.2.100"]
}
```
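Route53 answers each query with the most specific matching record. A client in Europe gets the `europe` record. Any location without its own record, or whose location cannot be determined, falls through to the default. The sketch below mimics that lookup for the three records above, keyed by continent code only. `geo_answer` is a hypothetical helper, and real Route53 also checks country and subdivision records before continents.

```bash
# Which answer the geolocation example returns for a client continent code.
# Simplified: Route53 also matches country and subdivision records and
# prefers the most specific match; this example only defines continents.
geo_answer() {
  case "$1" in
    NA) echo "192.0.2.1" ;;    # set_identifier "north-america"
    EU) echo "192.0.2.10" ;;   # set_identifier "europe"
    *)  echo "192.0.2.100" ;;  # set_identifier "default" (all other locations)
  esac
}

geo_answer EU   # 192.0.2.10
geo_answer AS   # 192.0.2.100 (Asia falls through to the default record)
```

Without the default record, Route53 returns no answer to clients outside North America and Europe. Keep a default in every geolocation set.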
**Health Check with Failover:**
```hcl
# Health check
resource "aws_route53_health_check" "primary" {
fqdn = "primary.example.com"
port = 443
type = "HTTPS"
resource_path = "/health"
failure_threshold = 3
request_interval = 30
tags = {
Name = "primary-health-check"
}
}
# Primary record
resource "aws_route53_record" "primary" {
zone_id = aws_route53_zone.main.zone_id
name = "api.example.com"
type = "A"
ttl = 60
set_identifier = "primary"
failover_routing_policy {
type = "PRIMARY"
}
health_check_id = aws_route53_health_check.primary.id
records = ["192.0.2.1"]
}
# Secondary record
resource "aws_route53_record" "secondary" {
zone_id = aws_route53_zone.main.zone_id
name = "api.example.com"
type = "A"
ttl = 60
set_identifier = "secondary"
failover_routing_policy {
type = "SECONDARY"
}
records = ["192.0.2.2"]
}
```
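With `failure_threshold = 3` and `request_interval = 30`, Route53 needs roughly three failed checks (about 90 seconds) before it marks the primary unhealthy. After that, resolvers may keep serving the cached primary answer for up to the 60-second TTL. The estimate below assumes those two delays simply add up. Health-checker fan-out and resolver behavior can shift the real figure either way. `failover_estimate` is a hypothetical helper.

```bash
# Back-of-the-envelope failover time in seconds: failed checks needed to
# mark the primary unhealthy, plus the TTL resolvers may still be caching.
# Args: failure_threshold request_interval ttl (values from the example above)
failover_estimate() {
  echo $(( $1 * $2 + $3 ))
}

failover_estimate 3 30 60   # 150 (about two and a half minutes)
```

Lowering `request_interval` to 10 (the paid fast-interval option) shortens the window: `failover_estimate 3 10 60` gives 90. Lowering the TTL also shortens it.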
### When to Choose Route53
**Choose Route53 if:**
- Already using AWS services (EC2, ELB, CloudFront, S3)
- Need advanced routing policies (geolocation, latency, weighted)
- Require health checks with automatic failover
- Want ALIAS records to AWS resources (free queries)
- Using Terraform or CloudFormation for infrastructure
- Need Traffic Flow for complex routing scenarios
**Skip Route53 if:**
- Not using AWS (no ALIAS benefit)
- Need lowest DNS costs (Cloud DNS is cheaper per zone)
- Want simplest setup (Cloudflare easier for beginners)
- Need fastest global DNS (Cloudflare typically faster)
---
## Google Cloud DNS
### Overview
Google Cloud DNS is a high-performance, scalable DNS service running on Google's infrastructure with strong DNSSEC support.
**Key Strengths:**
- Google's global anycast network
- Strong DNSSEC support with automatic key rotation
- Private DNS zones for VPC internal resolution
- Split-horizon DNS (different answers for internal/external)
- Lowest hosted zone cost among the three major clouds (AWS, GCP, Azure)
**Best For:**
- GCP-native applications
- DNSSEC requirements
- Private/internal DNS zones
- Organizations using Google Cloud
### Features
**DNS Zones:**
- Public zones (internet-accessible)
- Private zones (VPC-only, internal DNS)
- Managed zones with automatic backups
- Zone transfers (AXFR) supported
**DNSSEC:**
- One-click enablement
- Automatic key rotation
- Support for custom signing algorithms
- DS record management at registrar
**Routing Policies:**
- Weighted round robin
- Geolocation routing (GeoIP)
- Private zone resolution for VPCs
**Cloud DNS Features:**
- 100% SLA
- Fast global propagation
- Logging to Cloud Logging
- Integration with Cloud Monitoring
### Pricing (2025)
**Hosted Zones:**
- $0.20/month per managed zone for the first 25 zones (cheapest of AWS, GCP, and Azure)
**Queries:**
- $0.40 per million queries (first 1 billion/month)
- Reduced pricing for higher volumes
**Note:** Cloud DNS pricing is the lowest for hosted zones among AWS, GCP, and Azure.
### Configuration Examples
**Basic Zone and Records:**
```hcl
resource "google_dns_managed_zone" "main" {
name = "example-com"
dns_name = "example.com."
description = "Production DNS zone"
dnssec_config {
state = "on"
}
}
resource "google_dns_record_set" "a" {
name = "example.com."
managed_zone = google_dns_managed_zone.main.name
type = "A"
ttl = 300
rrdatas = ["192.0.2.1", "192.0.2.2"]
}
resource "google_dns_record_set" "www" {
name = "www.example.com."
managed_zone = google_dns_managed_zone.main.name
type = "CNAME"
ttl = 3600
rrdatas = ["example.com."]
}
```
**Private DNS Zone:**
```hcl
resource "google_dns_managed_zone" "private" {
name = "internal-example"
dns_name = "internal.example.com."
description = "Private DNS zone for VPC"
visibility = "private"
private_visibility_config {
networks {
network_url = google_compute_network.main.id
}
networks {
network_url = google_compute_network.staging.id
}
}
}
resource "google_dns_record_set" "internal_db" {
name = "db.internal.example.com."
managed_zone = google_dns_managed_zone.private.name
type = "A"
ttl = 300
rrdatas = ["10.0.1.10"]
}
```
**Geolocation Routing:**
```hcl
resource "google_dns_record_set" "geo" {
name = "app.example.com."
managed_zone = google_dns_managed_zone.main.name
type = "A"
ttl = 300
routing_policy {
geo {
location = "us-central1"
rrdatas = ["192.0.2.1"]
}
geo {
location = "europe-west1"
rrdatas = ["192.0.2.10"]
}
}
}
```
### When to Choose Cloud DNS
**Choose Cloud DNS if:**
- Using Google Cloud Platform (GKE, GCE, Cloud Run)
- Need DNSSEC with automatic management
- Require private DNS zones for VPC
- Want split-horizon DNS (different internal/external records)
- Need lowest hosted zone cost
- Using Terraform for GCP infrastructure
**Skip Cloud DNS if:**
- Not using GCP (no private zone benefit)
- Need advanced routing policies (Route53 has more)
- Want built-in DDoS protection (Cloudflare better)
- Need health checks with failover (Route53 better)
---
## Azure DNS
### Overview
Azure DNS is Microsoft's cloud DNS service with tight Azure integration and support for both public and private DNS zones.
**Key Strengths:**
- Seamless Azure integration
- Azure Private DNS zones for VNets
- Azure role-based access control (RBAC)
- Traffic Manager integration
- Anycast network (Microsoft's global infrastructure)
**Best For:**
- Azure-native applications
- Integration with Azure Traffic Manager
- Organizations using Microsoft Azure
- Hybrid cloud scenarios
### Features
**Public DNS:**
- Internet-facing DNS zones
- Standard record types
- Anycast DNS for fast resolution
- Azure RBAC for access control
**Private DNS:**
- Private DNS zones for Azure Virtual Networks
- Auto-registration of VM hostnames
- Cross-region resolution
- Hybrid cloud DNS resolution
**Integration:**
- Azure Traffic Manager (global load balancing)
- Azure Front Door (CDN + WAF)
- Azure Private Link
- Microsoft Entra ID, formerly Azure Active Directory (authentication)
**ALIAS Records:**
- Point zone apex to Azure resources
- Traffic Manager profiles
- Azure CDN endpoints
- Azure Front Door
### Pricing (2025)
**Public Hosted Zones:**
- $0.50/month per zone (first 25 zones)
- $0.10/month per zone (additional zones)
**Private DNS Zones:**
- $0.50/month per private zone
- $0.10/month per VNet link
**Queries:**
- $0.40 per million queries (first 1 billion/month)
### Configuration Examples
**Public DNS Zone:**
```hcl
resource "azurerm_dns_zone" "main" {
name = "example.com"
resource_group_name = azurerm_resource_group.main.name
}
resource "azurerm_dns_a_record" "www" {
name = "www"
zone_name = azurerm_dns_zone.main.name
resource_group_name = azurerm_resource_group.main.name
ttl = 300
records = ["192.0.2.1"]
}
resource "azurerm_dns_cname_record" "blog" {
name = "blog"
zone_name = azurerm_dns_zone.main.name
resource_group_name = azurerm_resource_group.main.name
ttl = 3600
record = "example.com"
}
```
**Private DNS Zone:**
```hcl
resource "azurerm_private_dns_zone" "internal" {
name = "internal.example.com"
resource_group_name = azurerm_resource_group.main.name
}
resource "azurerm_private_dns_zone_virtual_network_link" "main" {
name = "main-vnet-link"
resource_group_name = azurerm_resource_group.main.name
private_dns_zone_name = azurerm_private_dns_zone.internal.name
virtual_network_id = azurerm_virtual_network.main.id
registration_enabled = true # Auto-register VM hostnames
}
resource "azurerm_private_dns_a_record" "db" {
name = "db"
zone_name = azurerm_private_dns_zone.internal.name
resource_group_name = azurerm_resource_group.main.name
ttl = 300
records = ["10.0.1.10"]
}
```
**Traffic Manager Integration:**
```hcl
resource "azurerm_traffic_manager_profile" "main" {
name = "app-traffic-manager"
resource_group_name = azurerm_resource_group.main.name
traffic_routing_method = "Performance" # or "Weighted", "Priority", "Geographic"
dns_config {
relative_name = "app"
ttl = 60
}
monitor_config {
protocol = "HTTPS"
port = 443
path = "/health"
}
}
# DNS record pointing to Traffic Manager
resource "azurerm_dns_cname_record" "app" {
name = "app"
zone_name = azurerm_dns_zone.main.name
resource_group_name = azurerm_resource_group.main.name
ttl = 300
record = azurerm_traffic_manager_profile.main.fqdn
}
```
### When to Choose Azure DNS
**Choose Azure DNS if:**
- Using Microsoft Azure (App Service, VMs, AKS)
- Need Azure Private DNS for VNet resolution
- Want integration with Traffic Manager
- Using Azure Private Link
- Prefer ARM templates or Bicep for infrastructure
- Require Azure RBAC for DNS management
**Skip Azure DNS if:**
- Not using Azure (no private zone benefit)
- Need advanced routing (Route53 has more options)
- Want lowest zone cost (Cloud DNS cheaper)
- Need fastest DNS globally (Cloudflare typically faster)
---
## Cloudflare
### Overview
Cloudflare DNS is one of the world's fastest DNS services, with built-in DDoS protection and a generous free tier.
**Key Strengths:**
- Fastest DNS query times globally (consistently)
- Built-in DDoS protection (always-on)
- Free tier with unlimited queries
- Tight CDN integration
- Simplest setup and management
- CNAME flattening (works at zone apex)
**Best For:**
- Multi-cloud or cloud-agnostic infrastructure
- Performance-focused applications
- Budget-conscious organizations
- DDoS protection requirements
- Global user base
### Features
**Performance:**
- Global anycast network (200+ data centers)
- Fastest average DNS query time
- Automatic IPv6 support
- Edge-optimized resolution
**Security:**
- Built-in DDoS protection (all plans)
- DNSSEC (one-click enable)
- CAA record support
- Rate limiting and access control
**Load Balancing (Business/Enterprise):**
- Geo-steering
- Health checks (HTTP/HTTPS/TCP)
- Session affinity
- Weighted pools
- Active-active or active-passive
**CDN Integration:**
- Proxied records (orange cloud)
- Automatic SSL/TLS
- WAF (Web Application Firewall)
- Caching and optimization
**CNAME Flattening:**
- CNAME works at zone apex
- Automatically resolved to A record
- Transparent to clients
### Pricing (2025)
**Free Tier:**
- Unlimited DNS queries
- Basic DDoS protection
- DNSSEC
- SSL/TLS
- CDN caching
- IPv6 support
**Pro ($20/month per zone):**
- All free features
- Enhanced DDoS protection
- Page rules
- Image optimization
- Mobile optimization
**Business ($200/month per zone):**
- All Pro features
- Load balancing (geo-steering)
- Advanced DDoS protection
- 100% uptime SLA
- Enhanced support
**Enterprise (Custom pricing):**
- Custom SLA
- Dedicated support
- Advanced security features
- Custom SSL certificates
### Configuration Examples
**Basic Records:**
```hcl
resource "cloudflare_zone" "main" {
account_id = var.cloudflare_account_id
zone = "example.com"
}
resource "cloudflare_record" "apex" {
zone_id = cloudflare_zone.main.id
name = "example.com"
type = "A"
value = "192.0.2.1"
ttl = 1 # 1 = automatic; proxied records must use automatic TTL
proxied = true # Route through Cloudflare CDN (orange cloud)
}
resource "cloudflare_record" "www" {
zone_id = cloudflare_zone.main.id
name = "www"
type = "CNAME"
value = "example.com"
ttl = 1 # automatic TTL (required while proxied)
proxied = true
}
```
**Load Balancer with Geo-Steering:**
```hcl
# US pool
resource "cloudflare_load_balancer_pool" "us" {
account_id = var.cloudflare_account_id
name = "us-pool"
origins {
name = "us-east-1"
address = "192.0.2.1"
enabled = true
}
check_regions = ["WNAM", "ENAM"]
monitor = cloudflare_load_balancer_monitor.https.id
}
# Europe pool
resource "cloudflare_load_balancer_pool" "eu" {
account_id = var.cloudflare_account_id
name = "eu-pool"
origins {
name = "eu-west-1"
address = "192.0.2.10"
enabled = true
}
check_regions = ["WEU", "EEU"]
monitor = cloudflare_load_balancer_monitor.https.id
}
# Health check
resource "cloudflare_load_balancer_monitor" "https" {
account_id = var.cloudflare_account_id
type = "https"
port = 443
path = "/health"
interval = 60
timeout = 5
retries = 2
expected_codes = "2xx" # required for http/https monitors
}
# Load balancer with geo-steering
resource "cloudflare_load_balancer" "app" {
zone_id = cloudflare_zone.main.id
name = "app.example.com"
default_pool_ids = [cloudflare_load_balancer_pool.us.id]
fallback_pool_id = cloudflare_load_balancer_pool.us.id
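steering_policy = "geo" # route by region_pools; ttl is omitted because it cannot be set on proxied load balancers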
proxied = true
region_pools {
region = "WNAM"
pool_ids = [cloudflare_load_balancer_pool.us.id]
}
region_pools {
region = "WEU"
pool_ids = [cloudflare_load_balancer_pool.eu.id]
}
}
```
### When to Choose Cloudflare
**Choose Cloudflare if:**
- Need fastest DNS query times globally
- Want built-in DDoS protection
- Budget-conscious (free tier very generous)
- Multi-cloud or cloud-agnostic
- Need CDN + DNS combo
- Want simplest management interface
- Global user base
**Skip Cloudflare if:**
- Already heavily invested in cloud provider (AWS/GCP/Azure)
- Need advanced routing beyond geo-steering (Route53 better)
- Require integration with cloud-specific features
- Need private DNS zones (cloud providers better)
---
## Provider Comparison Matrix
### Feature Comparison
| Feature | Route53 | Cloud DNS | Azure DNS | Cloudflare |
|---------|---------|-----------|-----------|------------|
| **Routing Policies** | 8 types | 2 types | Via Traffic Mgr | Geo-steering |
| **Health Checks** | ✅ Native | ❌ | ✅ Traffic Mgr | ✅ Business+ |
| **DNSSEC** | ✅ | ✅ Auto rotation | ✅ | ✅ One-click |
| **Private Zones** | ✅ VPC | ✅ VPC | ✅ VNet | ❌ |
| **ALIAS/CNAME Apex** | ✅ ALIAS | ❌ | ✅ | ✅ Flattening |
| **DDoS Protection** | Via Shield | Via Cloud Armor | Via DDoS Prot | ✅ Built-in |
| **CDN Integration** | CloudFront | Cloud CDN | Front Door | ✅ Native |
| **Free Tier** | ❌ | ❌ | ❌ | ✅ Generous |
| **Global Anycast** | ✅ | ✅ | ✅ | ✅ 200+ DCs |
### Pricing Comparison
| Item | Route53 | Cloud DNS | Azure DNS | Cloudflare |
|------|---------|-----------|-----------|------------|
| **Hosted Zone** | $0.50/mo | $0.20/mo | $0.50/mo | Free |
| **Queries (per M)** | $0.40 | $0.40 | $0.40 | Free |
| **Health Checks** | $0.50/mo | N/A | Via TM | $200/mo (Business+) |
| **Free Tier** | ❌ | ❌ | ❌ | ✅ Unlimited queries |
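To compare these list prices for a concrete workload, the sketch below multiplies them out. It makes several assumptions: every zone is billed at the headline per-zone rate (larger zone counts are discounted), queries are standard queries under 1 billion a month, there are no health checks, and Cloudflare is on the Free plan. `dns_monthly_cost` is a hypothetical helper.

```bash
# Rough monthly cost from the list prices in the table above.
# Assumes: every zone billed at the headline per-zone rate,
# standard queries only, under 1 billion queries/month, no health checks,
# Cloudflare on the Free plan.
# Args: number_of_zones million_queries_per_month
dns_monthly_cost() {
  awk -v zones="$1" -v mq="$2" 'BEGIN {
    printf "Route53: $%.2f\n",    zones * 0.50 + mq * 0.40
    printf "Cloud DNS: $%.2f\n",  zones * 0.20 + mq * 0.40
    printf "Azure DNS: $%.2f\n",  zones * 0.50 + mq * 0.40
    printf "Cloudflare: $%.2f\n", 0
  }'
}

dns_monthly_cost 10 50   # 10 zones, 50 million queries/month
# Route53: $25.00
# Cloud DNS: $22.00
# Azure DNS: $25.00
# Cloudflare: $0.00
```

For this workload, query charges ($20) dwarf zone charges ($2 to $5). Per-zone price differences mostly matter when you host many low-traffic zones.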
### Performance Comparison (Average Query Time)
Based on independent benchmarks (2025):
1. **Cloudflare**: ~10-15ms (consistently fastest)
2. **Cloud DNS**: ~15-20ms
3. **Route53**: ~15-25ms
4. **Azure DNS**: ~20-30ms
*Note: Actual performance varies by client location and network conditions*
---
## Selection Guide
### Decision Tree
```
Primary Cloud Platform?
├─ AWS → Route53 (ALIAS, tight integration)
├─ GCP → Cloud DNS (private zones, DNSSEC)
├─ Azure → Azure DNS (Traffic Manager, VNet)
└─ Multi-cloud/None → Continue...
Need Advanced Routing?
├─ Yes → Route53 (8 routing policies)
└─ No → Continue...
Need Fastest Global DNS?
├─ Yes → Cloudflare (consistently fastest)
└─ No → Continue...
Need Private/Internal DNS?
├─ Yes → Route53, Cloud DNS, or Azure DNS (private VPC/VNet zones)
└─ No → Continue...
Budget Consideration?
├─ Lowest zone cost → Cloud DNS ($0.20/mo)
├─ Free tier → Cloudflare (unlimited)
└─ Standard → Any (similar query pricing)
Need DDoS Protection?
├─ Yes → Cloudflare (built-in all plans)
└─ No → Any provider
```
### By Use Case
**E-commerce Website (Global):**
- Primary: Cloudflare (speed + DDoS + CDN)
- Alternative: Route53 (if AWS-based)
**Enterprise SaaS (AWS-based):**
- Primary: Route53 (ALIAS, health checks, AWS integration)
- Alternative: Cloudflare + Route53 (multi-provider)
**Startup (Budget-conscious):**
- Primary: Cloudflare (free tier)
- Alternative: Cloud DNS (lowest zone cost)
**Internal Corporate DNS:**
- Primary: Cloud DNS or Azure DNS (private zones)
- Alternative: Route53 + VPN (if AWS)
**API Platform with Failover:**
- Primary: Route53 (health checks, weighted routing)
- Alternative: Cloudflare Business (load balancing)
**Multi-cloud Architecture:**
- Primary: Cloudflare (cloud-agnostic)
- Alternative: OctoDNS or DNSControl (sync multiple providers)
### Multi-Provider Strategies
**Strategy 1: Primary + Secondary DNS**
```
Primary: Cloudflare (fastest, DDoS protection)
Secondary: Route53 (AWS resources, health checks)
Use OctoDNS to sync records between providers
```
**Strategy 2: Regional Split**
```
Global: Cloudflare (CDN + DNS)
AWS: Route53 (internal AWS resources)
GCP: Cloud DNS (GCP-specific services)
Use Terraform modules to manage all providers
```
**Strategy 3: Failover Redundancy**
```
Primary: Route53 (primary authoritative)
Backup: Cloudflare (NS backup)
Sync: OctoDNS (automatic sync)
Configure both as NS records at registrar
```
---
## Summary
### Quick Recommendations
**Choose AWS Route53 for:**
- AWS-heavy infrastructure
- Advanced routing policies
- Health checks and automatic failover
- ALIAS records to AWS resources
**Choose Google Cloud DNS for:**
- GCP-native applications
- Strong DNSSEC requirements
- Private VPC DNS zones
- Lowest hosted zone cost
**Choose Azure DNS for:**
- Azure-native applications
- Traffic Manager integration
- Private VNet DNS zones
- Azure RBAC requirements
**Choose Cloudflare for:**
- Fastest global DNS performance
- Built-in DDoS protection
- Budget-conscious (free tier)
- Multi-cloud or cloud-agnostic
- CDN + DNS combo
### No Wrong Choice
All four providers offer:
- 100% uptime SLAs (on Cloudflare, Business plan and above)
- Global anycast networks
- DNSSEC support
- Low or no per-query cost ($0.40/M on the three clouds; free on Cloudflare)
- Terraform/IaC support
The "best" provider depends on your specific requirements, existing infrastructure, and priorities.
```
### references/troubleshooting.md
```markdown
# DNS Troubleshooting - Detailed Reference
Complete guide to diagnosing and resolving common DNS issues with tools, commands, and solutions.
## Table of Contents
1. [Essential DNS Tools](#essential-dns-tools)
2. [Common Problems and Solutions](#common-problems-and-solutions)
3. [Diagnostic Workflows](#diagnostic-workflows)
4. [Provider-Specific Issues](#provider-specific-issues)
5. [Propagation Checkers](#propagation-checkers)
---
## Essential DNS Tools
### dig (Domain Information Groper)
Primary DNS debugging tool on Unix/Linux/macOS.
**Basic Queries:**
```bash
# Simple query
dig example.com
# Clean output (just the answer)
dig example.com +short
# Specific record type
dig example.com A
dig example.com AAAA
dig example.com MX
dig example.com TXT
dig example.com NS
dig example.com CAA
```
**Query Specific DNS Server:**
```bash
# Google DNS
dig @8.8.8.8 example.com
# Cloudflare DNS
dig @1.1.1.1 example.com
# Quad9
dig @9.9.9.9 example.com
# OpenDNS
dig @208.67.222.222 example.com
# Authoritative nameserver
dig @ns1.example.com example.com
```
**Advanced Queries:**
```bash
# Trace DNS resolution path
dig +trace example.com
# Shows: root servers → TLD servers → authoritative servers → answer
# Check TTL
dig example.com | grep -A1 "ANSWER SECTION"
# Output: example.com. 300 IN A 192.0.2.1
# ^^^ TTL value
# DNSSEC validation
dig example.com +dnssec
# Reverse DNS lookup
dig -x 192.0.2.1
# Only show answer section
dig example.com +noall +answer
# Show query time
dig example.com +stats
```
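For scripting, the TTL is the second column of `+noall +answer` output (name, TTL, class, type, data). When you query a recursive resolver, this value is the remaining cache lifetime and it counts down. Query the authoritative server to see the configured TTL. `answer_ttls` is a hypothetical helper for this sketch.

```bash
# Print "name TTL" for each answer line of `dig +noall +answer` output.
# Columns in that output: name, TTL, class, type, data.
answer_ttls() {
  awk 'NF >= 5 && $1 !~ /^;/ { print $1, $2 }'
}

# Usage: compare a public resolver's countdown with the configured TTL
#   dig @1.1.1.1 example.com +noall +answer | answer_ttls
#   dig @ns1.example.com example.com +noall +answer | answer_ttls
```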
---
### nslookup
Cross-platform DNS query tool (Windows/Unix/Linux/macOS).
**Basic Usage:**
```bash
# Simple query
nslookup example.com
# Query specific server
nslookup example.com 8.8.8.8
# Interactive mode
nslookup
> server 8.8.8.8
> set type=MX
> example.com
> set type=TXT
> example.com
> exit
```
**Query Types:**
```bash
# MX records
nslookup -type=MX example.com
# NS records
nslookup -type=NS example.com
# TXT records
nslookup -type=TXT example.com
# Any record (many servers refuse ANY or return a minimal answer, per RFC 8482)
nslookup -type=ANY example.com
```
---
### host
Simple DNS lookup utility.
**Usage:**
```bash
# Simple lookup
host example.com
# Verbose output
host -v example.com
# Specific record type
host -t MX example.com
host -t TXT example.com
host -t NS example.com
# Query specific server
host example.com 8.8.8.8
# All records (sends an ANY query, so results may be incomplete per RFC 8482)
host -a example.com
```
---
### whois
Domain registration and nameserver information.
**Usage:**
```bash
# Domain registration info
whois example.com
# Specific section
whois example.com | grep -i "name server"
whois example.com | grep -i "registrar"
whois example.com | grep -i "expir"
```
---
### DNS Cache Flushing
**macOS:**
```bash
sudo dscacheutil -flushcache
sudo killall -HUP mDNSResponder
# Verify cache cleared
sudo dscacheutil -cachedump -entries Host
```
**Windows:**
```cmd
ipconfig /flushdns
ipconfig /displaydns # View cache
```
**Linux (systemd-resolved):**
```bash
sudo systemd-resolve --flush-caches
sudo systemd-resolve --statistics
# resolvectl (newer)
sudo resolvectl flush-caches
```
**Linux (nscd):**
```bash
sudo /etc/init.d/nscd restart
# or
sudo systemctl restart nscd
```
**Linux (dnsmasq):**
```bash
sudo systemctl restart dnsmasq
```
---
## Common Problems and Solutions
### Problem 1: DNS Propagation Delays
**Symptoms:**
- Changes not visible after DNS update
- Some locations see new records, others see old
- Client reports "not working" but you see updated records
**Diagnosis:**
```bash
# Check current TTL
dig example.com | grep -A1 "ANSWER SECTION"
# A recursive resolver shows the remaining cache time; the TTL configured on the
# authoritative server (next query) bounds how long stale answers can persist
# Check authoritative server directly
dig @$(dig example.com NS +short | head -1) example.com
# This shows what the authoritative server returns
# Check multiple public resolvers
dig @8.8.8.8 example.com +short # Google
dig @1.1.1.1 example.com +short # Cloudflare
dig @208.67.222.222 example.com +short # OpenDNS
# Check from client's resolver
dig example.com
# Shows what the client's resolver is caching
```
**Solutions:**
**Solution 1: Wait for TTL to Expire**
```
Max propagation time = TTL of the old record (resolvers may serve their cached copy until it expires)
Example: Old TTL 3600s means up to 1 hour wait
```
**Solution 2: Pre-Lower TTL (Future Prevention)**
```bash
# 48 hours before change
# Lower TTL to 300s
# Make change
# Propagation now takes at most ~5 minutes (the 300s TTL)
# 24 hours after change
# Raise TTL back to 3600s
```
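The lead time works because a cache that fetched the record just before you lowered the TTL can keep it for up to the old TTL. Once that much time has passed, every cache that honors TTLs holds the shorter value. The sketch below checks that rule and assumes you recorded the Unix time when you lowered the TTL. `safe_to_change` is a hypothetical helper.

```bash
# Decide whether the record change can go ahead after a TTL reduction.
# Args: Unix time when the TTL was lowered, the old TTL in seconds,
# and optionally the current Unix time (defaults to now).
safe_to_change() {
  lowered_at=$1 old_ttl=$2 now=${3:-$(date +%s)}
  ready_at=$((lowered_at + old_ttl))
  if [ "$now" -ge "$ready_at" ]; then
    echo "safe: cached copies with the old TTL have expired"
  else
    echo "wait $((ready_at - now))s more"
    return 1
  fi
}

# TTL lowered at 1700000000 with an old TTL of 3600s, checked an hour later
safe_to_change 1700000000 3600 1700003600   # safe: ...
# In a change script: safe_to_change "$lowered_at" 3600 || exit 1
```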
**Solution 3: Flush Local Cache**
```bash
# Client-side
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Windows
ipconfig /flushdns
# Linux
sudo systemd-resolve --flush-caches
```
**Solution 4: Verify with Propagation Checkers**
- https://www.whatsmydns.net/
- https://dnschecker.org/
- https://dnspropagation.net/
---
### Problem 2: CNAME at Zone Apex Error
**Symptoms:**
- Error: "CNAME conflicts with other records at apex"
- Cannot point example.com (not www) to CDN
- DNS validation fails for CNAME at @
**Diagnosis:**
```bash
# Check for CNAME at zone apex
dig example.com CNAME
# Check for SOA record (conflicts with CNAME)
dig example.com SOA
# Verify nameservers
dig example.com NS
```
**Explanation:**
CNAME records cannot exist at the zone apex because:
1. SOA and NS records must exist at zone apex
2. CNAME cannot coexist with other records
3. Both rules come from RFC 1034 (section 3.6.2) and are restated in RFC 1912 (section 2.4)
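You can check the coexistence rule mechanically before you push a zone. This sketch assumes the records are available as `name TTL class type data` lines, the layout of `dig +noall +answer` output. It skips RRSIG and NSEC, because DNSSEC allows those next to a CNAME. `cname_conflicts` is a hypothetical helper.

```bash
# List owner names that have a CNAME alongside any other record type.
# Input: `dig +noall +answer`-style lines from files or stdin.
# RRSIG and NSEC are skipped because DNSSEC allows them next to a CNAME.
cname_conflicts() {
  awk '
    NF < 5 || $1 ~ /^;/ { next }
    $4 == "CNAME" { cname[$1] = 1; next }
    $4 != "RRSIG" && $4 != "NSEC" { other[$1] = 1 }
    END { for (name in cname) if (name in other) print name }
  ' "$@"
}

# Usage: cname_conflicts zone-dump.txt
```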
**Solutions:**
**Solution 1: Use ALIAS Record (Provider-Specific)**
```hcl
# Route53 ALIAS
resource "aws_route53_record" "apex" {
zone_id = aws_route53_zone.main.zone_id
name = "example.com"
type = "A"
alias {
name = "d111111abcdef8.cloudfront.net"
zone_id = "Z2FDTNDATAQYW2"
evaluate_target_health = false
}
}
# Cloudflare (automatic CNAME flattening)
resource "cloudflare_record" "apex" {
zone_id = cloudflare_zone.main.id
name = "example.com"
type = "CNAME"
value = "d111111abcdef8.cloudfront.net"
proxied = true
}
```
**Solution 2: Use A/AAAA Record**
```bash
# If CDN provides static IP
example.com. 300 IN A 192.0.2.1
```
**Solution 3: Use Subdomain**
```bash
# Point www to CDN
www.example.com. 3600 IN CNAME cdn.provider.com.
# Redirect apex to www at application level
example.com. 300 IN A 192.0.2.1 # Your redirect server
```
---
### Problem 3: Missing DNS Records After Migration
**Symptoms:**
- Some services stop working after DNS provider migration
- Email or subdomains not resolving
- Incomplete zone transfer
**Diagnosis:**
```bash
# Export all record types from old provider
for type in A AAAA CNAME MX TXT SRV CAA NS SOA; do
echo "=== $type records ==="
dig @old-ns1.provider.com example.com $type +noall +answer
done
# Compare with new provider
for type in A AAAA CNAME MX TXT SRV CAA NS SOA; do
echo "=== $type records ==="
dig @new-ns1.provider.com example.com $type +noall +answer
done
# Check for zone transfer support (usually disabled)
dig @old-ns1.provider.com example.com AXFR
```
**Solutions:**
**Solution 1: Audit Before Migration**
```bash
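# Create a complete inventory, one query per record type
# (ANY is often refused or answered minimally, see RFC 8482)
for type in A AAAA CNAME MX TXT SRV CAA NS SOA; do
dig @old-ns1.provider.com example.com "$type" +noall +answer
done | sort > old-dns.txt
# After migration, repeat against the new provider and compare
# (NS and SOA are expected to differ; TTLs may differ too)
for type in A AAAA CNAME MX TXT SRV CAA NS SOA; do
dig @new-ns1.provider.com example.com "$type" +noall +answer
done | sort > new-dns.txt
diff old-dns.txt new-dns.txt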
```
**Solution 2: Use DNS Migration Tools**
```bash
# OctoDNS to export existing config
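octodns-dump \
--config-file=octodns-config.yaml \
--output-dir=./backup \
example.com. old-provider # zone (trailing dot) and source name; "old-provider" is a placeholder for the source defined in your config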
# Import to new provider
octodns-sync \
--config-file=octodns-config.yaml \
--doit
```
**Solution 3: Parallel Operation**
```bash
# Keep old provider active for 48 hours
# Add both old and new NS records temporarily
# Verify all records migrated before full cutover
```
**Solution 4: Common Missed Records**
```bash
# Often forgotten:
# - TXT records (SPF, DKIM, DMARC, verification)
# - SRV records (VoIP, XMPP)
# - CAA records
# - Wildcard records (*.example.com)
# - Subdomain delegations (NS records)
# Verify each:
dig example.com TXT +short
dig example.com CAA +short
dig _dmarc.example.com TXT +short
dig '*.example.com' A +short # quoted so the shell does not expand *
```
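A plain `diff` also reports records whose only change is the TTL, which buries the records that are actually missing. This sketch compares two `dig +noall +answer` dumps, for example the old/new inventories from Solution 1, and ignores the TTL column. `missing_records` is a hypothetical helper. Expect NS (and SOA, if included) to show up, because those differ between providers by design.

```bash
# Print records that exist in the old provider's dump but not in the new one.
# Both files hold `dig +noall +answer` lines; the TTL column is ignored so a
# changed TTL alone is not reported as missing.
missing_records() {
  old_file=$1 new_file=$2
  awk -v newfile="$new_file" '
    # skip blank lines and dig comment lines
    NF < 5 || $1 ~ /^;/ { next }
    # key = name, class, type and record data (the TTL in $2 is dropped)
    { key = $1 " " $3 " " $4; for (i = 5; i <= NF; i++) key = key " " $i }
    FILENAME == newfile { present[key] = 1; next }
    !(key in present) { print key }
  ' "$new_file" "$old_file"
}

# Usage: missing_records old-dns.txt new-dns.txt
```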
---
### Problem 4: DNS Loops or CNAME Chains
**Symptoms:**
- DNS resolution fails
- Error: "CNAME loop detected"
- Timeout or SERVFAIL response
**Diagnosis:**
```bash
# Trace CNAME chain
dig +trace www.example.com
# Check for circular references
dig www.example.com CNAME +short
# If returns www.example.com → loop!
# Visualize chain
dig www.example.com +noall +answer
```
**Examples of Loops:**
```bash
# ❌ Direct loop
www.example.com. IN CNAME www.example.com.
# ❌ Indirect loop
www.example.com. IN CNAME web.example.com.
web.example.com. IN CNAME www.example.com.
# ❌ Excessive chain (5+ hops)
www → cdn → lb → app → backend → server → IP
```
**Solutions:**
**Solution 1: Break the Loop**
```bash
# Identify loop point
dig +trace www.example.com
# Fix: Point to A record instead
www.example.com. IN A 192.0.2.1
```
**Solution 2: Limit CNAME Depth**
```bash
# ✅ Good: 1-2 hops
www.example.com. IN CNAME cdn.example.com.
cdn.example.com. IN A 192.0.2.1
# ❌ Bad: 5+ hops
# Causes slow resolution, potential timeouts
```
**Solution 3: Use A Records for Final Target**
```bash
# Always terminate CNAME chain with A/AAAA
app.example.com. IN CNAME lb.example.com.
lb.example.com. IN A 192.0.2.1 # Terminal record
```
---
### Problem 5: external-dns Not Creating Records
**Symptoms:**
- Kubernetes Service/Ingress has annotation
- No DNS record created
- external-dns logs show errors
**Diagnosis:**
```bash
# Check external-dns logs
kubectl logs -n external-dns deployment/external-dns -f
# Check Service annotations
kubectl get service nginx -o yaml | grep -A5 annotations
# Verify domain filter (external-dns logs its configuration,
# including DomainFilter, at startup)
kubectl logs -n external-dns deployment/external-dns | grep -i "domainfilter"
# Check provider credentials
kubectl get secret -n external-dns
# Look for provider authentication or API errors
kubectl logs -n external-dns deployment/external-dns | grep -i "error\|fail"
```
**Common Issues and Solutions:**
**Issue 1: Domain Not in Filter**
```bash
# Check current filter
kubectl describe deployment -n external-dns external-dns | grep domain-filter
# Fix: Add domain to filter
helm upgrade external-dns external-dns/external-dns \
--namespace external-dns \
--set 'domainFilters[0]=example.com' \
--reuse-values
```
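external-dns only manages hostnames that fall under one of its domain filters. The sketch below is a simplified model of that suffix match (exact domain, or a subdomain on a label boundary); the real implementation also supports exclusion and regex filters, so treat `matches_domain_filter` as a hypothetical pre-check rather than external-dns's own code.

```python
def matches_domain_filter(hostname: str, filters: list[str]) -> bool:
    """Return True if hostname equals a filter domain or is a subdomain of it.

    Simplified model: case-insensitive, trailing dots ignored, matches
    only on label boundaries. An empty filter list matches everything.
    """
    if not filters:
        return True
    host = hostname.lower().rstrip(".")
    for entry in filters:
        domain = entry.lower().rstrip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

The label-boundary check is why `badexample.com` does not match a filter of `example.com`.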
**Issue 2: Missing Provider Credentials**
```bash
# Check secret exists
kubectl get secret -n external-dns external-dns
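# Create secret (AWS example). The secret name and key names must match
# what your Helm chart or Deployment mounts; check its values and docs.
# On EKS, IAM Roles for Service Accounts (IRSA) avoids static keys.
kubectl create secret generic external-dns \
  --namespace external-dns \
  --from-literal=aws_access_key_id="$AWS_ACCESS_KEY_ID" \
  --from-literal=aws_secret_access_key="$AWS_SECRET_ACCESS_KEY"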
```
**Issue 3: Wrong Policy**
```bash
# Check current policy
kubectl describe deployment -n external-dns external-dns | grep policy
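# upsert-only: creates and updates records, never deletes them
# sync: creates, updates, and deletes records external-dns owns
#       (ownership is tracked with TXT registry records)
# Policy doesn't block creation; if records are missing, check the
# other issues first. Switch to sync if stale records linger.
# Fix: Change to sync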
helm upgrade external-dns external-dns/external-dns \
--namespace external-dns \
--set policy=sync \
--reuse-values
```
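The difference between the two policies can be modeled as a reconcile plan: compute creates, updates and deletes, then drop the deletes under `upsert-only`. This is a deliberately simplified sketch (one target per name, ownership assumed already resolved); `plan_changes` is a hypothetical name, and external-dns's real planner also consults its TXT ownership registry.

```python
def plan_changes(desired: dict[str, str], current: dict[str, str],
                 policy: str = "sync") -> dict[str, list[str]]:
    """Simplified model of an external-dns-style reconcile plan.

    desired: records derived from Services/Ingresses (name -> target)
    current: records this controller already owns in the zone
    """
    if policy not in ("sync", "upsert-only"):
        raise ValueError(f"unknown policy: {policy}")
    create = sorted(n for n in desired if n not in current)
    update = sorted(n for n in desired if n in current and current[n] != desired[n])
    delete = sorted(n for n in current if n not in desired)
    if policy == "upsert-only":
        delete = []  # upsert-only never removes records
    return {"create": create, "update": update, "delete": delete}
```

Note that `create` is identical under both policies, which is why a wrong policy shows up as stale records rather than missing ones.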
**Issue 4: Annotation Typo**
```yaml
# ❌ WRONG - typo in annotation name
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostnam: example.com  # Missing 'e'
# ✅ CORRECT
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
```
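Typos like the one above are silently ignored, so a quick fuzzy check can help. A sketch using the standard library's `difflib.get_close_matches`; the `KNOWN` list holds a few real external-dns annotation keys but is not exhaustive, and `suspicious_annotations` is a hypothetical helper.

```python
import difflib

# A few annotation keys external-dns reads; extend for your setup.
KNOWN = [
    "external-dns.alpha.kubernetes.io/hostname",
    "external-dns.alpha.kubernetes.io/ttl",
    "external-dns.alpha.kubernetes.io/target",
    "external-dns.alpha.kubernetes.io/internal-hostname",
]


def suspicious_annotations(annotations: dict[str, str]) -> dict[str, str]:
    """Map each unrecognized key that closely resembles a known key to that key."""
    out = {}
    for key in annotations:
        if key in KNOWN:
            continue  # spelled correctly
        match = difflib.get_close_matches(key, KNOWN, n=1, cutoff=0.8)
        if match:
            out[key] = match[0]
    return out
```

Feed it the `metadata.annotations` map from `kubectl get service nginx -o json`.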
**Issue 5: LoadBalancer Not Ready**
```bash
# Check Service status
kubectl get service nginx
# external-dns waits for LoadBalancer IP/hostname
# If EXTERNAL-IP is <pending>, external-dns cannot create record
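# Solution: Wait for the cloud provider to provision the LoadBalancer
# Alternatives: set spec.externalIPs on the Service, or use a NodePort
# Service (external-dns then publishes node IPs; check your version's docs)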
```
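To script the `<pending>` check, read the Service's `status.loadBalancer.ingress` list, whose entries carry an `ip` or a `hostname` (AWS load balancers typically report a hostname) once provisioning finishes. A minimal sketch, assuming the Service JSON from `kubectl get service nginx -o json`; `load_balancer_targets` is a hypothetical helper.

```python
def load_balancer_targets(service: dict) -> list[str]:
    """Return the IPs/hostnames in status.loadBalancer.ingress (empty while pending)."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    targets = []
    for entry in ingress:
        value = entry.get("ip") or entry.get("hostname")
        if value:
            targets.append(value)
    return targets
```

An empty result corresponds to `EXTERNAL-IP <pending>`, the state in which external-dns has nothing to publish.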
---
### Problem 6: DNSSEC Validation Failures
**Symptoms:**
- SERVFAIL responses
- Works with some resolvers, not others
- Validating resolvers return SERVFAIL, but the same query with `+cd` (checking disabled) returns an answer
**Diagnosis:**
```bash
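# Check DNSSEC records and the "ad" (authenticated data) flag
dig example.com +dnssec
# Validate the chain locally (delv ships with the BIND tools)
delv example.com
# Compare validating resolvers with and without validation
dig @1.1.1.1 example.com +dnssec
dig @8.8.8.8 example.com +dnssec
dig @8.8.8.8 example.com +cd
# SERVFAIL normally but an answer with +cd => DNSSEC failure
# Verify DS record at parent
dig example.com DS +trace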
```
**Common DNSSEC Issues:**
**Issue 1: Missing DS Record at Registrar**
```bash
# Check parent zone for DS record
dig example.com DS
# If missing, DNSSEC chain is broken
# Solution: Add DS record at domain registrar
```
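To confirm that the DS record at the registrar matches the key actually published, you can recompute it. Per RFC 4034, the DS digest is a hash of the DNSKEY owner name in canonical wire format followed by the DNSKEY RDATA (flags, protocol, algorithm, public key), and the key tag is a checksum over that RDATA (Appendix B). The sketch below uses SHA-256 (digest type 2); BIND's `dnssec-dsfromkey` does the same job, and the helper names here are hypothetical.

```python
import base64
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


def key_tag(rdata: bytes) -> int:
    """RFC 4034 Appendix B key tag (valid for every algorithm except 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_from_dnskey(owner: str, flags: int, protocol: int, algorithm: int, key_b64: str) -> str:
    """Build a SHA-256 (digest type 2) DS record from DNSKEY fields."""
    rdata = struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(key_b64)
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
    return f"{owner} IN DS {key_tag(rdata)} {algorithm} 2 {digest}"
```

Feed it the fields of the KSK (flags 257) from `dig example.com DNSKEY +multiline` and compare the output with `dig example.com DS +short`, ignoring spaces inside the hex digest.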
**Issue 2: Key Rotation Problems**
```bash
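# Keys themselves don't expire, but their RRSIG signatures do
# Provider-managed signing should re-sign automatically
# Check signature inception/expiration (shown in the RRSIG record)
dig example.com DNSKEY +dnssec +multiline
# Solution: Use provider-managed DNSSEC (AWS Route53, Google Cloud DNS,
# Cloudflare). Check each provider's docs for KSK rotation: on Route53
# the KSK is backed by a customer-managed KMS key and rotated manually,
# and any KSK change requires updating the DS record at the registrar.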
```
**Issue 3: Clock Skew**
```bash
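# DNSSEC signatures are valid only between their inception and
# expiration times, so a validating resolver with a wrong clock
# rejects them. Check the time on the resolver host: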
date
# Ensure NTP is running
timedatectl status
```
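Both the rotation and clock-skew checks come down to the inception and expiration timestamps in each RRSIG record, which `dig` prints as `YYYYMMDDHHmmSS` in UTC. A sketch that classifies a signature against a given clock (pass the resolver host's time to test for skew); the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(value: str) -> datetime:
    """Parse an RRSIG inception/expiration field (YYYYMMDDHHmmSS, UTC)."""
    return datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def signature_status(inception: str, expiration: str, now: datetime) -> tuple[str, timedelta]:
    """Classify a signature against a clock.

    Returns ("valid", time left), ("expired", time since expiry), or
    ("not-yet-valid", time until inception). "not-yet-valid" on a
    resolver usually means its clock is behind.
    """
    start = parse_rrsig_time(inception)
    end = parse_rrsig_time(expiration)
    if now < start:
        return "not-yet-valid", start - now
    if now > end:
        return "expired", now - end
    return "valid", end - now
```

For a live check, pass `datetime.now(timezone.utc)` as `now`.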
---
## Diagnostic Workflows
### Workflow 1: Website Not Resolving
```bash
# Step 1: Verify DNS record exists
dig example.com +short
# Expected: IP address
# If empty: no A record at this name (drop +short to see the status code)
# Step 2: Check authoritative nameservers
dig example.com NS +short
# Expected: List of nameservers
# Step 3: Query authoritative directly
dig @$(dig example.com NS +short | head -1) example.com
# Compare with Step 1
# Step 4: Check propagation
dig @8.8.8.8 example.com +short
dig @1.1.1.1 example.com +short
# Should match authoritative answer
# Step 5: Trace full resolution path
dig +trace example.com
# Shows root → TLD → authoritative → answer
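# Step 6: Check what applications see (dig bypasses the OS resolver cache)
getent hosts example.com   # Linux; macOS: dscacheutil -q host -a name example.com
# If different from Step 4, flush the OS and browser caches
# Step 7: Verify TTL (second column = seconds left in the resolver's cache)
dig example.com +noall +answer
# If high TTL, wait for expiration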
```
---
### Workflow 2: Email Delivery Issues
```bash
# Step 1: Check MX records
dig example.com MX +short
# Expected: Priority and mail server
# Step 2: Verify mail server A records
dig mail.example.com A +short
# Mail server must resolve
# Step 3: Check SPF record
dig example.com TXT +short | grep "v=spf1"
# Should authorize every server that sends mail for the domain
# Step 4: Check DMARC
dig _dmarc.example.com TXT +short
# Should have DMARC policy
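# Step 5: Check DKIM ("default" is only an example selector; use the
# s= value from your provider or from a message's DKIM-Signature header)
dig default._domainkey.example.com TXT +short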
# Should have public key
# Step 6: Test from multiple resolvers
dig @8.8.8.8 example.com MX +short
dig @1.1.1.1 example.com MX +short
# Step 7: Verify reverse DNS (PTR)
dig -x <mail-server-ip>
# Should match forward lookup
```
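Steps 3 to 5 can be checked mechanically once the TXT values are fetched. The sketch below, with hypothetical helper names, joins the quoted chunks that `dig +short` prints for long TXT records, flags a missing or duplicated SPF record (RFC 7208 treats multiple `v=spf1` records as a permanent error), and checks that exactly one DMARC record exists with the required `p=` tag. It is not a full SPF or DMARC parser.

```python
import re


def txt_value(raw: str) -> str:
    """Join the quoted chunks of one `dig +short` TXT line into one string."""
    chunks = re.findall(r'"((?:[^"\\]|\\.)*)"', raw)
    return "".join(chunks) if chunks else raw.strip()


def check_email_txt(apex_txt: list[str], dmarc_txt: list[str]) -> list[str]:
    """Return human-readable problems found in SPF and DMARC TXT records."""
    problems = []
    apex = [txt_value(t) for t in apex_txt]
    spf = [t for t in apex if t.lower() == "v=spf1" or t.lower().startswith("v=spf1 ")]
    if not spf:
        problems.append("no SPF record (v=spf1) at the apex")
    elif len(spf) > 1:
        problems.append("multiple SPF records (RFC 7208 permanent error)")
    dmarc = [t for t in map(txt_value, dmarc_txt) if t.startswith("v=DMARC1")]
    if len(dmarc) != 1:
        problems.append("expected exactly one v=DMARC1 record at _dmarc")
    else:
        tags = {}
        for part in dmarc[0].split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                tags[key.strip()] = value.strip()
        if "p" not in tags:
            problems.append("DMARC record has no p= policy tag")
    return problems
```

Pass the lines from `dig example.com TXT +short` and `dig _dmarc.example.com TXT +short`; an empty list means these basic checks passed.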
---
### Workflow 3: SSL Certificate Not Working
```bash
# Step 1: Check A record
dig example.com +short
# Must resolve to server IP
# Step 2: Check CAA records
dig example.com CAA +short
# If present here or on a parent domain, it must list your CA
# Step 3: Check DNS propagation
# Some CAs (e.g. Let's Encrypt) validate from multiple network locations
dig @8.8.8.8 example.com +short
dig @1.1.1.1 example.com +short
# Must be consistent
# Step 4: Verify TXT record (if using DNS challenge)
dig _acme-challenge.example.com TXT +short
# Must match CA's challenge value
# Step 5: Check TTL (second column of the answer)
dig _acme-challenge.example.com TXT +noall +answer
# If high, resolvers may keep serving an old challenge value
# Step 6: Test CAA compatibility
dig example.com CAA +short
# Example: 0 issue "letsencrypt.org"
```
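Whether a CAA record set permits a given CA can be sketched as follows, assuming `dig CAA +short` lines such as `0 issue "letsencrypt.org"`. This simplified reading of RFC 8659 only looks at `issue` properties for non-wildcard names and ignores `issuewild`, the critical flag, and the climb to parent domains that applies when a name has no CAA records of its own; `ca_may_issue` is a hypothetical helper.

```python
def ca_may_issue(caa_records: list[str], ca_domain: str) -> bool:
    """Simplified RFC 8659 check for a non-wildcard name.

    An empty list (or one with no "issue" properties) places no
    restriction on issuance.
    """
    issuers = []
    for record in caa_records:
        parts = record.split(None, 2)  # flags, tag, value
        if len(parts) != 3 or parts[1].lower() != "issue":
            continue
        value = parts[2].strip().strip('"')
        # The issuer domain precedes an optional ";"-separated parameter list
        issuers.append(value.split(";", 1)[0].strip().lower())
    if not issuers:
        return True
    return ca_domain.lower() in issuers
```

A value of `";"` yields an empty issuer, which forbids issuance by every CA.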
---
## Provider-Specific Issues
### AWS Route53 Issues
**Issue: ALIAS Record Not Resolving**
```bash
# Check if target resource exists
dig d111111abcdef8.cloudfront.net +short
# Verify ALIAS configuration
aws route53 list-resource-record-sets \
--hosted-zone-id Z1234567890ABC \
--query "ResourceRecordSets[?Name=='example.com.']"
# Common mistake: wrong HostedZoneId in the AliasTarget
# Each AWS service (and often each region) has its own alias zone ID;
# CloudFront always uses Z2FDTNDATAQYW2
```
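When creating the ALIAS yourself, the target's zone ID goes inside `AliasTarget`; it is not the ID of the zone you are editing. A sketch that builds the ChangeBatch structure the Route53 `ChangeResourceRecordSets` API expects; the function name is hypothetical, and the default zone ID is the fixed one CloudFront uses for alias targets.

```python
CLOUDFRONT_HOSTED_ZONE_ID = "Z2FDTNDATAQYW2"  # fixed alias zone ID for CloudFront


def alias_change_batch(record_name: str, target_dns: str,
                       target_zone_id: str = CLOUDFRONT_HOSTED_ZONE_ID) -> dict:
    """Build a Route53 ChangeBatch that UPSERTs an A-type ALIAS record."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": target_zone_id,  # the target's zone, not yours
                    "DNSName": target_dns,
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    }
```

Write `json.dumps(alias_change_batch("example.com.", "d111111abcdef8.cloudfront.net."))` to `alias.json`, then apply it with `aws route53 change-resource-record-sets --hosted-zone-id Z1234567890ABC --change-batch file://alias.json`.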
**Issue: Health Check Failing**
```bash
# View health check status
aws route53 get-health-check-status \
--health-check-id abc123
# Test endpoint manually
curl -I https://example.com/health
# Check health check configuration
aws route53 get-health-check \
--health-check-id abc123
```
---
### Google Cloud DNS Issues
**Issue: Private Zone Not Resolving**
```bash
# Verify VPC link
gcloud dns managed-zones describe internal-zone
# Check which VPC network the VM is attached to
gcloud compute instances list --filter="name=my-vm" \
  --format="table(name,networkInterfaces[].network)"
# Test from VM in VPC
gcloud compute ssh my-vm --command "dig db.internal.example.com"
# If fails, check VPC link
gcloud dns managed-zones describe internal-zone \
--format="json" | jq '.privateVisibilityConfig'
```
---
### Cloudflare Issues
**Issue: Orange Cloud (Proxied) Showing Cloudflare IP**
```bash
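# Check if record is proxied
dig example.com +short
# A Cloudflare edge IP (e.g. 104.16.x.x or 172.67.x.x) means it is proxied
# Every resolver, including Cloudflare's own authoritative nameservers,
# returns edge IPs for proxied records, so dig cannot reveal the origin.
# To see the origin IP:
# Option 1: Cloudflare dashboard (DNS settings for the zone)
# Option 2: Cloudflare API (returns "content" and "proxied" per record)
curl -s -H "Authorization: Bearer $CF_API_TOKEN" \
  "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records?name=example.com"
# Option 3: Temporarily disable the proxy (grey cloud), then query again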
```
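To tell quickly whether an answer is a Cloudflare edge address, test it against Cloudflare's published ranges. The list in this sketch is a partial snapshot and may be out of date; fetch the current one from https://www.cloudflare.com/ips-v4 before relying on it. `is_cloudflare_edge` is a hypothetical helper.

```python
import ipaddress

# Partial snapshot of Cloudflare IPv4 ranges; refresh before relying on it.
CLOUDFLARE_V4 = [ipaddress.ip_network(n) for n in (
    "104.16.0.0/13", "104.24.0.0/14", "172.64.0.0/13", "162.158.0.0/15",
    "108.162.192.0/18", "141.101.64.0/18", "188.114.96.0/20", "198.41.128.0/17",
)]


def is_cloudflare_edge(ip: str) -> bool:
    """True if `ip` falls inside one of the listed Cloudflare ranges."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network, so this simply returns False
    return any(addr in net for net in CLOUDFLARE_V4)
```

Run it over each line of `dig example.com +short`: `True` means you are looking at the proxy, not the origin.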
---
## Propagation Checkers
### Online Tools
**Recommended:**
- **https://www.whatsmydns.net/** - Best visual interface
- **https://dnschecker.org/** - Comprehensive global check
- **https://dnspropagation.net/** - Clean interface
- **https://www.digwebinterface.com/** - Advanced options
**How to Use:**
1. Enter domain name
2. Select record type (A, AAAA, MX, etc.)
3. View results from 20+ global locations
4. Green checkmarks = propagated
5. Wait and recheck if not propagated
### Command-Line Propagation Check
**Check Multiple Resolvers:**
```bash
#!/bin/bash
# check-propagation.sh
DOMAIN=${1:?Usage: $0 <domain> [record-type]}
RECORD_TYPE=${2:-A}
resolvers=(
  "8.8.8.8 (Google)"
  "8.8.4.4 (Google Secondary)"
  "1.1.1.1 (Cloudflare)"
  "1.0.0.1 (Cloudflare Secondary)"
  "208.67.222.222 (OpenDNS)"
  "208.67.220.220 (OpenDNS Secondary)"
  "9.9.9.9 (Quad9)"
  "64.6.64.6 (Verisign)"
)
echo "Checking DNS propagation for $DOMAIN ($RECORD_TYPE)"
echo "================================================"
for resolver in "${resolvers[@]}"; do
  ip=${resolver%% *}                      # text before the first space
  name=${resolver#*\(}; name=${name%\)}   # text inside the parentheses
  # Short timeout so one unreachable resolver doesn't stall the loop
  result=$(dig @"$ip" "$DOMAIN" "$RECORD_TYPE" +short +time=2 +tries=1 | head -1)
  echo "$name: ${result:-(no record)}"
done
```
**Usage:**
```bash
chmod +x check-propagation.sh
./check-propagation.sh example.com A
./check-propagation.sh example.com MX
```
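The same check can be written in Python when you want to reuse it in other tooling. A sketch assuming `dig` is on the PATH: it compares the full set of answers from each resolver (rather than only the first line, which can vary in order for multi-record answers) against the values you expect, and takes the query function as a parameter so it can be tested without network access. The function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable

QueryFn = Callable[[str, str, str], frozenset]


def dig_answers(resolver_ip: str, domain: str, rtype: str) -> frozenset:
    """All `dig +short` answer lines from one resolver; empty set on failure."""
    try:
        proc = subprocess.run(
            ["dig", f"@{resolver_ip}", domain, rtype, "+short", "+time=2", "+tries=1"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return frozenset()  # dig missing or hung
    if proc.returncode != 0:
        return frozenset()
    return frozenset(line.strip() for line in proc.stdout.splitlines() if line.strip())


def lagging_resolvers(resolvers: dict[str, str], domain: str, rtype: str,
                      expected: Iterable[str], query: QueryFn = dig_answers) -> list[str]:
    """Names of resolvers (ip -> name) whose answer set differs from `expected`."""
    want = frozenset(expected)
    return sorted(name for ip, name in resolvers.items()
                  if query(ip, domain, rtype) != want)
```

For example, `lagging_resolvers({"8.8.8.8": "Google", "1.1.1.1": "Cloudflare"}, "example.com", "A", ["192.0.2.1"])` returns an empty list once both resolvers serve the new address.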
---
## Quick Reference
### Common Error Messages
| Error | Meaning | Solution |
|-------|---------|----------|
| NXDOMAIN | Domain doesn't exist | Check spelling, verify NS records |
| SERVFAIL | Server failure | Check authoritative NS, DNSSEC issues |
| REFUSED | Query refused | Nameserver doesn't serve this zone |
| TIMEOUT | No response | Network issue, firewall, wrong NS |
| NOERROR, empty answer (NODATA) | Name exists but has no record of this type | Add missing record |
### Quick Diagnostic Commands
```bash
# Essential checks
dig example.com +short # Basic resolution
dig example.com NS +short # Nameservers
dig example.com +trace # Full resolution path
dig example.com +noall +answer # Current TTL (second column)
# Propagation check
dig @8.8.8.8 example.com +short # Google
dig @1.1.1.1 example.com +short # Cloudflare
# Cache flush
# macOS: sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Windows: ipconfig /flushdns
# Linux (systemd-resolved): sudo resolvectl flush-caches (older: sudo systemd-resolve --flush-caches)
```
---
## When to Contact Support
Contact DNS provider support when:
- Authoritative nameservers not responding (TIMEOUT)
- DNSSEC validation consistently fails
- Zone transfers not working
- Provider dashboard shows errors
- Records not updating after 24+ hours
- Health checks failing for working endpoints
- API/automation not working with valid credentials
Before contacting support, gather:
- Domain name
- Record type and value
- Screenshots of configuration
- Output of `dig +trace example.com`
- External propagation checker results
- Timeline of changes made
```
### scripts/check-dns-propagation.sh
```bash
#!/bin/bash
# DNS Propagation Checker
# Checks DNS resolution across multiple public resolvers
# Usage: ./check-dns-propagation.sh example.com [A|AAAA|MX|TXT|NS]
set -euo pipefail
DOMAIN="${1:-}"
RECORD_TYPE="${2:-A}"
if [ -z "$DOMAIN" ]; then
  echo "Usage: $0 <domain> [record-type]"
  echo "Example: $0 example.com A"
  exit 1
fi
# List of public DNS resolvers
declare -a RESOLVERS=(
  "8.8.8.8:Google DNS Primary"
  "8.8.4.4:Google DNS Secondary"
  "1.1.1.1:Cloudflare DNS Primary"
  "1.0.0.1:Cloudflare DNS Secondary"
  "208.67.222.222:OpenDNS Primary"
  "208.67.220.220:OpenDNS Secondary"
  "9.9.9.9:Quad9 Primary"
  "149.112.112.112:Quad9 Secondary"
  "64.6.64.6:Verisign Primary"
  "64.6.65.6:Verisign Secondary"
)
echo "================================================================"
echo "DNS Propagation Check: $DOMAIN ($RECORD_TYPE)"
echo "================================================================"
echo ""
for resolver_info in "${RESOLVERS[@]}"; do
  IFS=':' read -r ip name <<< "$resolver_info"
  # Query DNS with a short timeout. Under set -e/pipefail a failing dig
  # (e.g. a timeout) would abort the script, so fall back to a marker.
  result=$(dig @"$ip" "$DOMAIN" "$RECORD_TYPE" +short +time=2 +tries=1 2>/dev/null | head -1) \
    || result="(query failed)"
  if [ -z "$result" ]; then
    result="(no record)"
  fi
  printf "%-30s %s\n" "$name:" "$result"
done
echo ""
echo "================================================================"
echo "Check complete. If results differ, wait for TTL to expire."
echo "================================================================"
```
### scripts/calculate-ttl-propagation.py
```python
#!/usr/bin/env python3
"""
DNS TTL Propagation Time Calculator
Calculates the worst-case time for a record change to reach all resolvers
when you lower the TTL from <old_ttl> to <new_ttl>, wait for the old TTL
to expire, and then make the change.
Usage:
    python3 calculate-ttl-propagation.py <old_ttl> <new_ttl>
    python3 calculate-ttl-propagation.py 3600 300
"""
import sys
from datetime import timedelta


def format_time(seconds):
    """Format seconds into human-readable time"""
    td = timedelta(seconds=seconds)
    days = td.days
    hours = td.seconds // 3600
    minutes = (td.seconds % 3600) // 60
    secs = td.seconds % 60
    parts = []
    if days > 0:
        parts.append(f"{days} day{'s' if days != 1 else ''}")
    if hours > 0:
        parts.append(f"{hours} hour{'s' if hours != 1 else ''}")
    if minutes > 0:
        parts.append(f"{minutes} minute{'s' if minutes != 1 else ''}")
    if secs > 0 or not parts:
        parts.append(f"{secs} second{'s' if secs != 1 else ''}")
    return ", ".join(parts)


def calculate_propagation(old_ttl, new_ttl):
    """Print the worst-case timeline for lowering the TTL, then changing the record"""
    # Add buffer for DNS query time
    query_buffer = 5
    # Wait out the old TTL, then the new TTL governs the changed record
    max_time = old_ttl + new_ttl + query_buffer
    print("=" * 60)
    print("DNS TTL Propagation Time Calculator")
    print("=" * 60)
    print()
    print(f"Old TTL: {old_ttl}s ({format_time(old_ttl)})")
    print(f"New TTL: {new_ttl}s ({format_time(new_ttl)})")
    print(f"Query Buffer: {query_buffer}s")
    print()
    print("-" * 60)
    print(f"Maximum Propagation Time: {max_time}s ({format_time(max_time)})")
    print("-" * 60)
    print()
    print("Timeline:")
    print(f"  T+0: TTL lowered from {old_ttl}s to {new_ttl}s")
    print(f"  T+{old_ttl}: Old TTL expired everywhere; make the record change")
    print(f"  T+{max_time}: All resolvers have the new record")
    print()
    print("=" * 60)
    # Recommendations
    print()
    print("Recommendations:")
    if old_ttl >= 3600:
        print(f"  ⚠️ High TTL ({format_time(old_ttl)}) - consider lowering 48h before changes")
        print("  💡 Lower to 300s (5min) for faster propagation (~10 minutes)")
    elif old_ttl <= 300:
        print(f"  ✅ Low TTL ({format_time(old_ttl)}) - fast propagation")
    else:
        print(f"  ℹ️ Moderate TTL ({format_time(old_ttl)}) - acceptable propagation time")
    print()


def main():
    if len(sys.argv) < 3:
        print("Usage: python3 calculate-ttl-propagation.py <old_ttl> <new_ttl>")
        print("Example: python3 calculate-ttl-propagation.py 3600 300")
        sys.exit(1)
    try:
        old_ttl = int(sys.argv[1])
        new_ttl = int(sys.argv[2])
        if old_ttl < 0 or new_ttl < 0:
            raise ValueError("TTL values must be non-negative")
    except ValueError as e:
        print(f"Error: Invalid TTL value - {e}")
        sys.exit(1)
    calculate_propagation(old_ttl, new_ttl)


if __name__ == "__main__":
    main()
```