
sanitize

Detect and redact PII from text files. Supports 15 categories including credit cards, SSNs, emails, API keys, addresses, and more — with zero dependencies.

Packaged view

This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source appears below.

Stars
3,028
Hot score
99
Updated
March 20, 2026
Overall rating
C (4.0)
Composite score
4.0
Best-practice grade
C (62.8)

Install command

npx @skill-hub/cli install openclaw-skills-sanitize

Repository

openclaw/skills

Skill path: skills/agentward-ai/sanitize


Open repository

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack, Backend.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install sanitize into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding sanitize to shared team environments
  • Use sanitize for development workflows

Works across

Claude Code · Codex CLI · Gemini CLI · OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: sanitize
description: Detect and redact PII from text files. Supports 15 categories including credit cards, SSNs, emails, API keys, addresses, and more — with zero dependencies.
version: "1.0.0"
metadata:
  openclaw:
    requires:
      bins:
        - python3
    emoji: "\U0001F6E1"
    homepage: https://github.com/agentward-ai/agentward
  files:
    - scripts/sanitize.py
---

# AgentWard Sanitize

Detect and redact personally identifiable information (PII) from text files.

## IMPORTANT — PII Safety Rules
- Do NOT read the input file directly. It may contain sensitive PII.
- ALWAYS use `--output FILE` to write sanitized output to a file.
- Only read the OUTPUT file, never the raw input.
- Only show the user the redacted output, never the raw input.
- `--json` and `--preview` are safe — they do NOT print raw PII values to stdout.
- The entity map (raw PII → placeholder mapping) is written to a separate sidecar file (`*.entity-map.json`) only when `--output` is used. Do NOT read the entity map file.

## What it does

Scans files for PII — credit cards, CVVs, card expiry dates, SSNs, emails, phone numbers, API keys, IP addresses, mailing addresses, dates of birth, passport numbers, driver's license numbers, bank routing numbers, medical license numbers, and insurance member IDs — and replaces each instance with a numbered placeholder like `[CREDIT_CARD_1]`.
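
The numbered-placeholder scheme can be illustrated with a minimal, self-contained sketch. This is an illustration of the output format only, using a simplified email regex for brevity — it is not the skill's actual detection logic:

```python
import re

def redact_emails(text: str) -> str:
    """Replace each distinct email with a numbered placeholder like [EMAIL_1]."""
    placeholders: dict[str, str] = {}

    def repl(m: re.Match) -> str:
        raw = m.group(0)
        # Repeated values reuse the same placeholder, mirroring the skill's behavior.
        if raw not in placeholders:
            placeholders[raw] = f"[EMAIL_{len(placeholders) + 1}]"
        return placeholders[raw]

    return re.sub(r"\b[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}\b", repl, text)

print(redact_emails("Contact alice@example.com or bob@example.com; alice@example.com again."))
# -> Contact [EMAIL_1] or [EMAIL_2]; [EMAIL_1] again.
```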

## Usage

### Sanitize a file (RECOMMENDED — always use --output)
```bash
python3 scripts/sanitize.py patient-notes.txt --output clean.txt
```

### Preview mode (detect PII categories/offsets without showing raw values)
```bash
python3 scripts/sanitize.py notes.md --preview
```

### JSON output (safe — no raw PII in stdout)
```bash
python3 scripts/sanitize.py report.txt --json --output clean.txt
```

### Filter to specific categories
```bash
python3 scripts/sanitize.py log.txt --categories ssn,credit_card,email --output clean.txt
```

## Supported PII categories

See `references/SUPPORTED_PII.md` for the full list with detection methods and false positive mitigation.

| Category | Pattern type | Example |
|---|---|---|
| `credit_card` | Luhn-validated 13-19 digits | 4111 1111 1111 1111 |
| `ssn` | 3-2-4 digit groups | 123-45-6789 |
| `cvv` | Keyword-anchored 3-4 digits | CVV: 123 |
| `expiry_date` | Keyword-anchored MM/YY | expiry 01/30 |
| `api_key` | Provider prefix patterns | sk-abc..., ghp_..., AKIA... |
| `email` | Standard email format | [email protected] |
| `phone` | US/intl phone numbers | +1 (555) 123-4567 |
| `ip_address` | IPv4 addresses | 192.168.1.100 |
| `date_of_birth` | Keyword-anchored dates | DOB: 03/15/1985 |
| `passport` | Keyword-anchored alphanumeric | Passport: AB1234567 |
| `drivers_license` | Keyword-anchored alphanumeric | DL: D12345678 |
| `bank_routing` | Keyword-anchored 9 digits | routing: 021000021 |
| `address` | Street + city/state/zip | 742 Evergreen Terrace Dr, Springfield, IL 62704 |
| `medical_license` | Keyword-anchored license ID | License: CA-MD-8827341 |
| `insurance_id` | Keyword-anchored member/policy ID | Member ID: BCB-2847193 |

## Security and Privacy

- **All processing is local.** The script makes zero network calls. No data leaves your machine.
- **Zero dependencies.** Uses only Python standard library — no third-party packages to audit.
- **PII never reaches stdout.** The `--json` and `--preview` modes strip raw PII values from output. The entity map (containing raw PII to placeholder mappings) is only written to a sidecar file on disk when `--output` is used.
- **Designed for agent safety.** The skill instructions above tell the agent to never read the raw input file or the entity map file — only the sanitized output.

## Requirements

- Python 3.11+
- No external dependencies (stdlib only)

## About

Built by [AgentWard](https://agentward.ai) — the open-source permission control plane for AI agents.


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/sanitize.py

```python
#!/usr/bin/env python3
"""AgentWard Sanitize — standalone PII detection and redaction.

Self-contained script with zero external dependencies.  Can be used as a
Claude/OpenClaw skill or run directly from the command line.

Usage:
    python sanitize.py <file>                      # sanitize to stdout
    python sanitize.py <file> --output clean.txt   # write to file
    python sanitize.py <file> --json               # JSON with entity map
    python sanitize.py <file> --preview            # detect-only, no redaction
    python sanitize.py <file> --categories ssn,email
"""

from __future__ import annotations

import argparse
import json
import re
import sys
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Callable


# =====================================================================
# Models
# =====================================================================


class PIICategory(str, Enum):
    CREDIT_CARD = "credit_card"
    CVV = "cvv"
    EXPIRY_DATE = "expiry_date"
    BANK_ROUTING = "bank_routing"
    SSN = "ssn"
    PASSPORT = "passport"
    DRIVERS_LICENSE = "drivers_license"
    API_KEY = "api_key"
    MEDICAL_LICENSE = "medical_license"
    INSURANCE_ID = "insurance_id"
    EMAIL = "email"
    PHONE = "phone"
    IP_ADDRESS = "ip_address"
    DATE_OF_BIRTH = "date_of_birth"
    ADDRESS = "address"


@dataclass(frozen=True)
class DetectedEntity:
    category: PIICategory
    text: str
    start: int
    end: int
    confidence: float = 1.0


@dataclass
class SanitizeResult:
    original_text: str
    sanitized_text: str
    entities: list[DetectedEntity] = field(default_factory=list)
    entity_map: dict[str, str] = field(default_factory=dict)

    @property
    def has_pii(self) -> bool:
        return len(self.entities) > 0

    @property
    def summary(self) -> dict[str, int]:
        counts: dict[str, int] = {}
        for e in self.entities:
            counts[e.category.value] = counts.get(e.category.value, 0) + 1
        return counts


# =====================================================================
# Luhn algorithm (for credit card validation)
# =====================================================================


def _luhn_check(digits: str) -> bool:
    if not digits or not digits.isdigit():
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        n = int(d)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


# =====================================================================
# Regex patterns
# =====================================================================

_CREDIT_CARD_RE = re.compile(r"\b(?:\d[\ \-]*?){13,19}\b")
_SSN_RE = re.compile(r"\b(\d{3})[\ \-]?(\d{2})[\ \-]?(\d{4})\b")
_CVV_RE = re.compile(r"\b(?:cvv|cvc|cvv2|security\s+code)\s*[:\s]\s*(\d{3,4})\b", re.I)
_EXPIRY_RE = re.compile(r"\b(?:exp(?:iry)?(?:\s*date)?)\s*[:\s]\s*(\d{1,2}\s*[/-]\s*\d{2,4})\b", re.I)
_API_KEY_RE = re.compile(
    r"\b(sk-[a-zA-Z0-9\-_]{20,}|ghp_[a-zA-Z0-9]{36}"
    r"|xoxb-[a-zA-Z0-9\-]{20,}|xoxp-[a-zA-Z0-9\-]{20,}"
    r"|AKIA[A-Z0-9]{16})\b"
)
_EMAIL_RE = re.compile(r"\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b")
_PHONE_RE = re.compile(
    r"(?<!\d)(?:\+\d{1,3}[\s.-]?)?(?:\(?\d{2,4}\)?[\s.-])?\d{3,4}[\s.-]\d{3,4}(?!\d)"
)
_IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b"
)
_DOB_RE = re.compile(
    r"\b(?:d\.?o\.?b\.?|date\s+of\s+birth|birth\s*(?:date|day)|born)\s*[:\s]\s*"
    r"(\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}[/-]\d{1,2}[/-]\d{1,2})",
    re.I,
)
_PASSPORT_RE = re.compile(
    r"\b(?:passport)\s*(?:#|no\.?|number)?\s*[:\s]\s*([A-Z][A-Z0-9]{5,8})\b"
    r"|\b(?:passport)\b.{0,50}?:\s*([A-Z][A-Z0-9]{5,8})\b",
    re.I,
)
_DL_RE = re.compile(
    r"\b(?:driver'?s?\s*(?:license|licence)|DL)\s*(?:#|no\.?|number)?\s*[:\s]\s*"
    r"([A-Z0-9]{4,13})\b",
    re.I,
)
_ROUTING_RE = re.compile(
    r"\b(?:routing)\s*(?:#|no\.?|number)?\s*[:\s]\s*(\d{9})\b"
    r"|\b(?:routing)\b.{0,40}?:\s*(\d{9})\b",
    re.I,
)
_ADDRESS_RE = re.compile(
    r"\b\d{1,6}\s+(?:[NSEW]\s+)?[A-Za-z][a-zA-Z]+(?:\s+[A-Za-z][a-zA-Z]+){0,3}\s+"
    r"(?:St|Street|Ave|Avenue|Blvd|Boulevard|Dr|Drive|Ln|Lane|Rd|Road|Ct|Court"
    r"|Way|Pl|Place|Cir|Circle)\b\.?"
    r"(?:,?\s+[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)?"
    r",?\s+[A-Z]{2}"
    r"(?:\s+\d{5}(?:-\d{4})?)?)?",
    re.I,
)
_MEDICAL_LICENSE_RE = re.compile(
    r"\b(?:(?:medical\s+)?license|lic(?:ense)?)\s*(?:#|no\.?|number)?\s*[:\s]\s*"
    r"([A-Z]{1,3}[-]?[A-Z]{0,3}[-]?\d{4,10})\b",
    re.I,
)
_INSURANCE_ID_RE = re.compile(
    r"\b(?:member\s*(?:id|#)|insurance\s*(?:id|#)|subscriber\s*(?:id|#)"
    r"|policy\s*(?:id|#|number))\s*(?:#|no\.?|number)?\s*[:\s]\s*"
    r"([A-Z0-9][-A-Z0-9]{3,20})\b",
    re.I,
)

_INVALID_SSN_AREAS: frozenset[str] = frozenset({"000", "666"})


# =====================================================================
# Detector functions
# =====================================================================


def _detect_credit_cards(text: str) -> list[DetectedEntity]:
    entities: list[DetectedEntity] = []
    for m in _CREDIT_CARD_RE.finditer(text):
        raw = m.group(0)
        digits = re.sub(r"[^0-9]", "", raw)
        if len(digits) < 13 or len(digits) > 19:
            continue
        if _luhn_check(digits):
            entities.append(DetectedEntity(PIICategory.CREDIT_CARD, raw, m.start(), m.end()))
    return entities


def _detect_ssns(text: str) -> list[DetectedEntity]:
    entities: list[DetectedEntity] = []
    for m in _SSN_RE.finditer(text):
        area, group, serial = m.group(1), m.group(2), m.group(3)
        if area in _INVALID_SSN_AREAS or 900 <= int(area) <= 999:
            continue
        if group == "00" or serial == "0000":
            continue
        entities.append(DetectedEntity(PIICategory.SSN, m.group(0), m.start(), m.end()))
    return entities


def _detect_cvvs(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.CVV, m.group(0), m.start(), m.end())
        for m in _CVV_RE.finditer(text)
    ]


def _detect_expiry_dates(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.EXPIRY_DATE, m.group(0), m.start(), m.end())
        for m in _EXPIRY_RE.finditer(text)
    ]


def _detect_api_keys(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.API_KEY, m.group(0), m.start(), m.end())
        for m in _API_KEY_RE.finditer(text)
    ]


def _detect_emails(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.EMAIL, m.group(0), m.start(), m.end())
        for m in _EMAIL_RE.finditer(text)
    ]


def _detect_phones(text: str) -> list[DetectedEntity]:
    entities: list[DetectedEntity] = []
    for m in _PHONE_RE.finditer(text):
        raw = m.group(0)
        if sum(1 for c in raw if c.isdigit()) < 7:
            continue
        entities.append(DetectedEntity(PIICategory.PHONE, raw, m.start(), m.end()))
    return entities


def _detect_ip_addresses(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.IP_ADDRESS, m.group(0), m.start(), m.end())
        for m in _IPV4_RE.finditer(text)
    ]


def _detect_dob(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.DATE_OF_BIRTH, m.group(0), m.start(), m.end())
        for m in _DOB_RE.finditer(text)
    ]


def _detect_passports(text: str) -> list[DetectedEntity]:
    entities: list[DetectedEntity] = []
    for m in _PASSPORT_RE.finditer(text):
        val = m.group(1) or m.group(2)
        grp = 1 if m.group(1) else 2
        entities.append(DetectedEntity(PIICategory.PASSPORT, val, m.start(grp), m.end(grp)))
    return entities


def _detect_drivers_licenses(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.DRIVERS_LICENSE, m.group(0), m.start(), m.end())
        for m in _DL_RE.finditer(text)
    ]


def _detect_routing_numbers(text: str) -> list[DetectedEntity]:
    entities: list[DetectedEntity] = []
    for m in _ROUTING_RE.finditer(text):
        val = m.group(1) or m.group(2)
        grp = 1 if m.group(1) else 2
        entities.append(DetectedEntity(PIICategory.BANK_ROUTING, val, m.start(grp), m.end(grp)))
    return entities


def _detect_addresses(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.ADDRESS, m.group(0), m.start(), m.end())
        for m in _ADDRESS_RE.finditer(text)
    ]


def _detect_medical_licenses(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.MEDICAL_LICENSE, m.group(0), m.start(), m.end())
        for m in _MEDICAL_LICENSE_RE.finditer(text)
    ]


def _detect_insurance_ids(text: str) -> list[DetectedEntity]:
    return [
        DetectedEntity(PIICategory.INSURANCE_ID, m.group(0), m.start(), m.end())
        for m in _INSURANCE_ID_RE.finditer(text)
    ]


_DETECTORS: dict[PIICategory, Callable[[str], list[DetectedEntity]]] = {
    PIICategory.CREDIT_CARD: _detect_credit_cards,
    PIICategory.SSN: _detect_ssns,
    PIICategory.CVV: _detect_cvvs,
    PIICategory.EXPIRY_DATE: _detect_expiry_dates,
    PIICategory.API_KEY: _detect_api_keys,
    PIICategory.EMAIL: _detect_emails,
    PIICategory.PHONE: _detect_phones,
    PIICategory.IP_ADDRESS: _detect_ip_addresses,
    PIICategory.DATE_OF_BIRTH: _detect_dob,
    PIICategory.PASSPORT: _detect_passports,
    PIICategory.DRIVERS_LICENSE: _detect_drivers_licenses,
    PIICategory.BANK_ROUTING: _detect_routing_numbers,
    PIICategory.ADDRESS: _detect_addresses,
    PIICategory.MEDICAL_LICENSE: _detect_medical_licenses,
    PIICategory.INSURANCE_ID: _detect_insurance_ids,
}


# =====================================================================
# Engine
# =====================================================================


def _deduplicate_overlaps(entities: list[DetectedEntity]) -> list[DetectedEntity]:
    """Remove overlapping entities, keeping the longer span."""
    if len(entities) <= 1:
        return entities
    result: list[DetectedEntity] = []
    for ent in entities:
        if result and ent.start < result[-1].end:
            prev = result[-1]
            if (ent.end - ent.start) > (prev.end - prev.start):
                result[-1] = ent
        else:
            result.append(ent)
    return result


def detect_all(
    text: str,
    categories: set[PIICategory] | None = None,
) -> list[DetectedEntity]:
    active = categories if categories is not None else set(_DETECTORS.keys())
    entities: list[DetectedEntity] = []
    for cat, fn in _DETECTORS.items():
        if cat in active:
            entities.extend(fn(text))
    entities.sort(key=lambda e: e.start)
    return _deduplicate_overlaps(entities)


def redact_text(
    text: str,
    entities: list[DetectedEntity],
) -> tuple[str, dict[str, str]]:
    if not entities:
        return text, {}

    seen: dict[tuple[PIICategory, str], int] = {}
    category_counters: dict[PIICategory, int] = {}
    entity_placeholders: dict[int, str] = {}

    for idx, ent in enumerate(entities):
        key = (ent.category, ent.text)
        if key in seen:
            suffix = seen[key]
        else:
            cat_count = category_counters.get(ent.category, 0) + 1
            category_counters[ent.category] = cat_count
            suffix = cat_count
            seen[key] = suffix
        entity_placeholders[idx] = f"[{ent.category.value.upper()}_{suffix}]"

    entity_map: dict[str, str] = {}
    for idx, ent in enumerate(entities):
        ph = entity_placeholders[idx]
        if ph not in entity_map:
            entity_map[ph] = ent.text

    chunks: list[str] = []
    prev_end = 0
    for idx, ent in enumerate(entities):
        chunks.append(text[prev_end:ent.start])
        chunks.append(entity_placeholders[idx])
        prev_end = ent.end
    chunks.append(text[prev_end:])

    return "".join(chunks), entity_map


def sanitize_text(
    text: str,
    categories: set[PIICategory] | None = None,
) -> SanitizeResult:
    entities = detect_all(text, categories=categories)
    sanitized, entity_map = redact_text(text, entities)
    return SanitizeResult(
        original_text=text,
        sanitized_text=sanitized,
        entities=entities,
        entity_map=entity_map,
    )


def sanitize_file(
    path: Path,
    categories: set[PIICategory] | None = None,
) -> SanitizeResult:
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    text = path.read_text(encoding="utf-8", errors="replace")
    return sanitize_text(text, categories=categories)


# =====================================================================
# CLI
# =====================================================================


def main() -> None:
    parser = argparse.ArgumentParser(
        description="AgentWard Sanitize — detect and redact PII from files.",
    )
    parser.add_argument("file", type=Path, help="File to sanitize")
    parser.add_argument("--output", "-o", type=Path, help="Write sanitized output to file")
    parser.add_argument("--json", dest="json_output", action="store_true", help="JSON output")
    parser.add_argument("--preview", action="store_true", help="Detect-only, no redaction")
    parser.add_argument(
        "--categories", "-c", type=str, default=None,
        help="Comma-separated PII categories to detect",
    )

    args = parser.parse_args()

    # Parse categories.
    cat_set: set[PIICategory] | None = None
    if args.categories:
        cat_set = set()
        for name in args.categories.split(","):
            name = name.strip().lower()
            try:
                cat_set.add(PIICategory(name))
            except ValueError:
                print(f"Warning: Unknown category '{name}', skipping.", file=sys.stderr)
        if not cat_set:
            print("Error: No valid categories specified.", file=sys.stderr)
            sys.exit(1)

    try:
        result = sanitize_file(args.file, categories=cat_set)
    except FileNotFoundError as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)

    # Preview mode.
    # Raw PII text is intentionally NOT printed — only category, offset,
    # length, and placeholder are shown.  This prevents leaking sensitive
    # data into LLM context when the output is captured by an agent.
    if args.preview:
        if not result.has_pii:
            print("No PII detected.")
            return
        print(f"Detected {len(result.entities)} PII entities:\n")
        cat_counters: dict[str, int] = {}
        for i, ent in enumerate(result.entities, 1):
            cat_key = ent.category.value.upper()
            cat_counters[cat_key] = cat_counters.get(cat_key, 0) + 1
            placeholder = f"[{cat_key}_{cat_counters[cat_key]}]"
            print(f"  {i}. [{ent.category.value}] {placeholder} (offset {ent.start}:{ent.end}, length {ent.end - ent.start})")
        return

    # JSON mode.
    # Stdout JSON intentionally omits raw PII values (entities[].text
    # and entity_map) to prevent leaking sensitive data into LLM context
    # when the output is captured by an agent.
    if args.json_output:
        data: dict[str, object] = {
            "file": str(args.file),
            "has_pii": result.has_pii,
            "entity_count": len(result.entities),
            "summary": result.summary,
            "entities": [
                {
                    "category": e.category.value,
                    "start": e.start,
                    "end": e.end,
                }
                for e in result.entities
            ],
            "sanitized_text": result.sanitized_text,
        }

        # When --output is specified, write the sanitized text to the
        # requested file and the entity map (contains raw PII) to a
        # sidecar file.  This keeps raw PII on disk and out of stdout.
        if args.output:
            args.output.parent.mkdir(parents=True, exist_ok=True)
            args.output.write_text(result.sanitized_text, encoding="utf-8")
            map_path = args.output.with_suffix(".entity-map.json")
            map_data = {
                "entity_map": result.entity_map,
                "entities": [
                    {
                        "category": e.category.value,
                        "text": e.text,
                        "start": e.start,
                        "end": e.end,
                    }
                    for e in result.entities
                ],
            }
            map_path.parent.mkdir(parents=True, exist_ok=True)
            map_path.write_text(json.dumps(map_data, indent=2), encoding="utf-8")
            data["entity_map_file"] = str(map_path)
            print(
                f"Entity map written to {map_path} (contains raw PII — do not share)",
                file=sys.stderr,
            )

        print(json.dumps(data, indent=2))
        return

    # Default: write sanitized text.
    if args.output:
        args.output.parent.mkdir(parents=True, exist_ok=True)
        args.output.write_text(result.sanitized_text, encoding="utf-8")
        print(
            f"Sanitized output written to {args.output}", file=sys.stderr,
        )
        if result.has_pii:
            print(
                f"  Redacted {len(result.entities)} entities "
                f"across {len({e.category for e in result.entities})} categories.",
                file=sys.stderr,
            )
    else:
        sys.stdout.write(result.sanitized_text)
        if not result.sanitized_text.endswith("\n"):
            sys.stdout.write("\n")


if __name__ == "__main__":
    main()

```

### references/SUPPORTED_PII.md

```markdown
# Supported PII Categories

## Financial

| Category | Description | Detection method | Example |
|---|---|---|---|
| `credit_card` | Credit/debit card numbers (Visa, Mastercard, Amex, etc.) | Regex + Luhn checksum | `4111 1111 1111 1111` |
| `cvv` | Card verification value (keyword-anchored) | Regex | `CVV: 123` |
| `expiry_date` | Card expiration date (keyword-anchored) | Regex | `exp: 01/30` |
| `bank_routing` | US ABA bank routing number (keyword-anchored) | Regex | `routing: 021000021` |

## Government IDs

| Category | Description | Detection method | Example |
|---|---|---|---|
| `ssn` | US Social Security Number (validated area/group/serial) | Regex | `123-45-6789` |
| `passport` | Passport number (keyword-anchored) | Regex | `Passport: AB1234567` |
| `drivers_license` | Driver's license number (keyword-anchored) | Regex | `DL: D12345678` |

## Credentials

| Category | Description | Detection method | Example |
|---|---|---|---|
| `api_key` | API keys from known providers | Regex (prefix match) | `sk-abc...`, `ghp_...`, `AKIA...` |

Supported providers: OpenAI/Anthropic (`sk-`, including `sk-proj-*` and `sk-svcacct-*`), GitHub (`ghp_`), Slack (`xoxb-`, `xoxp-`), AWS (`AKIA`).

## Healthcare / Professional

| Category | Description | Detection method | Example |
|---|---|---|---|
| `medical_license` | State medical license numbers (keyword-anchored) | Regex | `License: CA-MD-8827341` |
| `insurance_id` | Insurance member/policy IDs (keyword-anchored) | Regex | `Member ID: BCB-2847193` |

Keywords: license, medical license, member id, insurance id, subscriber id, policy number.

## Contact / Personal

| Category | Description | Detection method | Example |
|---|---|---|---|
| `email` | Email addresses | Regex | `[email protected]` |
| `phone` | Phone numbers (US and international) | Regex (7+ digits) | `+1 (555) 123-4567` |
| `ip_address` | IPv4 addresses | Regex | `192.168.1.100` |
| `date_of_birth` | Date of birth (keyword-anchored) | Regex | `DOB: 03/15/1985` |
| `address` | US mailing addresses (street + optional city/state/zip) | Regex | `742 Evergreen Terrace Dr, Springfield, IL 62704` |

## False positive mitigation

Several patterns use keyword-anchoring to reduce false positives:
- **CVV/expiry/DOB/passport/DL/routing/medical license/insurance ID**: Only matched when preceded by a keyword (e.g., "CVV:", "DOB:", "passport", "Member ID:").
- **SSN**: Area numbers 000, 666, and 900-999 are excluded per SSA rules. Group 00 and serial 0000 are excluded.
- **Credit card**: Luhn checksum validation eliminates random digit sequences.
- **Phone**: Requires 7+ actual digits to avoid matching short number sequences.
- **IP address**: Validates each octet is 0-255.

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "agentward-ai",
  "slug": "sanitize",
  "displayName": "AgentWard Sanitize",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1772073238724,
    "commit": "https://github.com/openclaw/skills/commit/761997cbef03946dae1c734d493b3310fd5f99eb"
  },
  "history": []
}

```
