sanitize-git-repo
Guide for sanitizing git repositories by identifying and replacing sensitive information such as API keys, tokens, and credentials. This skill should be used when tasks involve removing secrets from codebases, sanitizing repositories before sharing, or replacing sensitive values with placeholders. Applies to tasks involving secret detection, credential removal, or repository cleanup for security purposes.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install benchflow-ai-skillsbench-sanitize-git-repo
Repository
Skill path: registry/terminal_bench_2.0/letta_skills_batch/terminal_bench_2_0_sanitize-git-repo/environment/skills/sanitize-git-repo
Guide for sanitizing git repositories by identifying and replacing sensitive information such as API keys, tokens, and credentials. This skill should be used when tasks involve removing secrets from codebases, sanitizing repositories before sharing, or replacing sensitive values with placeholders. Applies to tasks involving secret detection, credential removal, or repository cleanup for security purposes.
Open repositoryBest for
Primary workflow: Run DevOps.
Technical facets: Full Stack, Backend, Security.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: benchflow-ai.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install sanitize-git-repo into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/benchflow-ai/SkillsBench before adding sanitize-git-repo to shared team environments
- Use sanitize-git-repo for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: sanitize-git-repo
description: Guide for sanitizing git repositories by identifying and replacing sensitive information such as API keys, tokens, and credentials. This skill should be used when tasks involve removing secrets from codebases, sanitizing repositories before sharing, or replacing sensitive values with placeholders. Applies to tasks involving secret detection, credential removal, or repository cleanup for security purposes.
---
# Sanitize Git Repository
This skill provides guidance for systematically identifying and replacing sensitive information in git repositories, including API keys, tokens, passwords, and other credentials.
## When to Use This Skill
- Sanitizing a repository before sharing or open-sourcing
- Removing accidentally committed secrets from a codebase
- Replacing hardcoded credentials with placeholders
- Auditing a repository for sensitive information
- Preparing code for security review
## Recommended Approach
### Phase 1: Comprehensive Discovery
Build a complete inventory of all sensitive values before making any changes. This prevents the common mistake of discovering additional secrets mid-process.
**Common Secret Patterns to Search:**
| Type | Pattern/Prefix | Example |
|------|----------------|---------|
| AWS Access Keys | `AKIA[A-Z0-9]{16}` | `AKIAIOSFODNN7EXAMPLE` |
| AWS Secret Keys | 40-char base64 strings | Often near access keys |
| GitHub Tokens | `ghp_`, `gho_`, `ghs_`, `ghr_` | `ghp_xxxxxxxxxxxx` |
| Huggingface Tokens | `hf_` | `hf_xxxxxxxxxxxx` |
| Generic API Keys | `api[_-]?key`, `apikey` | Varies |
| Bearer Tokens | `bearer`, `token` | Varies |
| Private Keys | `-----BEGIN.*PRIVATE KEY-----` | RSA/EC keys |
| Database URLs | `postgres://`, `mysql://`, `mongodb://` | Connection strings |
| Password Fields | `password`, `passwd`, `pwd` | Varies |
**Discovery Strategy:**
1. Search for known prefixes and patterns using regex
2. Search for common variable names (`API_KEY`, `SECRET`, `TOKEN`, `PASSWORD`, `CREDENTIAL`)
3. Check configuration files (`.env`, `config.*`, `settings.*`, `*.json`, `*.yaml`, `*.yml`)
4. Examine test fixtures and mock data files
5. Check embedded data within other formats (JSON encoded in strings, base64 encoded content)
**Important Locations to Check:**
- Configuration files and environment templates
- Test fixtures and sample data
- Documentation and README files
- CI/CD configuration files
- Docker and deployment configurations
- JSON/YAML data files (secrets may be embedded in structured data)
- Git diff outputs or patch files stored in the repository
### Phase 2: Inventory Documentation
Before making changes, create a documented list of:
1. Each unique sensitive value found
2. All file locations where each value appears
3. The exact string to match (including surrounding context if needed for uniqueness)
4. The placeholder to use for replacement
**Placeholder Conventions:**
- Use descriptive placeholders that indicate the type: `<AWS_ACCESS_KEY>`, `<GITHUB_TOKEN>`, `<DATABASE_PASSWORD>`
- Maintain consistency across all replacements
- Match the format specified by the user if provided
### Phase 3: Systematic Replacement
Execute replacements methodically:
1. Work through the inventory one secret at a time
2. For each secret, replace ALL occurrences across ALL files
3. Verify each replacement immediately after making it
4. Use exact string matching to avoid unintended modifications
**Replacement Best Practices:**
- Read files before editing to ensure exact string matching
- Handle whitespace and formatting precisely
- Consider secrets embedded in JSON or other structured formats (may require escaping)
- Use batch replacements when the same value appears multiple times in one file
### Phase 4: Verification
After all replacements:
1. Re-run all original discovery searches to confirm no secrets remain
2. Search for partial matches of sensitive values
3. Run any provided test suites to validate the sanitization
4. Check that placeholders are properly formatted
## Common Pitfalls
### 1. Incomplete Initial Discovery
**Problem:** Secrets discovered incrementally during the replacement process, requiring backtracking.
**Prevention:** Invest time upfront in comprehensive searching using multiple patterns and checking all file types, including data files and embedded content.
### 2. Secrets in Unexpected Locations
**Problem:** Secrets appear in JSON files, embedded strings, test fixtures, or encoded data that aren't found by simple searches.
**Prevention:** Search recursively through all file types. Check JSON and YAML files specifically. Look for base64-encoded content that might contain secrets.
### 3. Exact String Matching Failures
**Problem:** Edit operations fail due to whitespace differences or character encoding issues.
**Prevention:** Always read the target file first to understand exact formatting. Copy strings exactly as they appear in the file, including whitespace.
### 4. Missing Occurrences
**Problem:** Multiple occurrences of the same secret exist, but only some are replaced.
**Prevention:** After each replacement, immediately verify by searching for the original value again. Use replace-all functionality when available.
### 5. Git History Considerations
**Problem:** Secrets remain in git history even after being removed from current files.
**Prevention:** Clarify with the user whether git history sanitization is required. If so, tools like `git filter-branch` or `BFG Repo-Cleaner` may be needed (this is a separate, more complex operation).
### 6. Tool Availability Assumptions
**Problem:** Specialized search tools (like ripgrep) may not be available in all environments.
**Prevention:** Have fallback approaches ready. Standard `grep -r` works in most environments. Test tool availability before relying on it.
## Verification Checklist
Before considering the task complete:
- [ ] All known secret patterns have been searched for
- [ ] Configuration files have been examined
- [ ] JSON/YAML data files have been checked
- [ ] Test fixtures and mock data have been reviewed
- [ ] Each identified secret has been replaced in all locations
- [ ] Verification searches confirm no original secrets remain
- [ ] Placeholders follow a consistent format
- [ ] Any provided tests pass successfully
## Search Commands Reference
Standard grep (widely available):
```bash
# Search for AWS access keys
grep -rn "AKIA[A-Z0-9]\{16\}" . --exclude-dir=.git
# Search for common token prefixes
grep -rn "ghp_\|gho_\|ghs_\|hf_" . --exclude-dir=.git
# Search for password-related strings
grep -rn -i "password\|passwd\|pwd" . --exclude-dir=.git
# Search for API key patterns
grep -rn -i "api[_-]\?key\|apikey" . --exclude-dir=.git
```
Always exclude the `.git` directory from searches to avoid noise from git internals.