
osint-investigator

Deep OSINT (Open Source Intelligence) investigations. Use when the user wants to research, find, or investigate any person, place, organisation, username, domain, IP address, phone number, image, vehicle, or object using publicly available information. Triggers on phrases like "find information on", "investigate", "look up", "who is", "trace this", "dig into", "OSINT search", "background check", or any request to gather open-source intelligence about a target. Performs deep multi-source analysis across web search, social media, DNS/WHOIS, image search, maps, public records, and more — returning a structured intelligence report.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 3,108
Hot score: 99
Updated: March 20, 2026
Overall rating: C (0.0)
Composite score: 0.0
Best-practice grade: B (71.9)

Install command

npx @skill-hub/cli install openclaw-skills-osint-investigator

Repository

openclaw/skills

Skill path: skills/cineglobe/osint-investigator



Best for

Primary workflow: Research & Ops.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install osint-investigator into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding osint-investigator to shared team environments
  • Use osint-investigator in research and investigation workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: osint-investigator
description: Deep OSINT (Open Source Intelligence) investigations. Use when the user wants to research, find, or investigate any person, place, organisation, username, domain, IP address, phone number, image, vehicle, or object using publicly available information. Triggers on phrases like "find information on", "investigate", "look up", "who is", "trace this", "dig into", "OSINT search", "background check", or any request to gather open-source intelligence about a target. Performs deep multi-source analysis across web search, social media, DNS/WHOIS, image search, maps, public records, and more — returning a structured intelligence report.
---

# OSINT Investigator

Multi-source open-source intelligence gathering. Identify target type, run all applicable modules, then produce a structured report.

## Target Classification

Before running any module, classify the target:

- **Person** (real name, alias, face) → modules: social, web, image, username
- **Username / Handle** → modules: username, social, web  
- **Domain / Website** → modules: dns, whois, web, social
- **IP Address** → modules: ip, dns, web
- **Organisation / Company** → modules: web, social, dns, maps, corporate
- **Phone Number** → modules: phone, web, social
- **Email Address** → modules: email, web, social
- **Location / Address** → modules: maps, web, social, geo
- **Image / Photo** → modules: image, reverse
- **Object / Asset** → modules: web, image, social

Run ALL applicable modules in parallel. Never stop after one source.
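
The classification table above is effectively a dispatch map. A minimal Python sketch (the type keys and module names are illustrative, not part of the skill's interface):

```python
# Map each target type to the modules it triggers, mirroring the table above.
MODULES_BY_TARGET = {
    "person":   ["social", "web", "image", "username"],
    "username": ["username", "social", "web"],
    "domain":   ["dns", "whois", "web", "social"],
    "ip":       ["ip", "dns", "web"],
    "org":      ["web", "social", "dns", "maps", "corporate"],
    "phone":    ["phone", "web", "social"],
    "email":    ["email", "web", "social"],
    "location": ["maps", "web", "social", "geo"],
    "image":    ["image", "reverse"],
    "object":   ["web", "image", "social"],
}

def plan_modules(target_type: str) -> list[str]:
    """Return every module to run for a target type (empty if unknown)."""
    return MODULES_BY_TARGET.get(target_type, [])
```

Running everything in the returned list (rather than stopping at the first hit) is what "never stop after one source" means in practice.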

## Module Playbook

### 🌐 Web Search (`web_search` tool)
Run at minimum 5–8 targeted queries per target. Vary operators:
```
"full name" site:linkedin.com
"username" -site:twitter.com
target filetype:pdf
target inurl:profile
"target" "email" OR "contact" OR "phone"
target site:reddit.com
target site:github.com
```
Follow top URLs with `web_fetch` to extract full content.

### 🔗 DNS / WHOIS
```bash
whois <domain>
dig <domain> ANY    # note: many servers now return minimal ANY responses (RFC 8482)
dig <domain> MX
dig <domain> TXT
nslookup <domain>
host <domain>
```
Also fetch: `https://rdap.org/domain/<domain>` via `web_fetch`
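
The commands above can be wrapped so one call collects everything for a domain. A sketch (command construction is kept pure so it is easy to test; tool availability is not assumed):

```python
import subprocess

# Record types queried, mirroring the dig commands in this section.
RECORD_TYPES = ["ANY", "MX", "TXT"]

def dns_commands(domain: str) -> list[list[str]]:
    """Build the whois/dig command lines to run (pure function)."""
    cmds = [["whois", domain]]
    cmds += [["dig", domain, rtype] for rtype in RECORD_TYPES]
    return cmds

def run_dns_module(domain: str) -> dict[str, str]:
    """Execute each command, tolerating missing tools and slow servers."""
    results = {}
    for cmd in dns_commands(domain):
        try:
            out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            results[" ".join(cmd)] = out.stdout
        except (FileNotFoundError, subprocess.TimeoutExpired):
            results[" ".join(cmd)] = ""  # tool absent or query timed out
    return results
```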

### 🌍 IP Intelligence
```bash
curl -s https://ipinfo.io/<ip>/json
curl -s http://ip-api.com/json/<ip>   # note: the free ip-api.com tier is HTTP-only
```
Also check: `https://www.shodan.io/host/<ip>` via `web_fetch`
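
Before spending lookups, validate the address and build the endpoint URLs in one place. A sketch (endpoints as listed above; note the free ip-api.com tier is HTTP-only):

```python
import ipaddress

def ip_lookup_urls(ip: str) -> list[str]:
    """Validate the IP, then return the lookup endpoints to fetch."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return [
        f"https://ipinfo.io/{ip}/json",
        f"http://ip-api.com/json/{ip}",
        f"https://www.shodan.io/host/{ip}",
    ]
```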

### 📱 Username Search
Check all platforms via `web_fetch` (HTTP status and page title are enough for an existence check — there is no need to load full page content):
- `https://github.com/<username>`
- `https://twitter.com/<username>`
- `https://instagram.com/<username>`
- `https://reddit.com/user/<username>`
- `https://tiktok.com/@<username>`
- `https://youtube.com/@<username>`
- `https://linkedin.com/in/<username>`
- `https://medium.com/@<username>`
- `https://pinterest.com/<username>`
- `https://twitch.tv/<username>`
- `https://steamcommunity.com/id/<username>`
- `https://keybase.io/<username>`
- `https://t.me/<username>` (Telegram)
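
The list above expands mechanically from one handle. A sketch of the URL builder (templates abbreviated; add the rest of the platforms the same way). Note that a 200 status alone is not conclusive — some platforms (e.g. Instagram) serve a soft-404 page with status 200 for missing profiles, so check the page title too:

```python
# Hypothetical subset of the platform templates listed above.
PROFILE_TEMPLATES = [
    "https://github.com/{u}",
    "https://twitter.com/{u}",
    "https://instagram.com/{u}",
    "https://reddit.com/user/{u}",
    "https://tiktok.com/@{u}",
    "https://youtube.com/@{u}",
]

def candidate_urls(username: str) -> list[str]:
    """Expand one handle into every candidate profile URL to probe."""
    return [t.format(u=username) for t in PROFILE_TEMPLATES]
```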

### 🐦 Social Media Deep Dive
For each confirmed platform profile, use `web_fetch` to extract:
- Bio / description
- Profile photo URL
- Follower/following counts
- Join date
- Location (if listed)
- Links in bio
- Pinned posts / recent activity

For Twitter/X: also search `web_search` for `site:twitter.com "<target>"` and nitter mirrors.

### 🗺️ Maps & Location
```bash
# Use web_fetch or browser for:
# Google Maps search
https://maps.googleapis.com/maps/api/geocode/json?address=<address>&key=<key>
# Or use goplaces skill if available
# Streetview metadata check
https://maps.googleapis.com/maps/api/streetview/metadata?location=<lat,lng>&key=<key>
```
Also search: `web_search` for `"<target location>" site:maps.google.com OR site:wikimapia.org OR site:openstreetmap.org`

### 🖼️ Image Search & Reverse Image Search

**Finding images of a person (no image provided):**
1. Search for profile photos on all confirmed social profiles — extract direct image URLs from page source or og:image meta tags
2. Run `web_search` for `"<name>" site:linkedin.com` — LinkedIn og:image often returns profile photo URL directly
3. Check Gravatar: compute MD5 of likely email addresses → `https://www.gravatar.com/<md5>.json`
4. Search news/press: `web_search` for `"<name>" filetype:jpg OR filetype:png`
5. Use `web_fetch` to pull `og:image` from any confirmed profile pages

**Reverse image search (image URL or local file provided):**
```bash
# Direct URL-based reverse search (use web_fetch):
https://yandex.com/images/search?rpt=imageview&url=<image_url>
https://tineye.com/search?url=<image_url>

# Google Lens (requires browser tool):
https://lens.google.com/uploadbyurl?url=<image_url>

# For avatars and profile images — extract URL then feed into:
# 1. Yandex (best for face matching, indexes more than Google)
# 2. TinEye (exact match/copy detection)
# 3. Google Lens via browser tool
```

**EXIF / Metadata extraction (if file is available locally):**
```bash
exiftool <image>            # full metadata dump
exiftool -gps:all <image>   # GPS coordinates only
exiftool -DateTimeOriginal <image>  # when photo was taken
```
Online tools: `web_fetch https://www.metadata2go.com` or `https://www.pic2map.com`
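
The `exiftool` calls above are easiest to consume in JSON mode (`exiftool -j -n`). A sketch that splits parsing from execution so the parser can be tested without exiftool installed (field names per exiftool's JSON output):

```python
import json
import subprocess

def parse_exif_json(raw: str) -> dict:
    """exiftool -j prints a one-element JSON array per file; pull GPS + date."""
    data = json.loads(raw)
    meta = data[0] if data else {}
    return {
        "lat": meta.get("GPSLatitude"),
        "lon": meta.get("GPSLongitude"),
        "taken": meta.get("DateTimeOriginal"),
    }

def read_exif(path: str) -> dict:
    """Run exiftool in JSON mode (-n keeps GPS numeric) and parse it."""
    out = subprocess.run(["exiftool", "-j", "-n", path],
                         capture_output=True, text=True)
    return parse_exif_json(out.stdout or "[]")
```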

**Photo geolocation (no EXIF GPS):**
- Street signs, shop names, vehicle plates → `web_search` to identify region
- Architecture / vegetation / road markings → narrow country/region
- Sun angle + shadow direction → `https://www.suncalc.org` to estimate time & location
- Cross-reference with Google Street View via `browser` tool

**When searching for a person by image from social media:**
1. `web_fetch` the profile page and look for `og:image` or `<img>` src in the rendered HTML
2. Extract the full CDN image URL
3. Feed to Yandex imageview and TinEye
4. Note: Instagram/Facebook CDN URLs expire — use Yandex cache or download first

### 📧 Email Intelligence
```bash
# Breach/exposure check
curl -s "https://haveibeenpwned.com/api/v3/breachedaccount/<email>" -H "hibp-api-key: <key>"
# Format validation + domain MX check
dig $(echo <email> | cut -d@ -f2) MX
# Gravatar (hashed MD5 of email)
curl -s "https://www.gravatar.com/<md5_hash>.json"
```
Also: `web_search` for `"<email>" site:pastebin.com OR site:ghostbin.com`
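
The Gravatar lookup above hashes the trimmed, lowercased email with MD5 (newer Gravatar endpoints also accept SHA-256). A sketch of building the profile URL:

```python
import hashlib

def gravatar_url(email: str) -> str:
    """Gravatar profile JSON URL: MD5 of the normalised email address."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/{digest}.json"
```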

### 📞 Phone Intelligence
```bash
# Carrier / region lookup
curl -s "https://phonevalidation.abstractapi.com/v1/?api_key=<key>&phone=<number>"
```
Also: `web_search` for `"<phone_number>"` and check `site:truecaller.com`, `site:whitepages.com`

### 🏢 Corporate / Organisation
Use `web_fetch` on:
- `https://opencorporates.com/companies?q=<name>` 
- Companies House (UK): `https://find-and-update.company-information.service.gov.uk/search?q=<name>`
- LinkedIn company page: `https://linkedin.com/company/<slug>`
- Crunchbase: `web_search` for `site:crunchbase.com "<company>"`

### 📄 Document & Data Leaks
```
web_search queries:
"<target>" filetype:pdf OR filetype:xlsx OR filetype:docx
"<target>" site:pastebin.com
"<target>" site:github.com password OR secret OR key
"<target>" site:trello.com OR site:notion.so
```

### 🔍 Cache & Archive
```bash
# Wayback Machine
curl -s "https://archive.org/wayback/available?url=<url>"
web_fetch "https://web.archive.org/web/*/<url>" for snapshots
# Google Cache via web_search: cache:<url> (Google has retired the cache: operator; prefer the Wayback Machine)
```

## Investigation Workflow

1. **Classify** the target type
2. **Plan** — list all modules to run
3. **Execute** all modules (parallelise where possible using multiple tool calls)
4. **Correlate** — cross-reference findings across sources, note consistencies and conflicts
5. **Report** — structured output (see below)

## Report Format

Always produce a structured report. Adapt sections to what was found:

```
# OSINT Report: <Target>
**Date:** <UTC timestamp>
**Target Type:** <classification>
**Query:** <original user request>

## Identity Summary
[Key identifying information — name, aliases, age, location, nationality]

## Online Presence
[Confirmed profiles with URLs, follower counts, activity level]

## Contact & Technical
[Email addresses, phone numbers, domains, IPs]

## Location Intelligence
[Known locations, addresses, coordinates, map links]

## Corporate / Organisational Links
[Companies, roles, affiliations]

## Historical Data
[Archived content, old usernames, past locations]

## Document & Data Exposure
[Public documents, paste sites, leak mentions]

## Image Intelligence
[Profile photos, reverse image results, photo metadata]

## Confidence & Gaps
[Confidence level per finding — High/Medium/Low; list gaps]

## Sources
[All URLs consulted]
```
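
The skeleton above can be assembled mechanically from a findings dict; keeping empty sections visible makes gaps explicit, which the Confidence & Gaps section relies on. A sketch (section names as listed above; the placeholder text is illustrative):

```python
from datetime import datetime, timezone

SECTIONS = [
    "Identity Summary", "Online Presence", "Contact & Technical",
    "Location Intelligence", "Corporate / Organisational Links",
    "Historical Data", "Document & Data Exposure", "Image Intelligence",
    "Confidence & Gaps", "Sources",
]

def build_report(target: str, target_type: str, findings: dict) -> str:
    """Render the report skeleton; missing sections get a visible placeholder."""
    lines = [f"# OSINT Report: {target}",
             f"**Date:** {datetime.now(timezone.utc):%Y-%m-%d %H:%M} UTC",
             f"**Target Type:** {target_type}", ""]
    for section in SECTIONS:
        lines.append(f"## {section}")
        lines.append(findings.get(section, "_No findings._"))
        lines.append("")
    return "\n".join(lines)
```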

## Configuration & Authentication

Config is stored at: `<skill_dir>/config/osint_config.json` (chmod 600, auto-created on first save).

The agent configures everything **conversationally** — no terminal script needed. When the user says they want to add credentials, configure PDF output, or set up an API key, follow the flow below.

### Conversational Config Flow

When the user wants to configure the skill, ask them questions directly in chat and write the answers to the config file yourself using the `write` tool.

**Step 1 — Ask what they want to configure:**
> "What would you like to set up? I can configure:
> - Platform credentials (Instagram, Twitter/X, LinkedIn, Facebook)
> - API keys (Google Maps, Shodan, HaveIBeenPwned, Hunter.io, AbstractAPI Phone)
> - PDF report output (on/off, save location)"

**Step 2 — Collect the values** (ask one platform at a time):
- For API keys: ask them to paste the key directly in chat
- For passwords: warn them the value will be stored in a local JSON file, then ask
- For output settings: ask yes/no / provide a path

**Step 3 — Write the config:**
```python
# Read existing config (or start fresh)
import json, os
cfg_path = "<skill_dir>/config/osint_config.json"
os.makedirs(os.path.dirname(cfg_path), exist_ok=True)
cfg = json.load(open(cfg_path)) if os.path.exists(cfg_path) else {"platforms": {}, "output": {}}

# Example: save Twitter bearer token
cfg["platforms"]["twitter"] = {"configured": True, "method": "api_key", "bearer_token": "<VALUE>"}

# Example: enable PDF
cfg["output"]["pdf_enabled"] = True
cfg["output"]["pdf_output_dir"] = "~/Desktop"

# Write back
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
os.chmod(cfg_path, 0o600)
```
Use the `write` tool directly — no need to run Python.

### Supported Platform Integrations

| Platform | Fields | What It Unlocks |
|----------|--------|-----------------|
| Instagram | `username`, `password` | Profile content behind login wall — **use a burner account** |
| Twitter/X | `bearer_token` (+ optional `api_key`, `api_secret`) | Full tweet/profile/search via API v2 (free tier works) |
| LinkedIn | `username` (email), `password` | Profile scraping — use sparingly, heavily rate-limited |
| Facebook | `email`, `password` | Public profile/group content |
| Google Maps | `api_key` | Geocoding, Place Search, Street View metadata |
| Shodan | `api_key` | Deep IP/host intelligence |
| HaveIBeenPwned | `api_key` | Email breach lookups ($3.95/mo at haveibeenpwned.com/API/Key) |
| Hunter.io | `api_key` | Email discovery by domain (free: 25 req/mo at hunter.io/api-keys) |
| AbstractAPI Phone | `api_key` | Phone carrier/region lookup (app.abstractapi.com/api/phone-validation) |

### Reading Credentials During a Search

```bash
# Read config and extract a value in one line:
BEARER=$(python3 -c "import json; c=json.load(open('<skill_dir>/config/osint_config.json')); print(c['platforms']['twitter']['bearer_token'])")

# Then use it:
curl -s -H "Authorization: Bearer $BEARER" \
  "https://api.twitter.com/2/users/by/username/<handle>?user.fields=description,location,created_at,public_metrics"
```

### Twitter/X API v2 (when configured)
```bash
# Profile lookup
curl -s -H "Authorization: Bearer $BEARER" \
  "https://api.twitter.com/2/users/by/username/<handle>?user.fields=description,location,created_at,public_metrics,entities"

# Recent tweets
curl -s -H "Authorization: Bearer $BEARER" \
  "https://api.twitter.com/2/users/<user_id>/tweets?max_results=10&tweet.fields=created_at,geo,entities"

# Search recent tweets
curl -s -H "Authorization: Bearer $BEARER" \
  "https://api.twitter.com/2/tweets/search/recent?query=<query>&max_results=10"
```

### Shodan API (when configured)
```bash
curl -s "https://api.shodan.io/shodan/host/<ip>?key=$SHODAN_KEY"
curl -s "https://api.shodan.io/dns/resolve?hostnames=<domain>&key=$SHODAN_KEY"
```

### Hunter.io API (when configured)
```bash
curl -s "https://api.hunter.io/v2/domain-search?domain=<domain>&api_key=$HUNTER_KEY"
curl -s "https://api.hunter.io/v2/email-verifier?email=<email>&api_key=$HUNTER_KEY"
```

### HaveIBeenPwned API (when configured)
```bash
curl -s "https://haveibeenpwned.com/api/v3/breachedaccount/<email>" \
  -H "hibp-api-key: $HIBP_KEY" -H "User-Agent: osint-investigator"
```

### Google Maps API (when configured)
```bash
curl -s "https://maps.googleapis.com/maps/api/geocode/json?address=<address>&key=$GMAPS_KEY"
curl -s "https://maps.googleapis.com/maps/api/place/textsearch/json?query=<query>&key=$GMAPS_KEY"
curl -s "https://maps.googleapis.com/maps/api/streetview/metadata?location=<lat,lng>&key=$GMAPS_KEY"
```

## PDF Report Generation

### Check if PDF is enabled
```bash
python3 -c "import json; c=json.load(open('<skill_dir>/config/osint_config.json')); print(c.get('output',{}).get('pdf_enabled', False))"
```

### Generate a PDF
Write the markdown report to a temp file, then run the shell wrapper (self-installs `fpdf2` if missing):
```bash
cat > /tmp/osint_report.md << 'ENDREPORT'
<full markdown report>
ENDREPORT

bash <skill_dir>/scripts/generate_pdf.sh \
  --input /tmp/osint_report.md \
  --target "Target Name" \
  --output ~/Desktop
```

The wrapper (`generate_pdf.sh`) will:
1. Check if `fpdf2` is installed — install it automatically if not
2. Call `generate_pdf.py` with the same arguments
3. Print the output path: `PDF saved: /path/to/OSINT_Name_20260225_1035.pdf`

**No setup needed by the user** — works on any machine with Python 3 + pip.

### PDF confidence colour coding
Confidence is detected automatically from the text of each section/paragraph/table row — just include the word in your report and the PDF will colour-code it:

- 🟢 **GREEN** `[HIGH]` — verified from multiple reliable sources
- 🟠 **ORANGE** `[MED]` — likely correct, single or unverified source  
- 🔴 **RED** `[LOW]` — possible match, little corroborating evidence
- ⚪ **GREY** `[UNVERIFIED]` — user-provided context, not independently confirmed

### Toggling PDF output via conversation
When the user says "turn on PDF reports" or "disable PDF output":
1. Read the config file
2. Update `cfg["output"]["pdf_enabled"]` to `true` or `false`
3. Write it back
4. Confirm the change to the user
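
Steps 1-3 collapse into one helper (the config path is wherever the skill stores `osint_config.json`):

```python
import json
import os

def set_pdf_enabled(cfg_path: str, enabled: bool) -> dict:
    """Read the config (or start fresh), flip pdf_enabled, write it back."""
    cfg = {"platforms": {}, "output": {}}
    if os.path.exists(cfg_path):
        with open(cfg_path) as f:
            cfg = json.load(f)
    cfg.setdefault("output", {})["pdf_enabled"] = enabled
    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)
    os.chmod(cfg_path, 0o600)  # keep the file private, matching the skill's convention
    return cfg
```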

## Ethics & Legality

- Only use **publicly available** data — never attempt to access private systems
- Do not aggregate data in ways designed to facilitate stalking or harassment
- Respect robots.txt in spirit; use cached/archive versions where direct scraping is blocked
- If the target is clearly a private individual being investigated without consent, flag this before proceeding
- Instagram/LinkedIn/Facebook credentials: always recommend a burner/alt account — never the user's personal accounts

## Reference Files

- `references/osint-sources.md` — curated OSINT databases, APIs, and search operators by category
- `references/social-platforms.md` — platform-specific extraction tips and URL patterns
- `scripts/generate_pdf.py` — PDF generator (requires fpdf2, auto-installed via shell wrapper)
- `scripts/generate_pdf.sh` — shell wrapper; self-installs fpdf2, then calls generate_pdf.py
- `config/osint_config.json` — live config (auto-created on first write, chmod 600)


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/osint-sources.md

```markdown
# OSINT Sources — Master Reference

## Search Engines & General
| Source | URL | Notes |
|--------|-----|-------|
| Google | `https://google.com` | Use `web_search` tool; operators: `site:`, `inurl:`, `filetype:`, `"exact"`, `-exclude` |
| Bing | `https://bing.com` | Indexes different content to Google |
| DuckDuckGo | `https://duckduckgo.com` | Less filtered results |
| Yandex | `https://yandex.com` | Excellent for Eastern European targets; superior reverse image |
| Startpage | `https://startpage.com` | Google proxy, no tracking |
| Wayback Machine | `https://web.archive.org` | Historical snapshots: `https://web.archive.org/web/*/<url>` |
| Cached pages | `cache:<url>` in Google | Snapshot of last crawl |

## Social Media
| Platform | Profile URL | Search URL |
|----------|-------------|------------|
| Twitter/X | `twitter.com/<handle>` | `twitter.com/search?q=<query>` |
| Instagram | `instagram.com/<handle>` | Use web_search: `site:instagram.com "<term>"` |
| Facebook | `facebook.com/<handle>` | Public pages/profiles only |
| LinkedIn | `linkedin.com/in/<handle>` | `linkedin.com/company/<slug>` for orgs |
| TikTok | `tiktok.com/@<handle>` | |
| Reddit | `reddit.com/user/<handle>` | `reddit.com/search?q=<query>` |
| YouTube | `youtube.com/@<handle>` | |
| Twitch | `twitch.tv/<handle>` | |
| GitHub | `github.com/<handle>` | Check repos, gists, commits for email addresses |
| Telegram | `t.me/<handle>` | Public channels/groups only |
| Pinterest | `pinterest.com/<handle>` | |
| Snapchat | `snapchat.com/add/<handle>` | Limited public data |
| Medium | `medium.com/@<handle>` | |
| Substack | `<handle>.substack.com` | |
| Mastodon | Federated — search `<handle>@<instance>` | |

## Username Search Aggregators
| Tool | URL |
|------|-----|
| Namechk | `https://namechk.com/<username>` |
| Knowem | `https://knowem.com/<username>` |
| Sherlock (if installed) | `sherlock <username>` |
| WhatsMyName | `https://whatsmyname.app` |

## Domain & DNS Intelligence
| Tool | Command / URL |
|------|---------------|
| WHOIS | `whois <domain>` |
| RDAP | `https://rdap.org/domain/<domain>` |
| Dig | `dig <domain> ANY/MX/TXT/NS` |
| DNSDumpster | `https://dnsdumpster.com` (web_fetch) |
| SecurityTrails | `https://securitytrails.com/domain/<domain>/dns` |
| Shodan | `https://www.shodan.io/search?query=<domain>` |
| BuiltWith | `https://builtwith.com/<domain>` — tech stack |
| Wappalyzer | Browser extension / `https://www.wappalyzer.com/lookup/<domain>` |
| crt.sh | `https://crt.sh/?q=<domain>` — SSL cert transparency |
| ViewDNS | `https://viewdns.info` — reverse IP, reverse whois |
| DomainTools | `https://whois.domaintools.com/<domain>` |

## IP Address Intelligence
| Tool | URL / Command |
|------|---------------|
| IPInfo | `https://ipinfo.io/<ip>/json` |
| IP-API | `https://ip-api.com/json/<ip>` |
| AbuseIPDB | `https://www.abuseipdb.com/check/<ip>` |
| Shodan | `https://www.shodan.io/host/<ip>` |
| GreyNoise | `https://viz.greynoise.io/ip/<ip>` |
| BGP.tools | `https://bgp.tools/prefix/<ip>` |
| IPVoid | `https://www.ipvoid.com/ip-blacklist-check/` |

## Email Intelligence
| Tool | URL / Command |
|------|---------------|
| HaveIBeenPwned | `https://haveibeenpwned.com/api/v3/breachedaccount/<email>` (needs API key) |
| Hunter.io | `https://hunter.io/email-finder` — find emails by domain |
| Gravatar | `https://www.gravatar.com/<MD5_of_email>.json` |
| EmailRep | `https://emailrep.io/<email>` |
| Holehe (if installed) | `holehe <email>` — checks account existence on 100+ sites |

## Phone Number Intelligence  
| Tool | URL |
|------|-----|
| Truecaller | `https://www.truecaller.com/search/us/<number>` |
| Sync.me | `https://sync.me/search/?number=<number>` |
| PhoneInfoga (if installed) | `phoneinfoga scan -n <number>` |
| AbstractAPI | `https://phonevalidation.abstractapi.com/v1/?phone=<number>` |
| NumVerify | `https://numverify.com/api/validate?number=<number>` |

## Image & Face Intelligence
| Tool | URL |
|------|-----|
| Google Images | `https://images.google.com` — use browser to upload |
| Yandex Images | `https://yandex.com/images/search?rpt=imageview&url=<url>` |
| TinEye | `https://tineye.com/search?url=<url>` |
| Bing Visual Search | `https://www.bing.com/visualsearch` |
| PimEyes (face) | `https://pimeyes.com` — face recognition (limited free) |
| FaceCheck.ID | `https://facecheck.id` |

## Maps & Geolocation
| Tool | URL |
|------|-----|
| Google Maps | `https://www.google.com/maps/search/<query>` |
| OpenStreetMap | `https://www.openstreetmap.org/search?query=<address>` |
| Google StreetView | `https://www.google.com/maps/@<lat>,<lng>,3a,75y,90t/data=...` |
| Wikimapia | `https://wikimapia.org/#lat=<lat>&lon=<lng>` |
| SunCalc | `https://www.suncalc.org` — verify photo time from sun angle |
| GeoHack | `https://geohack.toolforge.org/geohack.php?params=<lat>_N_<lng>_E` |

## Corporate / Company Records
| Tool | URL | Coverage |
|------|-----|----------|
| OpenCorporates | `https://opencorporates.com/companies?q=<name>` | Global |
| Companies House | `https://find-and-update.company-information.service.gov.uk/search?q=<name>` | UK |
| SEC EDGAR | `https://efts.sec.gov/LATEST/search-index?q=<name>` | US public companies |
| Crunchbase | `https://www.crunchbase.com/search/organizations/field/organizations/facet_ids/<query>` | Startups/VC |
| LinkedIn | `https://www.linkedin.com/company/<slug>` | |
| Pitchbook | Web search: `site:pitchbook.com "<company>"` | |

## Paste & Leak Sites
| Site | URL |
|------|-----|
| Pastebin | `https://pastebin.com/search?q=<query>` |
| GitHub Gists | `https://gist.github.com/search?q=<query>` |
| JustPaste.it | web_search: `site:justpaste.it "<target>"` |
| ControlC | web_search: `site:controlc.com "<target>"` |
| Rentry | web_search: `site:rentry.co "<target>"` |
| DeHashed | `https://dehashed.com/search?query=<email>` (paid, but check for public results) |

## Public Records (UK-focused)
| Source | URL |
|--------|-----|
| 192.com | `https://www.192.com/search/people/<name>` |
| BT Phone Book | `https://www.thephonebook.bt.com` |
| Electoral Roll | via 192.com or Tracesmart |
| UK Land Registry | `https://www.gov.uk/search-property-information-land-registry` |
| UK Court Records | `https://www.find-court-tribunal.service.gov.uk` |
| Companies House | `https://find-and-update.company-information.service.gov.uk` |

## Google Dorking Operators
```
site:          - restrict to domain
inurl:         - keyword in URL
intitle:       - keyword in page title  
filetype:      - specific file type (pdf, xlsx, docx, txt)
"exact phrase" - exact match
-keyword       - exclude keyword
OR             - either term
*              - wildcard
before:YYYY    - results before date
after:YYYY     - results after date
cache:         - Google's cached version
related:       - similar sites
```

### High-value dorks
```
"<target>" filetype:pdf              # documents mentioning target
"<target>" site:github.com           # code references
"<target>" site:pastebin.com         # paste leaks
"<target>" "password" OR "passwd"    # credential exposure
"<target>" "email" filetype:xlsx     # spreadsheet leaks
"<target>" inurl:admin OR inurl:login # admin panels
"<name>" "@gmail.com" OR "@yahoo.com" # email discovery
```

```

### references/social-platforms.md

```markdown
# Social Platform Extraction Guide

Tips for extracting maximum data from each platform without authentication.

## Twitter / X

**Profile URL:** `https://twitter.com/<handle>` or `https://x.com/<handle>`

**What to extract:**
- Display name, bio, location field, website link
- Join date (visible on profile)
- Tweet count, followers, following
- Pinned tweet content
- Profile and banner image URLs

**Nitter mirrors (no login required — many public instances have gone offline, so check availability first):**
- `https://nitter.net/<handle>`
- `https://nitter.cz/<handle>`
- `https://nitter.privacydev.net/<handle>`

**Search tricks:**
```
site:twitter.com "<name>" → find mentions
from:<handle> → their tweets in Google
to:<handle> → replies to them
```

**Direct tweet search:**
`https://twitter.com/search?q="<query>"&f=live`

---

## Instagram

**Profile URL:** `https://instagram.com/<handle>`

**Public data (no login):**
- Bio, website link, follower counts (partially)
- Post thumbnails visible without login

**Extract via web_fetch:**
`https://www.instagram.com/<handle>/?__a=1` (deprecated — frequently blocked or redirected to login; treat as best-effort)

**Search trick:** `site:instagram.com "<target name>"` in web_search

---

## Reddit

**Profile URL:** `https://reddit.com/user/<handle>`

**What to extract:**
- Account age (karma page shows)
- Post/comment history: `https://reddit.com/user/<handle>/comments`
- Subreddits active in (reveals interests, location clues)
- Pushshift (archived; public API access was shut down in 2023 — often unavailable): `https://api.pushshift.io/reddit/search/comment/?author=<handle>`

**Search:** `site:reddit.com/user/<handle>` or `site:reddit.com "<target>"`

---

## LinkedIn

**Profile URL:** `https://linkedin.com/in/<handle>`

**Public data:**
- Name, headline, location
- Current/past employers and roles
- Education
- Skills, endorsements
- Connection count tier (500+, etc.)

**Company search:** `https://linkedin.com/company/<slug>`

**Google dorks:**
```
site:linkedin.com/in "<full name>"
site:linkedin.com/in "<name>" "<company>"
```

---

## GitHub

**Profile URL:** `https://github.com/<handle>`

**What to extract:**
- Real name, bio, company, location, website, Twitter link
- Organisations member of
- Public repos (check README, commits for email leaks)
- Gists: `https://gist.github.com/<handle>`
- Email from commits: `https://api.github.com/users/<handle>/events/public`

**Email from commit:**
```bash
curl -s https://api.github.com/users/<handle>/events/public | python3 -c "
import sys, json
for e in json.load(sys.stdin):
    p = e.get('payload', {})
    for c in p.get('commits', []):
        a = c.get('author', {})
        if a.get('email') and 'noreply' not in a['email']:
            print(a['name'], '-', a['email'])
" 2>/dev/null | sort -u
```

---

## TikTok

**Profile URL:** `https://tiktok.com/@<handle>`

**What to extract:**
- Bio, follower/following/likes counts
- Links in bio
- Video descriptions, hashtags used → reveals interests
- Comments mentioning location

**Search:** `site:tiktok.com "@<handle>"` or `site:tiktok.com "<name>"`

---

## YouTube

**Profile URL:** `https://youtube.com/@<handle>` or `https://youtube.com/channel/<id>`

**What to extract:**
- About page: description, links, join date, view count
- Channel ID (useful for other lookups)
- Playlist names (reveals interests/content themes)

**About page direct:** `https://www.youtube.com/@<handle>/about`

---

## Facebook

**Profile URL:** `https://facebook.com/<handle>` or `https://facebook.com/<numeric_id>`

**Public data (no login, limited):**
- Name, profile photo, cover photo
- Public posts only
- Workplace, education if set to public

**Graph search (limited now):** `https://www.facebook.com/search/top?q=<query>`

**Archive check:** Wayback Machine on `facebook.com/<handle>`

---

## Telegram

**Public channels/groups only:** `https://t.me/<handle>`

**What to extract from public channels:**
- Channel description, member count, post history

**Telegram search tools:**
- `https://tgstat.com/en/search?q=<query>` — channel analytics
- `https://telemetr.io/en` — channel discovery

---

## Discord

Limited public data. Check:
- `disboard.org` for public server listings
- `discord.me` for public server directories
- web_search: `"discord.gg" "<target>"`

---

## Twitch

**Profile URL:** `https://twitch.tv/<handle>`

**API (requires both a Client-Id and an OAuth app access token):**
```bash
curl -s "https://api.twitch.tv/helix/users?login=<handle>" \
  -H "Client-Id: <client_id>" \
  -H "Authorization: Bearer <app_access_token>"
```

**What to extract:** Bio, stream category, follower count, creation date, connected socials in panels.

---

## Steam

**Profile URL:** `https://steamcommunity.com/id/<handle>` or `/profiles/<steamid64>`

**API (no key needed for public):**
`https://steamcommunity.com/id/<handle>?xml=1`

**SteamDB:** `https://steamdb.info/calculator/?player=<handle>`

---

## Image Extraction Tips

### Profile Photo Reverse Search
For any platform profile image:
1. Right-click → copy image URL
2. Feed to: `https://yandex.com/images/search?rpt=imageview&url=<url>`
3. And: `https://tineye.com/search?url=<url>`

### Photo Metadata (EXIF)
If you have the actual image file:
```bash
exiftool <image>          # full metadata
exiftool -gps:all <image> # GPS only
```

Online: `https://www.metadata2go.com` or `https://www.pic2map.com`

### Photo Geolocation Clues
If no EXIF GPS data, analyse visually:
- Street signs, license plates → `web_search` country/region of plate format
- Architecture style, vegetation
- Sun angle → SunCalc.org for time estimation
- Google Street View matching

```

### scripts/generate_pdf.py

```python
#!/usr/bin/env python3
"""
OSINT Investigator — PDF Report Generator
Converts a structured OSINT report (markdown or dict) into a clean PDF.

Usage (from agent):
    python3 generate_pdf.py --title "Jack Gooding" --input report.md --output ~/Desktop/

Usage (direct):
    python3 generate_pdf.py --title "Target Name" --markdown "..." --output /path/to/dir
"""

import argparse
import json
import os
import re
import sys
from datetime import datetime

try:
    from fpdf import FPDF
except ImportError:
    print("ERROR: fpdf2 not installed. Run: pip3 install fpdf2 --break-system-packages")
    sys.exit(1)

# Confidence level styling
CONFIDENCE = {
    "high":   {"emoji": "🟢", "label": "HIGH",   "r": 34,  "g": 139, "b": 34},
    "medium": {"emoji": "🟠", "label": "MED",    "r": 255, "g": 140, "b": 0},
    "low":    {"emoji": "🔴", "label": "LOW",    "r": 204, "g": 0,   "b": 0},
    "unverified": {"emoji": "⚪", "label": "UNVERIFIED", "r": 128, "g": 128, "b": 128},
}

CONFIG_PATH = os.path.join(os.path.dirname(__file__), '..', 'config', 'osint_config.json')


def load_config():
    path = os.path.normpath(CONFIG_PATH)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"output": {"pdf_enabled": True, "pdf_include_sources": True}}


def sanitize(text):
    """Replace unicode chars that fpdf latin-1 can't handle."""
    replacements = {
        "\u2019": "'", "\u2018": "'", "\u201c": '"', "\u201d": '"',
        "\u2013": "-", "\u2014": "--", "\u2022": "*", "\u2026": "...",
        "\u00a0": " ", "\u2713": "v", "\u2715": "x",
        # confidence emojis → text
        "🟢": "[HIGH]", "🟠": "[MED]", "🔴": "[LOW]", "⚪": "[?]",
        "✅": "[OK]", "❌": "[NO]", "⚠️": "[WARN]", "🔍": "",
        "🏏": "", "💻": "", "📱": "", "🐦": "", "🏢": "", "📄": "",
        "🔗": "", "🌐": "", "📞": "", "📧": "", "🗺️": "", "🖼️": "",
        "📋": "", "🎯": "",
    }
    for src, dst in replacements.items():
        text = text.replace(src, dst)
    # Any remaining non-latin-1 characters become '?'
    return text.encode('latin-1', errors='replace').decode('latin-1')


class OSINTReport(FPDF):
    def __init__(self, title, target, date_str):
        super().__init__()
        self.report_title = title
        self.target = target
        self.date_str = date_str
        self.set_auto_page_break(auto=True, margin=15)

    def header(self):
        self.set_fill_color(20, 20, 40)
        self.rect(0, 0, 210, 18, 'F')
        self.set_font("Helvetica", "B", 11)
        self.set_text_color(255, 255, 255)
        self.set_xy(10, 4)
        self.cell(0, 10, sanitize(f"OSINT REPORT  |  {self.target}  |  {self.date_str}"), ln=False)
        self.set_text_color(0, 0, 0)
        self.ln(18)

    def footer(self):
        self.set_y(-12)
        self.set_font("Helvetica", "I", 8)
        self.set_text_color(150, 150, 150)
        self.cell(0, 10, f"Page {self.page_no()} | Generated by OSINT Investigator | CONFIDENTIAL", align="C")
        self.set_text_color(0, 0, 0)

    def cover_page(self, query, target_type, summary=None):
        self.add_page()
        # Dark header block
        self.set_fill_color(20, 20, 40)
        self.rect(0, 0, 210, 70, 'F')

        self.set_font("Helvetica", "B", 28)
        self.set_text_color(255, 255, 255)
        self.set_xy(15, 15)
        self.cell(0, 12, "OSINT REPORT", ln=True)

        self.set_font("Helvetica", "B", 18)
        self.set_text_color(100, 200, 255)
        self.set_x(15)
        self.cell(0, 10, sanitize(self.target), ln=True)

        self.set_font("Helvetica", "", 10)
        self.set_text_color(180, 180, 220)
        self.set_x(15)
        self.cell(0, 8, f"Generated: {self.date_str}", ln=True)

        self.set_text_color(0, 0, 0)
        self.set_y(80)

        # Meta boxes
        self.set_font("Helvetica", "B", 10)
        self.set_fill_color(240, 242, 248)
        self.set_x(15)
        self.cell(55, 8, "Target:", fill=True)
        self.set_font("Helvetica", "", 10)
        self.cell(0, 8, sanitize(self.target), ln=True)

        self.set_font("Helvetica", "B", 10)
        self.set_fill_color(240, 242, 248)
        self.set_x(15)
        self.cell(55, 8, "Target Type:", fill=True)
        self.set_font("Helvetica", "", 10)
        self.cell(0, 8, sanitize(target_type), ln=True)

        self.set_font("Helvetica", "B", 10)
        self.set_fill_color(240, 242, 248)
        self.set_x(15)
        self.cell(55, 8, "Query:", fill=True)
        self.set_font("Helvetica", "", 10)
        self.cell(0, 8, sanitize(query[:80]), ln=True)

        # Confidence legend
        self.ln(6)
        self.set_font("Helvetica", "B", 10)
        self.set_x(15)
        self.cell(0, 7, "Confidence Legend:", ln=True)

        legend = [
            ("high",   "Green  [HIGH]",       "Verified from multiple reliable sources"),
            ("medium", "Orange [MED]",        "Likely correct, single source or unverified"),
            ("low",    "Red    [LOW]",        "Possible match, little corroborating evidence"),
            ("unverified", "Grey  [UNVERIFIED]", "User-provided context, not independently confirmed"),
        ]
        for conf_key, label, desc in legend:
            conf = CONFIDENCE[conf_key]
            self.set_font("Helvetica", "B", 9)
            self.set_text_color(conf['r'], conf['g'], conf['b'])
            self.set_x(20)
            self.cell(38, 6, label)
            self.set_font("Helvetica", "", 9)
            self.set_text_color(80, 80, 80)
            self.cell(0, 6, sanitize(desc), ln=True)
        self.set_text_color(0, 0, 0)

        if summary:
            self.ln(4)
            self.set_x(15)
            self.set_font("Helvetica", "B", 10)
            self.set_fill_color(230, 240, 255)
            self.cell(180, 7, "Executive Summary", fill=True, ln=True)
            self.set_font("Helvetica", "", 9)
            self.set_x(15)
            self.multi_cell(180, 5, sanitize(summary))

    def section_header(self, title):
        self.ln(4)
        self.set_fill_color(20, 20, 40)
        self.set_text_color(255, 255, 255)
        self.set_font("Helvetica", "B", 11)
        self.set_x(10)
        self.cell(190, 8, sanitize(f"  {title}"), fill=True, ln=True)
        self.set_text_color(0, 0, 0)
        self.ln(1)

    def finding_row(self, finding, source, confidence_key):
        """Render a single finding row with confidence badge."""
        conf = CONFIDENCE.get(confidence_key.lower(), CONFIDENCE["low"])
        y = self.get_y()

        # Confidence badge
        self.set_fill_color(conf['r'], conf['g'], conf['b'])
        self.set_text_color(255, 255, 255)
        self.set_font("Helvetica", "B", 7)
        self.set_x(10)
        self.cell(22, 6, conf['label'], fill=True, align="C")

        # Finding text
        self.set_text_color(0, 0, 0)
        self.set_font("Helvetica", "", 9)
        self.set_x(35)
        self.cell(105, 6, sanitize(finding[:90]))

        # Source
        self.set_font("Helvetica", "I", 7)
        self.set_text_color(100, 100, 100)
        self.cell(0, 6, sanitize(source[:45]), ln=True)
        self.set_text_color(0, 0, 0)

        # Light separator line
        self.set_draw_color(220, 220, 220)
        self.line(10, self.get_y(), 200, self.get_y())

    def add_text_block(self, heading, body, confidence_key=None):
        """Add a named text block, optionally with a confidence badge."""
        if heading:
            self.set_font("Helvetica", "B", 9)
            self.set_x(12)
            if confidence_key:
                conf = CONFIDENCE.get(confidence_key.lower(), CONFIDENCE["low"])
                self.set_text_color(conf['r'], conf['g'], conf['b'])
                self.cell(25, 5, f"[{conf['label']}]")
                self.set_text_color(30, 30, 30)
                self.cell(0, 5, sanitize(heading), ln=True)
            else:
                self.set_text_color(30, 30, 30)
                self.cell(0, 5, sanitize(heading), ln=True)

        if body:
            self.set_font("Helvetica", "", 9)
            self.set_text_color(60, 60, 60)
            self.set_x(15)
            self.multi_cell(180, 4.5, sanitize(body))
        self.set_text_color(0, 0, 0)
        self.ln(1)

    def sources_section(self, sources):
        self.section_header("Sources")
        self.set_font("Helvetica", "", 8)
        self.set_text_color(40, 40, 120)
        for i, src in enumerate(sources, 1):
            self.set_x(12)
            self.cell(8, 5, f"{i}.")
            self.multi_cell(178, 5, sanitize(src))
        self.set_text_color(0, 0, 0)


def parse_markdown_report(md_text):
    """Parse a markdown OSINT report into structured sections."""
    sections = []
    current_section = None
    current_body = []
    sources = []
    in_sources = False
    meta = {"target": "", "date": "", "query": "", "target_type": "", "summary": ""}

    lines = md_text.split('\n')
    for line in lines:
        # Extract meta from header lines
        if line.startswith("**Date:**"):
            meta["date"] = re.sub(r'\*\*Date:\*\*\s*', '', line).strip()
        elif line.startswith("**Target Type:**"):
            meta["target_type"] = re.sub(r'\*\*Target Type:\*\*\s*', '', line).strip()
        elif line.startswith("**Query:**"):
            meta["query"] = re.sub(r'\*\*Query:\*\*\s*', '', line).strip()

        # Top level H1 = report title / target
        elif line.startswith("# OSINT Report:"):
            meta["target"] = line.replace("# OSINT Report:", "").strip()

        # H2 sections
        elif line.startswith("## "):
            if current_section:
                sections.append((current_section, '\n'.join(current_body).strip()))
            current_section = line[3:].strip()
            current_body = []
            in_sources = "sources" in current_section.lower()

        # H3 subsections — treat as bold heading in body
        elif line.startswith("### "):
            current_body.append(f"\n{line[4:].strip()}:")

        # Source lines
        elif in_sources and line.strip().startswith("- http"):
            sources.append(line.strip()[2:])
        elif in_sources and re.match(r'https?://', line.strip()):
            sources.append(line.strip())

        else:
            current_body.append(line)

    if current_section:
        sections.append((current_section, '\n'.join(current_body).strip()))

    return meta, sections, sources


def confidence_from_text(text):
    """Detect confidence level from text containing HIGH/MEDIUM/LOW/UNVERIFIED."""
    text_upper = text.upper()
    if "HIGH" in text_upper:
        return "high"
    elif "MED" in text_upper:  # matches both "MED" and "MEDIUM"
        return "medium"
    elif "LOW" in text_upper:
        return "low"
    elif "UNVERIFIED" in text_upper:
        return "unverified"
    return None


def generate_pdf(markdown_text, output_dir="~/Desktop", target_override=None):
    """Main entry point: convert markdown report to PDF."""
    cfg = load_config()
    output_dir = os.path.expanduser(output_dir or cfg.get("output", {}).get("pdf_output_dir", "~/Desktop"))
    include_sources = cfg.get("output", {}).get("pdf_include_sources", True)

    meta, sections, sources = parse_markdown_report(markdown_text)
    target = target_override or meta.get("target") or "Unknown Target"
    date_str = meta.get("date") or datetime.utcnow().strftime("%Y-%m-%d %H:%M UTC")

    pdf = OSINTReport(
        title=f"OSINT Report - {target}",
        target=target,
        date_str=date_str,
    )

    # Cover page
    summary_text = ""
    for sec_name, sec_body in sections:
        if "summary" in sec_name.lower() or "identity" in sec_name.lower():
            summary_text = sec_body[:400]
            break

    pdf.cover_page(
        query=meta.get("query", f"OSINT investigation: {target}"),
        target_type=meta.get("target_type", "Person"),
        summary=summary_text if summary_text else None,
    )

    # Content pages
    pdf.add_page()

    for sec_name, sec_body in sections:
        if not sec_body.strip():
            continue
        # Sources and confidence sections are rendered separately below
        if any(x in sec_name.lower() for x in ["sources", "confidence & gaps"]):
            continue

        pdf.section_header(sec_name)

        # Check if section has a markdown table (findings table)
        if "|" in sec_body and "---" in sec_body:
            # Parse table rows as finding rows
            rows = [r for r in sec_body.split('\n') if '|' in r and '---' not in r]
            # First row is header
            for row in rows[1:]:  # skip header row
                cells = [c.strip() for c in row.split('|') if c.strip()]
                if len(cells) >= 3:
                    finding = cells[0]
                    source = cells[1] if len(cells) > 1 else ""
                    conf_text = cells[2] if len(cells) > 2 else ""
                    conf_key = confidence_from_text(conf_text) or "low"
                    pdf.finding_row(finding, source, conf_key)
        else:
            # Render as text blocks, detect inline confidence markers
            paragraphs = re.split(r'\n{2,}', sec_body)
            for para in paragraphs:
                para = para.strip()
                if not para:
                    continue
                conf_key = confidence_from_text(para)
                # Clean markdown formatting
                para_clean = re.sub(r'\*\*(.*?)\*\*', r'\1', para)
                para_clean = re.sub(r'\*(.*?)\*', r'\1', para_clean)
                para_clean = re.sub(r'`(.*?)`', r'\1', para_clean)
                para_clean = re.sub(r'\[([^\]]+)\]\([^\)]+\)', r'\1', para_clean)
                para_clean = re.sub(r'^[-*]\s+', '', para_clean, flags=re.MULTILINE)
                pdf.add_text_block(None, para_clean, confidence_key=conf_key)

    # Confidence summary section (if present)
    for sec_name, sec_body in sections:
        if "confidence" in sec_name.lower():
            pdf.section_header("Confidence Summary")
            if "|" in sec_body and "---" in sec_body:
                rows = [r for r in sec_body.split('\n') if '|' in r and '---' not in r]
                for row in rows[1:]:
                    cells = [c.strip() for c in row.split('|') if c.strip()]
                    if len(cells) >= 3:
                        finding = cells[0]
                        source = cells[1] if len(cells) > 1 else ""
                        conf_text = cells[2] if len(cells) > 2 else ""
                        conf_key = confidence_from_text(conf_text) or "low"
                        pdf.finding_row(finding, source, conf_key)
            break

    # Sources
    if include_sources and sources:
        pdf.add_page()
        pdf.sources_section(sources)

    # Save
    os.makedirs(output_dir, exist_ok=True)
    safe_target = re.sub(r'[^a-zA-Z0-9_\- ]', '', target).strip().replace(' ', '_')
    timestamp = datetime.utcnow().strftime("%Y%m%d_%H%M")
    filename = f"OSINT_{safe_target}_{timestamp}.pdf"
    filepath = os.path.join(output_dir, filename)
    pdf.output(filepath)

    return filepath


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate OSINT PDF report")
    parser.add_argument("--input", help="Path to markdown report file")
    parser.add_argument("--markdown", help="Markdown text directly")
    parser.add_argument("--target", help="Override target name")
    parser.add_argument("--output", default="~/Desktop", help="Output directory")
    args = parser.parse_args()

    md = ""
    if args.input:
        with open(args.input) as f:
            md = f.read()
    elif args.markdown:
        md = args.markdown
    else:
        print("Reading markdown from stdin...", file=sys.stderr)
        md = sys.stdin.read()

    filepath = generate_pdf(md, output_dir=args.output, target_override=args.target)
    print(f"PDF saved: {filepath}")

```
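The findings-table handling in `generate_pdf` can be checked in isolation; a minimal sketch mirroring its row-splitting logic (the `parse_findings_table` name and the sample table are illustrative):

```python
def parse_findings_table(section_body):
    # Same approach as generate_pdf.py: keep rows containing '|',
    # drop the '---' separator row, skip the header row, and discard
    # empty cells so leading/trailing pipes are harmless.
    rows = [r for r in section_body.split('\n') if '|' in r and '---' not in r]
    findings = []
    for row in rows[1:]:  # rows[0] is the header
        cells = [c.strip() for c in row.split('|') if c.strip()]
        if len(cells) >= 3:
            findings.append((cells[0], cells[1], cells[2]))
    return findings

table = """| Finding | Source | Confidence |
|---|---|---|
| Works at Acme Corp | linkedin.com | HIGH |
| Based in Berlin | twitter.com | MEDIUM |"""
print(parse_findings_table(table))
# [('Works at Acme Corp', 'linkedin.com', 'HIGH'), ('Based in Berlin', 'twitter.com', 'MEDIUM')]
```

Filtering empty cells with `if c.strip()` is what makes leading and trailing pipes optional in the source table.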

### scripts/generate_pdf.sh

```bash
#!/usr/bin/env bash
# OSINT Investigator — PDF Report Generator wrapper
# Self-installs fpdf2 if missing, then runs generate_pdf.py
# Usage: bash generate_pdf.sh --input report.md --target "Name" --output ~/Desktop

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Ensure fpdf2 is available
if ! python3 -c "import fpdf" 2>/dev/null; then
  echo "📦 Installing fpdf2..."
  pip3 install fpdf2 -q --break-system-packages 2>/dev/null \
    || pip3 install fpdf2 -q \
    || pip3 install fpdf2 -q --user
fi

# Verify install succeeded
if ! python3 -c "import fpdf" 2>/dev/null; then
  echo "❌ ERROR: Could not install fpdf2. Try manually: pip3 install fpdf2"
  exit 1
fi

python3 "$SCRIPT_DIR/generate_pdf.py" "$@"

```
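The dependency probe at the top of the wrapper generalises to any Python module; a minimal sketch (the `check_module` helper is illustrative, not part of the wrapper):

```shell
#!/usr/bin/env bash
# Try the import quietly; report status without aborting the shell.
check_module() {
  if python3 -c "import $1" 2>/dev/null; then
    echo "$1: present"
  else
    echo "$1: missing"
  fi
}
check_module json   # stdlib module, so this reports present
```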

### config/osint_config.json

```json
{
  "platforms": {},
  "output": {
    "pdf_enabled": false,
    "pdf_output_dir": "~/Desktop",
    "pdf_open_after": true,
    "pdf_include_sources": true,
    "pdf_include_raw_data": false
  }
}

```
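Note that `load_config` returns the parsed file wholesale rather than merging it with defaults: keys missing from this file fall back via `dict.get` at each call site. A minimal sketch of that lookup (the inline JSON mimics the config above):

```python
import json

cfg = json.loads('{"output": {"pdf_enabled": false, "pdf_output_dir": "~/Desktop"}}')
# pdf_include_sources is absent from the file, so the call-site default wins:
include_sources = cfg.get("output", {}).get("pdf_include_sources", True)
print(include_sources)
# True
```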



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "cineglobe",
  "slug": "osint-investigator",
  "displayName": "OSINT Investigator",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1772016218812,
    "commit": "https://github.com/openclaw/skills/commit/bdedccf32d52e9c0725b26f9b4216213ff595310"
  },
  "history": []
}

```