
csv-cleanroom

Profile messy CSV files, standardize columns, detect data quality issues, and produce a reproducible cleanup plan.

Packaged view

This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source appears below.

Stars: 3,095
Hot score: 99
Updated: March 20, 2026
Overall rating: C (0.0)
Composite score: 0.0
Best-practice grade: B (77.6)

Install command

npx @skill-hub/cli install openclaw-skills-csv-cleanroom

Repository

openclaw/skills

Skill path: skills/52yuanchangxing/csv-cleanroom


Open repository

Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install csv-cleanroom into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding csv-cleanroom to shared team environments
  • Use csv-cleanroom for data-profiling and cleanup workflows during development

Works across

Claude Code · Codex CLI · Gemini CLI · OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: csv-cleanroom
description: Profile messy CSV files, standardize columns, detect data quality issues,
  and produce a reproducible cleanup plan.
version: 1.1.0
metadata:
  openclaw:
    requires:
      bins:
      - python3
    emoji: 🧰
---

# CSV Cleanroom

## Purpose

Profile messy CSV files, standardize columns, detect data quality issues, and produce a reproducible cleanup plan.

## Trigger phrases

- 清洗 CSV
- profile this dataset
- 数据质量检查
- 列名规范化
- build a cleanup plan

## Ask for these inputs

- CSV file or schema
- target schema if available
- known bad values
- dedupe rules
- date/currency locale

## Workflow

1. Profile the CSV: row count, nulls, duplicates, type mismatches, and outliers.
2. Normalize headers and map to the target schema.
3. Generate a step-by-step cleanup plan and optional transformed output.
4. Document irreversible operations before applying them.
5. Return a quality score and remediation checklist.
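
Step 2 (normalize headers) can be sketched minimally; `normalize_header` is an illustrative helper under assumed conventions (snake_case, ASCII-ish names), not part of the bundled script:

```python
import re

def normalize_header(name: str) -> str:
    """Lowercase, trim, and snake_case a raw CSV header."""
    name = name.strip().lower()
    name = re.sub(r"[^\w]+", "_", name)  # collapse runs of non-word chars to "_"
    return name.strip("_")

print(normalize_header("  Order Date (UTC) "))  # order_date_utc
```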

## Output contract

- profile report
- normalized schema
- cleanup plan
- quality scorecard
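
As one sketch of the quality scorecard, assuming a naive scoring formula (the skill itself does not define one; the weights here are arbitrary):

```python
def quality_score(rows: int, null_cells: int, total_cells: int, duplicate_rows: int) -> float:
    """Naive 0-100 score: average the null-cell ratio and duplicate-row ratio as penalties."""
    if total_cells == 0:
        return 0.0
    null_ratio = null_cells / total_cells
    dupe_ratio = duplicate_rows / rows if rows else 0.0
    return round(100 * (1 - (null_ratio + dupe_ratio) / 2), 1)

print(quality_score(rows=100, null_cells=50, total_cells=500, duplicate_rows=10))  # 90.0
```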

## Files in this skill

- Script: `{baseDir}/scripts/csv_cleanroom.py`
- Resource: `{baseDir}/resources/data_quality_checklist.md`

## Operating rules

- Be concrete and action-oriented.
- Prefer preview / draft / simulation mode before destructive changes.
- If information is missing, ask only for the minimum needed to proceed.
- Never fabricate metrics, legal certainty, receipts, credentials, or evidence.
- Keep assumptions explicit.

## Suggested prompts

- 清洗 CSV
- profile this dataset
- 数据质量检查

## Use of script and resources

Use the bundled script when it helps the user produce a structured file, manifest, CSV, or first-pass draft.
Use the resource file as the default schema, checklist, or preset when the user does not provide one.

## Boundaries

- This skill supports planning, structuring, and first-pass artifacts.
- It should not claim that files were modified, messages were sent, or legal/financial decisions were finalized unless the user actually performed those actions.


## Compatibility notes

- Directory-based AgentSkills/OpenClaw skill.
- Runtime dependency declared through `metadata.openclaw.requires`.
- Helper script is local and auditable: `scripts/csv_cleanroom.py`.
- Bundled resource is local and referenced by the instructions: `resources/data_quality_checklist.md`.


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/csv_cleanroom.py

```python
#!/usr/bin/env python3
"""Profile a CSV file: row count, column names, and per-column null counts."""
import argparse
import csv
import json

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("csv_path")
    ap.add_argument("--out", default="csv_profile.json")
    args = ap.parse_args()
    with open(args.csv_path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fields = reader.fieldnames or []
    null_counts = {field: 0 for field in fields}
    for row in rows:
        for field in fields:
            if row.get(field, "") in ("", None):
                null_counts[field] += 1
    out = {"rows": len(rows), "columns": fields, "null_counts": null_counts}
    with open(args.out, "w", encoding="utf-8") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)
    print(f"Wrote {args.out}")

if __name__ == "__main__":
    main()

```
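
The bundled script above profiles nulls only; the duplicate check named in workflow step 1 could be added along these lines (`count_duplicate_rows` is a hypothetical extension, not part of the script):

```python
import csv
import io

def count_duplicate_rows(csv_text: str) -> int:
    """Count data rows that repeat an earlier row verbatim (header excluded)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip header
    seen, dupes = set(), 0
    for row in reader:
        key = tuple(row)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes

print(count_duplicate_rows("id,name\n1,a\n2,b\n1,a\n"))  # 1
```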

### resources/data_quality_checklist.md

```markdown
# Data Quality Checklist

- Header consistency
- Null handling rules
- Duplicate record policy
- Date and timezone normalization
- Currency/unit normalization
- Referential integrity assumptions
- Allowed value domains
- Export / archival plan

```
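
The "Date and timezone normalization" item can be illustrated with a minimal sketch; the `%d/%m/%Y %H:%M` format string is an assumed locale here, and in practice should come from the skill's date/currency locale input:

```python
from datetime import datetime, timezone

def to_utc_iso(raw: str, fmt: str = "%d/%m/%Y %H:%M") -> str:
    """Parse a locale-formatted timestamp and emit UTC ISO 8601 (naive input assumed UTC)."""
    dt = datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
    return dt.isoformat()

print(to_utc_iso("20/03/2026 09:30"))  # 2026-03-20T09:30:00+00:00
```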



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### README.md

```markdown
# CSV Cleanroom

Slug: `csv-cleanroom`

## Purpose

Profile messy CSV files, standardize columns, detect data quality issues, and produce a reproducible cleanup plan.

## When to use

- When the user asks to clean a CSV (e.g. `清洗 CSV`)
- When the user already has raw material and needs it quickly organized into actionable output
- When the user wants to preview a plan first, then decide whether to write files or batch-process

## Installation requirements

- OpenClaw / AgentSkills-compatible directory layout
- A `python3` executable available on PATH
- No remote install scripts, hidden network dependencies, or undeclared credential requirements

## Directory layout

- `SKILL.md`: trigger descriptions, workflow, output contract
- `scripts/csv_cleanroom.py`: local helper script
- `resources/data_quality_checklist.md`: resource file referenced by SKILL/README
- `examples/example-prompt.md`: trigger and input examples
- `tests/smoke-test.md`: minimal smoke test
- `SELF_CHECK.md`: spec and safety self-check
- `CHANGELOG.md`: change log

## Trigger examples

- `清洗 CSV`
- `profile this dataset`
- `数据质量检查`
- `列名规范化`
- `build a cleanup plan`

## Suggested inputs

- CSV file or schema
- target schema if available
- known bad values
- dedupe rules
- date/currency locale

## Expected outputs

- profile report
- normalized schema
- cleanup plan
- quality scorecard

## Helper script

Script: `scripts/csv_cleanroom.py`

Run the help text first to confirm the parameters:

```bash
python3 scripts/csv_cleanroom.py --help
```

Design principles for the script:

- Runs locally, so it is easy to audit and roll back
- Input and output paths are passed explicitly
- No `curl|bash`, remote pipe-to-shell execution, or base64-obfuscated payloads
- Only touches files or directories the user explicitly provides

## Input/output examples

For input examples, see: `examples/example-prompt.md`

Example outputs should include at least:

- A structured primary result
- Open questions / risk items
- A summary or checklist that can be handed off to others

## FAQ

### 1. Will this skill modify my files directly?

By default it should not perform destructive batch operations; it should first produce previews, checklists, or drafts, and only suggest further action when the user explicitly asks for it.

### 2. Does this skill need network access?

No network dependencies are declared in this directory, and there are no built-in remote download steps. Whether a session goes online should be decided by the task at hand, not forced by the skill package itself.

### 3. What is the resource file for?

`resources/data_quality_checklist.md` supplies the script and instructions with a default template, rules, checklist, or schema reference, keeping outputs consistent, reusable, and auditable.

## Risk notes

- When organizing user-provided data, text, screenshots, or local files, confirm the scope and goal first.
- Before renaming, moving, merging, overwriting, or generating formal external-facing content, provide a preview version.
- Mark uncertain fields as "to be confirmed"; never fabricate facts.

## Security audit summary

- Dependency boundary: only `python3` is declared
- Credential boundary: no environment-variable dependencies declared
- Execution boundary: local script, local resources, explicit inputs
- High-risk pattern check: no `curl|bash`, remote pipe execution, obfuscated payloads, or private API bindings

```

### _meta.json

```json
{
  "owner": "52yuanchangxing",
  "slug": "csv-cleanroom",
  "displayName": "csv-cleanroom",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1773300725224,
    "commit": "https://github.com/openclaw/skills/commit/80623976314bf072322f6e6880396e990e44f84b"
  },
  "history": []
}

```