csv-cleanroom
Profile messy CSV files, standardize columns, detect data quality issues, and produce a reproducible cleanup plan.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-csv-cleanroom
Repository
Skill path: skills/52yuanchangxing/csv-cleanroom
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install csv-cleanroom into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding csv-cleanroom to shared team environments
- Use csv-cleanroom for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: csv-cleanroom
description: Profile messy CSV files, standardize columns, detect data quality issues,
  and produce a reproducible cleanup plan.
version: 1.1.0
metadata:
  openclaw:
    requires:
      bins:
      - python3
    emoji: 🧰
---
# CSV Cleanroom
## Purpose
Profile messy CSV files, standardize columns, detect data quality issues, and produce a reproducible cleanup plan.
## Trigger phrases
- 清洗 CSV (clean a CSV)
- profile this dataset
- 数据质量检查 (data quality check)
- 列名规范化 (normalize column names)
- build a cleanup plan
## Ask for these inputs
- CSV file or schema
- target schema if available
- known bad values
- dedupe rules
- date/currency locale
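
The "known bad values" input can be applied as a null-normalization pass before profiling. The following is a minimal sketch, not part of the bundled script: the helper name `normalize_nulls` and the default sentinel set are illustrative assumptions and should be replaced by whatever values the user actually supplies.

```python
# Hypothetical helper: not part of scripts/csv_cleanroom.py.
# The sentinel set below is an illustrative assumption.
DEFAULT_BAD_VALUES = frozenset({"", "n/a", "na", "null", "none", "-"})


def normalize_nulls(rows, bad_values=DEFAULT_BAD_VALUES):
    """Return copies of the rows with known bad values replaced by None."""
    lowered = {value.lower() for value in bad_values}
    cleaned = []
    for row in rows:
        new_row = {}
        for field, value in row.items():
            # Trim whitespace so " N/A " and "N/A" are treated the same.
            text = (value or "").strip()
            new_row[field] = None if text.lower() in lowered else text
        cleaned.append(new_row)
    return cleaned


rows = [{"id": "1", "city": " N/A "}, {"id": "2", "city": "Lyon"}]
print(normalize_nulls(rows))
# [{'id': '1', 'city': None}, {'id': '2', 'city': 'Lyon'}]
```

The input rows are left untouched, which keeps the pass reversible and in line with the preview-first operating rules.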
## Workflow
1. Profile the CSV: row count, nulls, duplicates, type mismatches, and outliers.
2. Normalize headers and map to the target schema.
3. Generate a step-by-step cleanup plan and optional transformed output.
4. Document irreversible operations before applying them.
5. Return a quality score and remediation checklist.
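
Steps 1 and 2 could be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's implementation: the bundled script only counts rows and empty values, and the helper names `normalize_header` and `count_duplicate_rows` are hypothetical. Duplicates here mean rows whose fields all match exactly; real dedupe rules would come from the user.

```python
import re
from collections import Counter


def normalize_header(name):
    """Lower-case a header and collapse runs of non-word characters to "_"."""
    # \W is Unicode-aware in Python 3, so non-ASCII headers are preserved.
    slug = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    return slug or "unnamed"


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row exactly (all fields equal)."""
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(count - 1 for count in seen.values() if count > 1)


headers = ["Order ID", " Customer-Name ", "Total ($)"]
print([normalize_header(h) for h in headers])
# ['order_id', 'customer_name', 'total']

rows = [{"id": "1"}, {"id": "1"}, {"id": "2"}]
print(count_duplicate_rows(rows))  # 1
```

Mapping the normalized names to a target schema would then be a dictionary lookup, with unmapped columns listed in the cleanup plan rather than silently dropped.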
## Output contract
- profile report
- normalized schema
- cleanup plan
- quality scorecard
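
The skill does not define how the quality scorecard is calculated. One possible component, shown purely as an assumption, is a completeness ratio derived from a profile in the shape the bundled script writes (`rows`, `columns`, `null_counts`). The function name `completeness_score` is hypothetical.

```python
def completeness_score(profile):
    """Share of non-empty cells, from a profile shaped like csv_profile.json."""
    total_cells = profile["rows"] * len(profile["columns"])
    if total_cells == 0:
        # Nothing to score; return None instead of inventing a number.
        return None
    null_cells = sum(profile["null_counts"].values())
    return round(1 - null_cells / total_cells, 4)


profile = {"rows": 4, "columns": ["id", "city"], "null_counts": {"id": 0, "city": 1}}
print(completeness_score(profile))  # 0.875
```

Returning `None` for an empty file follows the operating rule against fabricating metrics.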
## Files in this skill
- Script: `{baseDir}/scripts/csv_cleanroom.py`
- Resource: `{baseDir}/resources/data_quality_checklist.md`
## Operating rules
- Be concrete and action-oriented.
- Prefer preview / draft / simulation mode before destructive changes.
- If information is missing, ask only for the minimum needed to proceed.
- Never fabricate metrics, legal certainty, receipts, credentials, or evidence.
- Keep assumptions explicit.
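
The preview-before-destructive-change rule can be illustrated with a dry-run flag. This is a sketch only: the function `preview_drop_duplicates` and its return shape are assumptions for illustration and do not exist in the bundled script.

```python
def preview_drop_duplicates(rows, dry_run=True):
    """Report which row indexes would be dropped; drop them only if dry_run is False."""
    seen = set()
    to_drop = []
    for index, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            to_drop.append(index)  # exact repeat of an earlier row
        else:
            seen.add(key)
    if dry_run:
        # Preview mode: the input rows are returned unchanged.
        return {"would_drop": to_drop, "rows": rows}
    drop = set(to_drop)
    kept = [row for i, row in enumerate(rows) if i not in drop]
    return {"dropped": to_drop, "rows": kept}


rows = [{"id": "1"}, {"id": "1"}, {"id": "2"}]
print(preview_drop_duplicates(rows)["would_drop"])  # [1]
```

Defaulting `dry_run` to `True` means a caller has to opt in explicitly before any rows are removed.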
## Suggested prompts
- 清洗 CSV (clean a CSV)
- profile this dataset
- 数据质量检查 (data quality check)
## Use of script and resources
Use the bundled script when it helps the user produce a structured file, manifest, CSV, or first-pass draft.
Use the resource file as the default schema, checklist, or preset when the user does not provide one.
## Boundaries
- This skill supports planning, structuring, and first-pass artifacts.
- It should not claim that files were modified, messages were sent, or legal/financial decisions were finalized unless the user actually performed those actions.
## Compatibility notes
- Directory-based AgentSkills/OpenClaw skill.
- Runtime dependency declared through `metadata.openclaw.requires`.
- Helper script is local and auditable: `scripts/csv_cleanroom.py`.
- Bundled resource is local and referenced by the instructions: `resources/data_quality_checklist.md`.
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/csv_cleanroom.py
```python
#!/usr/bin/env python3
"""Profile a CSV file: row count, column names, and per-column empty-value counts."""
import argparse
import csv
import json


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("csv_path")
    ap.add_argument("--out", default="csv_profile.json")
    args = ap.parse_args()

    # utf-8-sig strips a leading BOM; newline="" lets the csv module handle line endings.
    with open(args.csv_path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fields = reader.fieldnames or []

    # Count empty cells per column; DictReader yields None for cells missing from short rows.
    null_counts = {field: 0 for field in fields}
    for row in rows:
        for field in fields:
            if row.get(field, "") in ("", None):
                null_counts[field] += 1

    out = {"rows": len(rows), "columns": fields, "null_counts": null_counts}
    # Use a context manager so the output file is flushed and closed.
    with open(args.out, "w", encoding="utf-8") as out_file:
        json.dump(out, out_file, ensure_ascii=False, indent=2)
    print(f"Wrote {args.out}")


if __name__ == "__main__":
    main()
```
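
The script writes a JSON object with the keys `rows`, `columns`, and `null_counts`. A short sketch of how that output might be consumed, with `null_rates` as a hypothetical helper name and an inline sample standing in for a real `csv_profile.json`:

```python
import json


def null_rates(profile):
    """Map each column to its fraction of empty values."""
    rows = profile["rows"]
    return {
        column: (profile["null_counts"][column] / rows if rows else 0.0)
        for column in profile["columns"]
    }


# Inline sample in the same shape the script writes. After a real run,
# load the file instead, for example with json.load on csv_profile.json.
sample = '{"rows": 4, "columns": ["id", "city"], "null_counts": {"id": 0, "city": 1}}'
profile = json.loads(sample)
print(null_rates(profile))  # {'id': 0.0, 'city': 0.25}
```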
### resources/data_quality_checklist.md
```markdown
# Data Quality Checklist
- Header consistency
- Null handling rules
- Duplicate record policy
- Date and timezone normalization
- Currency/unit normalization
- Referential integrity assumptions
- Allowed value domains
- Export / archival plan
```
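
The "Date and timezone normalization" item might be approached by trying a short list of candidate formats and emitting ISO 8601 dates. This is a sketch under assumptions: `to_iso_date` is a hypothetical helper, and the format list is illustrative and should follow the date locale the user supplies. Timezones are not handled here.

```python
from datetime import datetime

# Candidate formats are an illustrative assumption. Order matters, because
# a value such as "03/04/2024" is ambiguous between day-first and
# month-first locales.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def to_iso_date(text, formats=CANDIDATE_FORMATS):
    """Return an ISO 8601 date string, or None when no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(to_iso_date("31/12/2023"))  # 2023-12-31
print(to_iso_date("not a date"))  # None
```

Values that return `None` are better listed in the cleanup plan for review than guessed at.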
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### README.md
```markdown
# CSV Cleanroom
Slug: `csv-cleanroom`
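## Purpose
Profile messy CSV files, standardize columns, detect data quality issues, and produce a reproducible cleanup plan.
## When to use
- When the user needs to clean a CSV (清洗 CSV)
- When the user already has raw material on hand and needs it quickly organized into actionable output
- When the user wants to preview the plan first, then decide whether to write to disk or run batch processing
## Installation requirements
- OpenClaw / AgentSkills compatible directory structure
- `python3` executable available on PATH
- No remote install scripts, no hidden network dependencies, no undeclared credential requirements
## Directory structure
- `SKILL.md`: trigger description, workflow, output contract
- `scripts/csv_cleanroom.py`: local helper script
- `resources/data_quality_checklist.md`: resource file referenced by SKILL/README
- `examples/example-prompt.md`: trigger and input examples
- `tests/smoke-test.md`: minimal smoke test
- `SELF_CHECK.md`: conventions and security self-check
- `CHANGELOG.md`: change log
## Trigger examples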
- `清洗 CSV`
- `profile this dataset`
- `数据质量检查`
- `列名规范化`
- `build a cleanup plan`
## Suggested inputs
- CSV file or schema
- target schema if available
- known bad values
- dedupe rules
- date/currency locale
## Expected output
- profile report
- normalized schema
- cleanup plan
- quality scorecard
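## Helper script
Script: `scripts/csv_cleanroom.py`
Run the help output first to confirm the arguments:
~~~bash
python3 scripts/csv_cleanroom.py --help
~~~
Design principles of the script:
- Runs locally, which makes it easy to audit and roll back
- Input and output paths are passed explicitly
- No `curl|bash`, no remote piping into a shell, no base64-obfuscated execution
- Processes only files or directories the user explicitly provides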
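## Input and output examples
For an input example, see `examples/example-prompt.md`.
An output example should include at least:
- A structured main result
- Open questions / risk items
- A summary or checklist that can be handed off to others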
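## FAQ
### 1. Will this skill modify my files directly?
By default it should not perform destructive batch operations directly. It should first generate a preview, checklist, or draft, and suggest further action only when the user explicitly asks for it.
### 2. Does this skill need network access?
The current directory declares no network dependency and has no built-in remote download step. Whether to go online should be decided by the specific session task, not forced by the skill package itself.
### 3. What is the resource file for?
`resources/data_quality_checklist.md` provides templates, rules, checklists, or schema references for the script or instructions, so that output formats are consistent, reusable, and auditable.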
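## Risk notes
- When organizing user-provided data, text, screenshots, or local files, confirm the scope and goals first.
- For renaming, moving, merging, overwriting, or generating formal external-facing content, provide a preview version first.
- Mark uncertain fields as "to be confirmed"; do not fabricate facts.
## Security audit conclusions
- Dependency boundary: only `python3` is declared
- Credential boundary: no environment-variable dependencies are declared
- Execution boundary: local script, local resources, explicit inputs
- High-risk pattern check: no `curl|bash`, remote piped execution, obfuscated payloads, or private API bindings introduced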
```
### _meta.json
```json
{
  "owner": "52yuanchangxing",
  "slug": "csv-cleanroom",
  "displayName": "csv-cleanroom",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1773300725224,
    "commit": "https://github.com/openclaw/skills/commit/80623976314bf072322f6e6880396e990e44f84b"
  },
  "history": []
}
```