
WeChat-article-reader

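Exports WeChat Official Account articles to Markdown. Triggered when the user provides a WeChat Official Account link (mp.weixin.qq.com) or asks to download, export, or save a WeChat article. Saves to the workspace's source directory by default.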

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars

3,126

Hot score

Updated

March 20, 2026

Overall rating

C (5.2)

Composite score

5.2

Best-practice grade

D (50.4)

Install command

npx @skill-hub/cli install openclaw-skills-wechat-article-reader

Repository

openclaw/skills

Skill path: skills/8421bit/wechat-article-reader


Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install WeChat-article-reader into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/openclaw/skills before adding WeChat-article-reader to shared team environments
Use WeChat-article-reader for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: WeChat-article-reader
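description: "Exports WeChat Official Account articles to Markdown. Triggered when the user provides a WeChat Official Account link (mp.weixin.qq.com) or asks to download, export, or save a WeChat article. Saves to the workspace's source directory by default."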
---

# WeChat Official Account Article Export Skill (WeChat-Article-Reader)

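## Trigger Conditions

Trigger this skill when:

- The user provides a WeChat Official Account article link (mp.weixin.qq.com)
- The user asks to "download" (下载), "export" (导出), or "save" (保存) a WeChat article
- The user asks to convert a WeChat article to Markdown
- The user mentions "公众号文章" (Official Account article), "微信文章" (WeChat article), "下载微信" (download WeChat), or "导出公众号" (export Official Account)

**Trigger examples:**
- "Download this article https://mp.weixin.qq.com/s/xxx"
- "Export this Official Account article as markdown"
- "Save the WeChat article locally"
- "Help me save this WeChat article"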

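## How It Works

This skill uses a Python script to:
1. Fetch the WeChat article's HTML page
2. Extract metadata (title, author, publish time) from the Open Graph meta tags
3. Extract the body from the `#js_content` div
4. Convert the HTML to Markdown with markdownify
5. Save the result as a Markdown file with YAML front matter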
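A dependency-free sketch of step 2, assuming the page carries the same `<meta property="og:..." content="...">` tags that `extract_meta()` in the script reads. It swaps BeautifulSoup for the standard library's `html.parser`; the `OpenGraphParser` class and this `extract_meta(html)` function are illustrative, not part of the skill.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first occurrence, as soup.find() does in the script
            self.og.setdefault(prop, attr["content"])


def extract_meta(html):
    """Return title/author/publish_time/description with the script's fallbacks."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    og = parser.og
    return {
        "title": og.get("og:title", "未知标题"),
        "author": og.get("og:article:author", "未知作者"),
        "publish_time": og.get("og:article:published_time", "未知时间"),
        "description": og.get("og:description", ""),
    }


sample = ('<html><head><meta property="og:title" content="Hello">'
          '<meta property="og:article:author" content="Alice"></head></html>')
print(extract_meta(sample))
```

The fallback strings (`未知标题` and so on) mirror the script so that the two produce the same metadata for pages missing a tag.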

## Script Directory

**Base directory**: `~/.npm-global/lib/node_modules/openclaw/skills/WeChat-article-reader`

**Script location**: `scripts/export.py`

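## Installation

### First-time setup

1. **Check the Python dependencies** (`lxml` is included because the script parses pages with `BeautifulSoup(..., 'lxml')`):
```bash
python3 -c "import requests, bs4, lxml, markdownify" 2>/dev/null || echo "Dependencies need to be installed"
```

2. **Install the dependencies if needed**: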
```bash
pip3 install requests beautifulsoup4 lxml markdownify
```
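The same check can be run from Python. This sketch uses `importlib.util.find_spec`, which looks a module up without importing it. Import names differ from pip package names (`bs4` vs `beautifulsoup4`), so the mapping is written out by hand; `missing_packages` is an illustrative name.

```python
import importlib.util

# Import name -> pip package name
REQUIRED = {
    "requests": "requests",
    "bs4": "beautifulsoup4",
    "lxml": "lxml",
    "markdownify": "markdownify",
}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose modules cannot be found."""
    return [pkg for mod, pkg in required.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip3 install " + " ".join(missing))
    else:
        print("All dependencies are installed")
```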

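### No Configuration Needed

The skill works out of the box, with no API key or extra configuration. It fetches WeChat articles with plain HTTP requests that send browser-like headers (`User-Agent`, `Accept`, `Accept-Language`).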
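For reference, the same browser-like request can be built with the standard library alone. This is a sketch, not how the skill works: the script uses a `requests.Session`. The header values are copied from `WechatArticleExporter.__init__`; `build_request` and `fetch_html` are hypothetical helpers.

```python
import urllib.request

# Header values copied from WechatArticleExporter.__init__ in scripts/export.py
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}


def build_request(url):
    """Return a GET request for url that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_html(url, timeout=30):
    """Fetch the page; decode with the declared charset, falling back to UTF-8."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

`urllib.request.Request` stores header names with `str.capitalize()`, so they are read back as `User-agent` and `Accept-language`.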

## Execution Steps

When this skill is triggered, follow these steps:

### Step 1: Extract the URL

Identify the WeChat article URL in the user's request. Valid URLs start with:
- `https://mp.weixin.qq.com/s/`
- `https://mp.weixin.qq.com/...`
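Step 1 can be sketched with a regular expression that pulls the first article link out of a free-form message. The pattern and the `find_wechat_url` name are assumptions, not part of the skill: the pattern requires `https://` because the export script rejects other schemes, and it stops at whitespace, quotes, angle brackets, and the full-width punctuation `，。）` that often follows a link in Chinese text.

```python
import re

# Assumed pattern: an https link on mp.weixin.qq.com, ending at whitespace,
# quotes, angle brackets, or common full-width punctuation
WECHAT_URL_RE = re.compile(r"https://mp\.weixin\.qq\.com/[^\s\"'<>，。）]+")


def find_wechat_url(message):
    """Return the first WeChat article URL in the message, or None."""
    match = WECHAT_URL_RE.search(message)
    return match.group(0) if match else None


print(find_wechat_url("下载这篇文章 https://mp.weixin.qq.com/s/xxx"))
```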

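### Step 2: Determine the output directory

Default output directory: `~/.openclaw/workspace-qiming/source`

The user may specify a custom output directory. When the script is run without one, it uses the first existing `source` directory under `~/.openclaw/workspace-qiming`, `~/.openclaw/workspace`, or `~/workspace`, and falls back to `~/.openclaw/workspace-qiming/source` if none exists.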
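The precedence can be sketched as follows: an explicit directory wins; otherwise the first existing `source` directory among the candidate workspaces is used, falling back to the first candidate. The candidate list is copied from `get_default_output_dir()` in `scripts/export.py`; `resolve_output_dir` and its `candidates` parameter are hypothetical, and expanding `~` in a user-supplied path is an addition the script leaves to the shell.

```python
import os

# Candidate workspaces, in the order get_default_output_dir() checks them
CANDIDATE_WORKSPACES = [
    "~/.openclaw/workspace-qiming",
    "~/.openclaw/workspace",
    "~/workspace",
]


def resolve_output_dir(user_dir=None, candidates=CANDIDATE_WORKSPACES):
    """Pick the output directory: explicit dir, first existing source dir, first candidate."""
    if user_dir:
        return os.path.expanduser(user_dir)
    sources = [os.path.join(os.path.expanduser(w), "source") for w in candidates]
    for source in sources:
        if os.path.isdir(source):
            return source
    # Nothing exists yet: use the first candidate (the script creates it on export)
    return sources[0]
```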

### Step 3: Run the export script

```bash
# Create the output directory if needed
mkdir -p "$OUTPUT_DIR"

# Run the export script
python3 ~/.npm-global/lib/node_modules/openclaw/skills/WeChat-article-reader/scripts/export.py "$URL" "$OUTPUT_DIR"
```

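### Step 4: Report the result

Tell the user:
- Whether the export succeeded or failed
- The output file path
- The article title and metadata
- Any errors or warnings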
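To report the result, the caller can rely on the exit status (0 on success, 1 on failure) and on the line the script prints on success, `✓ 文章已导出到: <path>` ("article exported to"). A sketch, assuming the script is invoked as in Step 3; `parse_export_output` and `run_export` are hypothetical helpers, and forcing `PYTHONIOENCODING=utf-8` is an added precaution so the Chinese output decodes regardless of locale.

```python
import os
import re
import subprocess
import sys

# Line printed by export.py on success: "✓ 文章已导出到: <path>"
OUTPUT_LINE_RE = re.compile(r"^✓ 文章已导出到: (.+)$", re.MULTILINE)


def parse_export_output(stdout):
    """Return the exported file path from the script's stdout, or None."""
    match = OUTPUT_LINE_RE.search(stdout)
    return match.group(1).strip() if match else None


def run_export(script_path, url, output_dir):
    """Run export.py; return (success, output_path, combined stdout and stderr)."""
    proc = subprocess.run(
        [sys.executable, script_path, url, output_dir],
        capture_output=True,
        encoding="utf-8",
        env={**os.environ, "PYTHONIOENCODING": "utf-8"},
    )
    return proc.returncode == 0, parse_export_output(proc.stdout), proc.stdout + proc.stderr
```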

## Command Examples

```bash
# Basic export
python3 ~/.npm-global/lib/node_modules/openclaw/skills/WeChat-article-reader/scripts/export.py "https://mp.weixin.qq.com/s/xxx" ~/.openclaw/workspace-qiming/source

# Custom output directory
python3 ~/.npm-global/lib/node_modules/openclaw/skills/WeChat-article-reader/scripts/export.py "$URL" "/path/to/output"
```

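## Output Format

Each exported Markdown file has the layout below. The labels 原文链接 (original link), 作者 (author), and 发布时间 (publish time) are written in Chinese exactly as shown, and `description` is included only when the page provides one.

```markdown
---
title: Article title
author: Author name
publish_time: Publish time
source_url: Original URL
exported_at: Export timestamp (ISO 8601)
description: Article description
---

# Article title

> 原文链接: URL

**作者**: XXX

**发布时间**: XXX

-----

Article body...
```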
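The script writes each front-matter value unquoted (for example `f.write(f"title: {meta['title']}\n")`), so a title containing `: ` or starting with a character such as `[` or `#` yields invalid YAML. A minimal sketch of a safer writer, assuming single-line values: it emits each value as a JSON string, which YAML 1.2 also accepts as a double-quoted scalar. `build_front_matter` is a hypothetical helper, not part of the script.

```python
import json


def build_front_matter(fields):
    """Render a YAML front-matter block with every value as a JSON string."""
    lines = ["---"]
    for key, value in fields.items():
        # Omit empty values (the script omits only an empty description)
        if value in ("", None):
            continue
        lines.append(f"{key}: {json.dumps(str(value), ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n"


print(build_front_matter({"title": "Python: 入门", "author": "A", "description": ""}))
```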

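## File Naming

Generated files are named `YYYYMMDD_HHMMSS_<article title>.md`, where the timestamp is the export time, not the publish time.

In the title, the characters `<>:"/\|?*` are replaced with underscores, runs of whitespace become a single underscore, and leading or trailing dots are removed, so the name is valid on Windows and Linux.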
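The naming rule can be sketched as follows. The two regular expressions are the ones in `sanitize_filename()`; the UTF-8 byte cap (`max_bytes=200`) is an added assumption, since many file systems limit a name to 255 bytes and a long Chinese title takes three bytes per character. `build_filename` is a hypothetical helper.

```python
import re
from datetime import datetime


def sanitize_filename(title):
    """Same rules as the script: illegal chars and whitespace runs -> '_', strip dots."""
    safe = re.sub(r'[<>:"/\\|?*]', "_", title)
    safe = re.sub(r"\s+", "_", safe)
    return safe.strip(".")


def build_filename(title, now=None, max_bytes=200):
    """Return 'YYYYMMDD_HHMMSS_<title>.md', trimming the title to max_bytes of UTF-8."""
    if now is None:
        now = datetime.now()
    encoded = sanitize_filename(title).encode("utf-8")[:max_bytes]
    # errors="ignore" drops a multi-byte character cut in half by the slice
    safe = encoded.decode("utf-8", errors="ignore")
    return f"{now.strftime('%Y%m%d_%H%M%S')}_{safe}.md"


print(build_filename("你好: 世界?", datetime(2026, 3, 20, 9, 5, 7)))
```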

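## Common Issues and Limitations

### Common Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| "无法找到文章正文内容" (article body not found) | The article requires login, has been deleted or made private, or was blocked by anti-scraping measures | Open it in a browser, or use a browser tool. The script still writes a file with the metadata and a placeholder instead of the body. |
| Connection timeout | Network problems or rate limiting | Wait and retry; check the network connection |
| Encoding problems | Special characters | The script writes output files as UTF-8 |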
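### Known Limitations

- **Articles that require login**: some articles can only be viewed after logging in to WeChat
- **Anti-scraping**: WeChat has anti-bot measures that may block frequent requests
- **Images**: article images are not downloaded; only the Markdown text is saved
- **Complex formatting**: some formatting may not be fully preserved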

## Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| requests | >=2.31.0 | HTTP requests |
| beautifulsoup4 | >=4.12.0 | HTML parsing |
| lxml | >=4.9.0 | XML/HTML parser backend for BeautifulSoup |
| markdownify | >=0.11.6 | HTML-to-Markdown conversion |
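To check installed versions against the minimums in this table, a sketch using `importlib.metadata.version` (Python 3.8+). The comparison only understands plain numeric versions such as `2.31.0`; pre-release tags would need a real version parser such as `packaging.version`. `as_tuple`, `at_least`, and `check_versions` are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the Dependencies table
MINIMUMS = {
    "requests": "2.31.0",
    "beautifulsoup4": "4.12.0",
    "lxml": "4.9.0",
    "markdownify": "0.11.6",
}


def as_tuple(v):
    """'2.31.0' -> (2, 31, 0); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def at_least(installed, minimum):
    """True if installed >= minimum, padding the shorter tuple with zeros."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_versions(minimums=MINIMUMS):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for pkg, minimum in minimums.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        if not at_least(installed, minimum):
            problems[pkg] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    print(check_versions() or "All minimum versions satisfied")
```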

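## Error Handling

The script:
- Prints error messages in Chinese
- Exits with status 0 on success and 1 on failure (download error, invalid URL, or missing dependency)
- Prints an install command and exits if a required library is missing
- Checks that the URL starts with `https://mp.weixin.qq.com/` before fetching it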

## Source

Based on the wechat-article-exporter project:
- GitHub: https://github.com/wechat-article/wechat-article-exporter
- This skill was created by 启明

## License

MIT License


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/export.py

```python
#!/usr/bin/env python3
"""
WeChat Official Account article exporter (Python version)

Install dependencies:
  pip install requests beautifulsoup4 lxml markdownify

Usage:
  python export.py <article URL> [output directory]

Example:
  python export.py https://mp.weixin.qq.com/s/J05F7C_DGmsOoBIEZd-Fuw ./output
"""

import sys
import os
import re
from datetime import datetime
from urllib.parse import urlparse, parse_qs
import argparse
import json

try:
    import requests
    from bs4 import BeautifulSoup
    from markdownify import markdownify as md
except ImportError as e:
    print(f"错误: 缺少必要的库: {e}")
    print("请运行: pip install requests beautifulsoup4 lxml markdownify")
    sys.exit(1)


def get_default_output_dir():
    """Locate the workspace's source directory automatically"""
    # Common workspace locations
    workspace_candidates = [
        os.path.expanduser("~/.openclaw/workspace-qiming"),
        os.path.expanduser("~/.openclaw/workspace"),
        os.path.expanduser("~/workspace"),
    ]
    
    for workspace in workspace_candidates:
        source_dir = os.path.join(workspace, "source")
        if os.path.isdir(source_dir):
            return source_dir
    
    # If none exists, fall back to the first candidate's source directory
    return os.path.join(workspace_candidates[0], "source")


class WechatArticleExporter:
    """Exporter for WeChat Official Account articles"""

    def __init__(self, url, output_dir=None):
        self.url = url
        self.output_dir = output_dir if output_dir else get_default_output_dir()
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
        })

    def extract_meta(self, soup):
        """Extract article metadata from the Open Graph meta tags"""
        meta = {}

        # Title
        title_tag = soup.find('meta', property='og:title')
        meta['title'] = title_tag.get('content', '未知标题') if title_tag else '未知标题'

        # Author
        author_tag = soup.find('meta', property='og:article:author')
        meta['author'] = author_tag.get('content', '未知作者') if author_tag else '未知作者'

        # Publish time
        time_tag = soup.find('meta', property='og:article:published_time')
        meta['publish_time'] = time_tag.get('content', '未知时间') if time_tag else '未知时间'

        # Description
        desc_tag = soup.find('meta', property='og:description')
        meta['description'] = desc_tag.get('content', '') if desc_tag else ''

        # Official Account name (reads the same og:article:author tag as the author field)
        account_tag = soup.find('meta', property='og:article:author')
        meta['account'] = account_tag.get('content', '') if account_tag else ''

        return meta

    def extract_content(self, soup):
        """Return the article body element, or None if it is missing"""
        # WeChat article bodies are normally in <div id="js_content">
        content_div = soup.find('div', id='js_content')

        if not content_div:
            return None

        return content_div

    def convert_to_markdown(self, html_content):
        """Convert the body HTML to Markdown"""
        if not html_content:
            return ""

        # Convert with markdownify
        markdown_text = md(str(html_content))

        return markdown_text

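    def sanitize_filename(self, filename):
        """Make a title safe to use as a file name"""
        # Replace characters that are illegal in Windows/Linux file names
        illegal_chars = r'[<>:"/\\|?*]'
        safe_filename = re.sub(illegal_chars, '_', filename)
        # Turn whitespace runs into underscores and strip leading/trailing dots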
        safe_filename = re.sub(r'\s+', '_', safe_filename)
        safe_filename = safe_filename.strip('.')
        return safe_filename

    def export(self):
        """Download the article and write it as Markdown; return False only if the download fails"""
        print(f"正在下载文章: {self.url}")

        try:
            response = self.session.get(self.url, timeout=30)
            response.raise_for_status()
        except requests.RequestException as e:
            print(f"错误: 无法下载文章 - {e}")
            return False

        # Parse the HTML
        soup = BeautifulSoup(response.text, 'lxml')

        # Extract metadata
        meta = self.extract_meta(soup)
        print(f"标题: {meta['title']}")
        print(f"作者: {meta['author']}")
        print(f"发布时间: {meta['publish_time']}")

        # Extract the body
        content_div = self.extract_content(soup)

        if not content_div:
            print("警告: 无法找到文章正文内容")
            print("可能的原因:")
            print("  1. 文章需要登录才能查看")
            print("  2. 文章已被删除或设为私密")
            print("  3. 微信反爬虫机制")
            markdown_content = ""
        else:
            # Convert to Markdown
            markdown_content = self.convert_to_markdown(content_div)
            print(f"正文长度: {len(markdown_content)} 字符")

        # Build the output file name
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        safe_title = self.sanitize_filename(meta['title'])
        filename = f"{timestamp}_{safe_title}.md"

        # Make sure the output directory exists
        os.makedirs(self.output_dir, exist_ok=True)
        output_path = os.path.join(self.output_dir, filename)

        # Write the Markdown file
        with open(output_path, 'w', encoding='utf-8') as f:
            # YAML front matter
            f.write("---\n")
            f.write(f"title: {meta['title']}\n")
            f.write(f"author: {meta['author']}\n")
            f.write(f"publish_time: {meta['publish_time']}\n")
            f.write(f"source_url: {self.url}\n")
            f.write(f"exported_at: {datetime.now().isoformat()}\n")
            if meta.get('description'):
                f.write(f"description: {meta['description']}\n")
            f.write("---\n\n")

            # Title block: heading, source link, author, publish time
            f.write(f"# {meta['title']}\n\n")
            f.write(f"> 原文链接: {self.url}\n\n")
            f.write("**作者**: " + meta['author'] + "\n\n")
            f.write("**发布时间**: " + meta['publish_time'] + "\n\n")
            f.write("-----\n\n")

            # Body, or a placeholder if it could not be extracted
            if markdown_content:
                f.write(markdown_content)
            else:
                f.write("**无法提取正文内容，请手动复制或查看原文**\n\n")

        print(f"\n✓ 文章已导出到: {output_path}")
        return True


def main():
    parser = argparse.ArgumentParser(
        description='微信公众号文章导出工具',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
示例:
  %(prog)s https://mp.weixin.qq.com/s/J05F7C_DGmsOoBIEZd-Fuw
  %(prog)s https://mp.weixin.qq.com/s/J05F7C_DGmsOoBIEZd-Fuw ./output
  %(prog)s https://mp.weixin.qq.com/s/xxx -o ./articles

注意:
  - 微信有反爬虫机制，部分文章可能无法完整提取
  - 建议配合浏览器扩展使用（如 MarkDownload）
        """
    )
    parser.add_argument('url', help='微信公众号文章URL')
    parser.add_argument('output_dir', nargs='?', default=None,
                       help='输出目录（默认: 自动识别工作空间 source 目录）')
    parser.add_argument('-o', '--output', dest='output_dir_alt',
                       help='输出目录（等同于位置参数）')

    args = parser.parse_args()

    # Prefer -o, then the positional argument, then the default workspace source directory
    output_dir = args.output_dir_alt or args.output_dir or get_default_output_dir()

    # Validate the URL
    if not args.url.startswith('https://mp.weixin.qq.com/'):
        print("错误: 不是有效的微信公众号文章URL")
        print("URL应该以 https://mp.weixin.qq.com/ 开头")
        sys.exit(1)

    # Create the exporter and run the export
    exporter = WechatArticleExporter(args.url, output_dir)
    success = exporter.export()

    sys.exit(0 if success else 1)


if __name__ == '__main__':
    main()

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### README.md

```markdown
# WeChat Official Account Article Export Skill

> A skill that exports WeChat Official Account articles to Markdown, for Claude Code / OpenClaw

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

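## Features

- One-step export of WeChat Official Account articles to Markdown
- Automatic metadata extraction (title, author, publish time)
- Standard output format with YAML front matter
- No API key or configuration needed; works out of the box
- Chinese and English support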

## Installation

### As a Claude Code / OpenClaw skill

1. Clone this repository into your skills directory:

```bash
# Claude Code
git clone https://github.com/启明/WeChat-article-reader.git ~/.claude/skills/WeChat-article-reader

# OpenClaw
git clone https://github.com/启明/WeChat-article-reader.git ~/.openclaw/workspace/skills/WeChat-article-reader
```

2. Install the Python dependencies:

```bash
pip3 install -r requirements.txt
```

### Standalone command-line use

```bash
# Install dependencies
pip3 install -r requirements.txt

# Export an article
python3 scripts/export.py "https://mp.weixin.qq.com/s/xxx" ./output
```

## Usage

### In Claude Code

Provide a WeChat Official Account article link directly:

```
Download this article: https://mp.weixin.qq.com/s/xxx
```

The skill will automatically:
1. Fetch the article
2. Extract the metadata and body
3. Save them as a Markdown file
4. Report the output location

### Command line

```bash
python3 scripts/export.py <article URL> [output directory]
```

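## Output Format

Exported Markdown files contain full YAML front matter. The labels 原文链接 (original link), 作者 (author), and 发布时间 (publish time) are written in Chinese exactly as shown.

```markdown
---
title: Article title
author: Author name
publish_time: Publish time
source_url: Original URL
exported_at: Export timestamp (ISO 8601)
description: Article description
---

# Article title

> 原文链接: URL

**作者**: XXX

**发布时间**: XXX

-----

Article body...
```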

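## File Naming

Generated files are named `YYYYMMDD_HHMMSS_<article title>.md`, using the export time.

Special characters are replaced automatically so the name is valid on common file systems.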

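## Limitations

- Some articles can only be viewed after logging in to WeChat
- WeChat has anti-scraping measures; frequent requests may be blocked
- Only text is exported; images are not downloaded
- Complex layouts may not be fully reproduced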

## Implementation

- **HTTP requests**: `requests` fetches the article page
- **HTML parsing**: `BeautifulSoup` + `lxml` extract the content
- **Format conversion**: `markdownify` converts HTML to Markdown

## Project Structure

```
WeChat-article-reader/
├── SKILL.md          # Skill definition (read by Claude Code)
├── README.md         # Project overview
├── LICENSE           # MIT license
├── requirements.txt  # Python dependencies
├── .gitignore        # Git ignore rules
└── scripts/
    └── export.py     # Export script
```

## Contributing

Issues and pull requests are welcome!

## Acknowledgments

- [wechat-article-exporter](https://github.com/wechat-article/wechat-article-exporter) - inspiration for this project
- [markdownify](https://github.com/matthewwithanm/python-markdownify) - HTML-to-Markdown converter

## License

[MIT License](LICENSE)

## Author

Created by [Leefee](https://github.com/启明)

---

If this project helps you, please give it a ⭐ Star!

```

### _meta.json

```json
{
  "owner": "8421bit",
  "slug": "wechat-article-reader",
  "displayName": "微信公众号文章导出",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1772769723356,
    "commit": "https://github.com/openclaw/skills/commit/89de7dc19ab7f5f14e72b2f541509fe4ed51fcbe"
  },
  "history": []
}

```