arxiv
使用 arXiv 进行论文检索与获取:按关键词搜索、按 arXiv ID 获取摘要、下载论文 PDF。适用于做文献调研时快速定位论文、提取摘要、批量下载。
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install lianggs8-auto-survey-agent-arxiv
Repository
Skill path: skills/arxiv
使用 arXiv 进行论文检索与获取:按关键词搜索、按 arXiv ID 获取摘要、下载论文 PDF。适用于做文献调研时快速定位论文、提取摘要、批量下载。
Open repositoryBest for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: Lianggs8.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install arxiv into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/Lianggs8/auto-survey-agent before adding arxiv to shared team environments
- Use arxiv for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: arxiv
description: 使用 arXiv 进行论文检索与获取:按关键词搜索、按 arXiv ID 获取摘要、下载论文 PDF。适用于做文献调研时快速定位论文、提取摘要、批量下载。
---
# arXiv
使用本技能完成三类任务:
- 搜索:按关键词/语法查询返回论文列表
- 摘要:按 arXiv ID 获取摘要
- 下载:按 arXiv ID 下载 PDF
使用附带脚本:`scripts/arxiv_cli.py`。
## 快速开始
先运行帮助信息;把脚本当黑盒使用。
```bash
python scripts/arxiv_cli.py --help
```
## 搜索(关键词 / 查询语法)
```bash
python scripts/arxiv_cli.py search \
--query "speech enhancement on-device" \
--max-results 10 \
--sort-by relevance \
--sort-order descending
```
查询字符串会透传给 arXiv(经 `arxiv.py` 封装)。常用示例:
- `all:whisper AND cat:cs.CL`
- `ti:"streaming ASR" AND (cat:cs.CL OR cat:eess.AS)`
需要机器可读输出时,输出 JSON:
```bash
python scripts/arxiv_cli.py search --query "all:tiny asr" --max-results 5 --json
```
## 获取摘要(按 arXiv ID)
```bash
python scripts/arxiv_cli.py abstract --id 2401.01234
```
支持版本号(例如 `2401.01234v2`)。
## 下载 PDF(按 arXiv ID)
写入目录:
```bash
python scripts/arxiv_cli.py download --id 2401.01234 --outdir ./papers
```
写入指定文件:
```bash
python scripts/arxiv_cli.py download --id 2401.01234 --outfile "./papers/2401.01234.pdf"
```
## 注意事项
- 依赖包:`pip install arxiv`
- 避免高并发;必要时用 `search` 的 `--delay-seconds` 控制请求间隔
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/arxiv_cli.py
```python
#!/usr/bin/env python3
import argparse
import json
import os
import re
import sys
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional
import arxiv
@dataclass
class PaperRow:
arxiv_id: str
title: str
published: str
updated: str
authors: List[str]
summary: str
pdf_url: Optional[str]
entry_id: str
def _iso(dt: Optional[datetime]) -> str:
if not dt:
return ""
if dt.tzinfo is None:
return dt.isoformat()
return dt.astimezone().isoformat()
def _clean_whitespace(text: str) -> str:
return re.sub(r"\s+", " ", (text or "").strip())
def _result_to_row(result: arxiv.Result) -> PaperRow:
pdf_url = None
try:
pdf_url = result.pdf_url
except Exception:
pdf_url = None
return PaperRow(
arxiv_id=result.get_short_id(),
title=_clean_whitespace(result.title),
published=_iso(result.published),
updated=_iso(result.updated),
authors=[a.name for a in (result.authors or [])],
summary=_clean_whitespace(result.summary),
pdf_url=pdf_url,
entry_id=result.entry_id,
)
def cmd_search(args: argparse.Namespace) -> int:
sort_by_map = {
"relevance": arxiv.SortCriterion.Relevance,
"last_updated": arxiv.SortCriterion.LastUpdatedDate,
"submitted": arxiv.SortCriterion.SubmittedDate,
}
sort_order_map = {
"ascending": arxiv.SortOrder.Ascending,
"descending": arxiv.SortOrder.Descending,
}
search = arxiv.Search(
query=args.query,
max_results=args.max_results,
sort_by=sort_by_map[args.sort_by],
sort_order=sort_order_map[args.sort_order],
)
client = arxiv.Client(
page_size=min(100, max(1, args.page_size)),
delay_seconds=max(0.0, args.delay_seconds),
num_retries=max(0, args.retries),
)
rows: List[PaperRow] = []
for result in client.results(search):
rows.append(_result_to_row(result))
if args.json:
payload = {
"query": args.query,
"count": len(rows),
"results": [asdict(r) for r in rows],
}
print(json.dumps(payload, ensure_ascii=False, indent=2))
return 0
if not rows:
print("No results.")
return 0
for i, r in enumerate(rows, start=1):
authors = ", ".join(r.authors[:5]) + (" et al." if len(r.authors) > 5 else "")
print(f"[{i}] {r.arxiv_id} | {r.published[:10]} | {r.title}")
if authors:
print(f" Authors: {authors}")
if r.pdf_url:
print(f" PDF: {r.pdf_url}")
print(f" Entry: {r.entry_id}")
return 0
def cmd_abstract(args: argparse.Namespace) -> int:
search = arxiv.Search(id_list=[args.id])
client = arxiv.Client(num_retries=max(0, args.retries))
results = list(client.results(search))
if not results:
print(f"Not found: {args.id}", file=sys.stderr)
return 2
row = _result_to_row(results[0])
if args.json:
print(json.dumps(asdict(row), ensure_ascii=False, indent=2))
return 0
print(f"{row.arxiv_id}")
print(f"Title: {row.title}")
if row.authors:
print(f"Authors: {', '.join(row.authors)}")
if row.published:
print(f"Published: {row.published}")
if row.pdf_url:
print(f"PDF: {row.pdf_url}")
print("\nAbstract:\n")
print(row.summary)
return 0
def _ensure_dir(path: str) -> None:
os.makedirs(path, exist_ok=True)
def cmd_download(args: argparse.Namespace) -> int:
search = arxiv.Search(id_list=[args.id])
client = arxiv.Client(num_retries=max(0, args.retries))
results = list(client.results(search))
if not results:
print(f"Not found: {args.id}", file=sys.stderr)
return 2
result = results[0]
row = _result_to_row(result)
if args.outfile and args.outdir is not None:
print("Use either --outfile or --outdir, not both.", file=sys.stderr)
return 2
if args.outfile:
out_path = args.outfile
out_dir = os.path.dirname(out_path) or "."
_ensure_dir(out_dir)
else:
out_dir = args.outdir or "."
_ensure_dir(out_dir)
out_path = os.path.join(out_dir, f"{row.arxiv_id}.pdf")
if os.path.exists(out_path) and not args.force:
print(f"File exists (use --force to overwrite): {out_path}", file=sys.stderr)
return 3
written_path = None
try:
# Newer arxiv.py versions accept a full path.
written_path = result.download_pdf(filename=out_path)
except TypeError:
try:
# Other versions accept dirpath + filename (basename).
written_path = result.download_pdf(dirpath=out_dir, filename=os.path.basename(out_path))
except TypeError:
# Last-resort: download into dirpath and rename if needed.
written_path = result.download_pdf(dirpath=out_dir)
written_path_str = str(written_path)
if os.path.abspath(written_path_str) != os.path.abspath(out_path):
_ensure_dir(os.path.dirname(out_path) or ".")
os.replace(written_path_str, out_path)
written_path_str = out_path
print(written_path_str)
return 0
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
prog="arxiv_cli.py",
description="arXiv API helper (search / abstract / download) built on arxiv.py",
)
sub = parser.add_subparsers(dest="command", required=True)
p_search = sub.add_parser("search", help="Search papers by keyword/query")
p_search.add_argument("--query", required=True, help="arXiv query string (e.g. 'all:whisper AND cat:cs.CL')")
p_search.add_argument("--max-results", type=int, default=10, help="Max number of results (default: 10)")
p_search.add_argument(
"--sort-by",
choices=["relevance", "last_updated", "submitted"],
default="relevance",
help="Sort criterion (default: relevance)",
)
p_search.add_argument(
"--sort-order",
choices=["ascending", "descending"],
default="descending",
help="Sort order (default: descending)",
)
p_search.add_argument("--page-size", type=int, default=100, help="Client page size (default: 100)")
p_search.add_argument("--delay-seconds", type=float, default=0.0, help="Delay between requests (default: 0)")
p_search.add_argument("--retries", type=int, default=3, help="Retry count (default: 3)")
p_search.add_argument("--json", action="store_true", help="Output JSON")
p_search.set_defaults(func=cmd_search)
p_abs = sub.add_parser("abstract", help="Get paper abstract by arXiv ID")
p_abs.add_argument("--id", required=True, help="arXiv ID (e.g. 2401.01234 or 2401.01234v2)")
p_abs.add_argument("--retries", type=int, default=3, help="Retry count (default: 3)")
p_abs.add_argument("--json", action="store_true", help="Output JSON")
p_abs.set_defaults(func=cmd_abstract)
p_dl = sub.add_parser("download", help="Download paper PDF by arXiv ID")
p_dl.add_argument("--id", required=True, help="arXiv ID (e.g. 2401.01234)")
p_dl.add_argument("--outdir", default=None, help="Output directory (default: current dir)")
p_dl.add_argument("--outfile", default=None, help="Output file path (overrides --outdir)")
p_dl.add_argument("--force", action="store_true", help="Overwrite existing file")
p_dl.add_argument("--retries", type=int, default=3, help="Retry count (default: 3)")
p_dl.set_defaults(func=cmd_download)
return parser
def main(argv: Optional[List[str]] = None) -> int:
parser = build_parser()
args = parser.parse_args(argv)
return int(args.func(args))
if __name__ == "__main__":
raise SystemExit(main())
```