Back to skills
SkillHub ClubAnalyze Data & AIFull StackData / AI
ragie-rag
Imported from https://github.com/openclaw/skills.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Stars
3,071
Hot score
99
Updated
March 20, 2026
Overall rating
C4.0
Composite score
4.0
Best-practice grade
F22.7
Install command
npx @skill-hub/cli install openclaw-skills-ragie-rag
Repository
openclaw/skills
Skill path: skills/hatim-be/ragie-rag
Imported from https://github.com/openclaw/skills.
Open repositoryBest for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install ragie-rag into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding ragie-rag to shared team environments
- Use ragie-rag for development workflows
Works across
Claude CodeCodex CLIGemini CLIOpenCode
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: ragie-rag
description: |
Execute Retrieval-Augmented Generation (RAG) using Ragie.ai.
Use this skill whenever the user wants to:
- Search their knowledge base
- Ask questions about uploaded documents
- Upload documents to Ragie
- Retrieve context from Ragie
- Perform grounded answering using stored documents
- List, check status, or delete Ragie documents
This skill manages the full Ragie.ai API lifecycle including ingestion,
retrieval, and grounded answer construction.
metadata:
{
"openclaw":
{
"requires":
{
"bins": ["python3"],
"env": ["RAGIE_API_KEY"],
"python": ["requests", "python-dotenv"]
},
"credentials":
{
"primary": "RAGIE_API_KEY",
"description": "API key from https://app.ragie.ai"
}
}
}
---
# Ragie.ai RAG Skill (OpenClaw Optimized)
This skill enables grounded question answering using Ragie.ai as a RAG backend.
Ragie handles:
- Document chunking
- Embedding
- Vector indexing
- Retrieval
- Optional reranking
The agent handles:
- Deciding when to ingest
- Triggering retrieval
- Constructing grounded prompts
- Producing final answers
---
# Core Principles
1. Never answer without retrieval.
2. Never hallucinate information not present in retrieved chunks.
3. Always cite the `document_name` when referencing specific facts.
4. If retrieval returns zero relevant chunks, explicitly say:
> "I don't have that information in the current knowledge base."
5. Do not expose API keys or raw API payloads in final answers.
---
# Deterministic Workflow
## Case A — User Provides a File or URL
IF the user provides:
- A file
- A document path
- A PDF/URL to ingest
THEN:
1. Execute ingestion:
```bash
python `skills/scripts/ingest.py` --file <path> --name "<document_name>"
```
OR
```bash
python `skills/scripts/ingest.py` --url "<url>" --name "<document_name>"
```
2. Capture returned `document_id`.
3. Poll document status:
```bash
python `skills/scripts/manage.py` status --id <document_id>
```
Repeat until status == `ready`.
4. Proceed to Retrieval (Case C).
---
## Case B — User Requests Document Management
### List documents
```bash
python `skills/scripts/manage.py` list
```
### Check document status
```bash
python `skills/scripts/manage.py` status --id <document_id>
```
### Delete a document
```bash
python `skills/scripts/manage.py` delete --id <document_id>
```
Return structured results to the user.
---
## Case C — Retrieval (Grounded Question Answering)
Execute:
```bash
python `skills/scripts/retrieve.py` \
--query "<user_question>" \
--top-k 6 \
--rerank
```
Optional flags:
- `--partition <name>`
- `--filter '{"key":"value"}'`
---
# Retrieval Output Format
Expected output:
```json
[
{
"text": "...",
"score": 0.87,
"document_name": "Policy Handbook",
"document_id": "doc_abc123"
}
]
```
---
# Grounded Prompt Construction
After retrieval:
1. Extract all chunk `text`.
2. Concatenate with separators.
3. Construct this prompt:
```
SYSTEM:
You are a helpful assistant.
Answer using ONLY the context provided below.
If the context does not contain the answer, say:
"I don't have that information in the current knowledge base."
CONTEXT:
[chunk 1 text]
---
[chunk 2 text]
---
...
USER QUESTION:
{original user question}
```
4. Generate final answer.
5. Cite `document_name` when referencing information.
---
# Output Contract
The final response MUST:
- Be grounded only in retrieved chunks
- Cite `document_name` for factual claims
- Avoid hallucinations
- Avoid mentioning internal execution steps
- Avoid exposing API keys or raw responses
- Clearly state when information is missing
If no chunks are returned:
```
I don't have that information in the current knowledge base.
```
---
# API Reference
Base URL:
```
https://api.ragie.ai
```
| Operation | Method | Endpoint |
|--------------------|--------|--------------------------|
| Ingest file | POST | /documents |
| Ingest URL | POST | /documents/url |
| Retrieve chunks | POST | /retrievals |
| List documents | GET | /documents |
| Get document | GET | /documents/{id} |
| Delete document | DELETE | /documents/{id} |
---
# Error Handling
| HTTP Code | Meaning | Action |
|-----------|------------------------|----------------------------------|
| 404 | Document not found | Verify document_id |
| 422 | Invalid payload | Validate request schema |
| 429 | Rate limited | Retry with backoff |
| 5xx | Server error | Retry or check Ragie status |
If ingestion fails:
- Report failure clearly.
- Do not proceed to retrieval.
If retrieval fails:
- Retry once.
- If still failing, inform user.
---
# Decision Rules Summary
1. If user uploads content → ingest → wait until ready → retrieve.
2. If user asks question → retrieve immediately.
3. If zero chunks → state knowledge gap.
4. Always use reranking unless explicitly disabled.
5. Never answer without retrieval.
---
# Advanced Usage
- Use metadata `filter` to narrow retrieval scope.
- Use partitions to separate tenant data.
- Use `recency_bias` only when time relevance matters.
- Adjust `top_k` depending on query complexity.
---
# Security
- API keys must be loaded from environment variables.
- `.env` must not be committed.
- Do not log sensitive headers.
---
# Summary
This skill provides:
- Deterministic ingestion
- Deterministic retrieval
- Strict grounded answering
- Complete Ragie lifecycle management
- Safe and hallucination-resistant RAG execution
End of Skill.
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
"owner": "hatim-be",
"slug": "ragie-rag",
"displayName": "Ragie.ai-RAG",
"latest": {
"version": "1.0.2",
"publishedAt": 1771929006024,
"commit": "https://github.com/openclaw/skills/commit/81f894984966ad4de7b18c237ae49066332d3431"
},
"history": []
}
```
### scripts/ingest.py
```python
#!/usr/bin/env python3
"""
ingest.py — Ingest a document into Ragie.ai
Usage:
python3 ingest.py --file /path/to/doc.pdf --name "My Doc"
python3 ingest.py --url https://example.com/doc.pdf --name "Remote Doc"
Env:
RAGIE_API_KEY (required)
Optional flags:
--partition Partition name to scope the document to (default: none)
--metadata JSON string of arbitrary metadata e.g. '{"source": "HR"}'
"""
import argparse
import json
import os
import sys
import time
from dotenv import load_dotenv
import requests
load_dotenv()
API_BASE = "https://api.ragie.ai"
def get_headers():
key = os.getenv("RAGIE_API_KEY")
if not key:
print("ERROR: RAGIE_API_KEY environment variable is not set.", file=sys.stderr)
sys.exit(1)
return {"Authorization": f"Bearer {key}"}
def ingest_file(path, name, partition, metadata):
headers = get_headers()
with open(path, "rb") as f:
files = {"file": (os.path.basename(path), f)}
data = {"name": name}
if partition:
data["partition"] = partition
if metadata:
data["metadata"] = json.dumps(metadata)
resp = requests.post(f"{API_BASE}/documents", headers=headers, files=files, data=data)
resp.raise_for_status()
return resp.json()
def ingest_url(url: str, name: str, partition: str | None, metadata: dict) -> dict:
headers = get_headers()
headers["Content-Type"] = "application/json"
payload = {"url": url, "name": name}
if partition:
payload["partition"] = partition
if metadata:
payload["metadata"] = metadata
resp = requests.post(f"{API_BASE}/documents/url", headers=headers, json=payload)
resp.raise_for_status()
return resp.json()
def wait_for_ready(doc_id: str, timeout: int = 120):
"""Poll document status until ready or failed."""
headers = get_headers()
start = time.time()
while time.time() - start < timeout:
resp = requests.get(f"{API_BASE}/documents/{doc_id}", headers=headers)
resp.raise_for_status()
doc = resp.json()
status = doc.get("status", "unknown")
print(f" Status: {status}")
if status == "ready":
return doc
if status in ("failed", "error"):
print(f"ERROR: Document processing failed: {doc}", file=sys.stderr)
sys.exit(1)
time.sleep(3)
print(f"WARNING: Timed out waiting for document {doc_id} to become ready.")
return None
def main():
parser = argparse.ArgumentParser(description="Ingest a document into Ragie.ai")
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--file", help="Local file path to ingest")
group.add_argument("--url", help="Remote URL of document to ingest")
parser.add_argument("--name", required=True, help="Human-readable name for the document")
parser.add_argument("--partition", default=None, help="Ragie partition name")
parser.add_argument("--metadata", default="{}", help="JSON metadata string")
parser.add_argument("--wait", action="store_true", help="Wait until document is ready")
args = parser.parse_args()
metadata = json.loads(args.metadata)
print(f"Ingesting '{args.name}'...")
if args.file:
result = ingest_file(args.file, args.name, args.partition, metadata)
else:
result = ingest_url(args.url, args.name, args.partition, metadata)
doc_id = result.get("id")
print(f"✅ Document created: {doc_id}")
print(json.dumps(result, indent=2))
if args.wait and doc_id:
print("Waiting for document to become ready...")
wait_for_ready(doc_id)
print("✅ Document is ready for retrieval.")
if __name__ == "__main__":
main()
```
### scripts/manage.py
```python
#!/usr/bin/env python3
"""
manage.py — Manage documents in Ragie.ai (list, status, delete)
Usage:
python3 manage.py list
python3 manage.py list --partition my-partition
python3 manage.py status --id doc_abc123
python3 manage.py delete --id doc_abc123
Env:
RAGIE_API_KEY (required)
"""
import argparse
import json
import os
import sys
from dotenv import load_dotenv
import requests
load_dotenv()
API_BASE = "https://api.ragie.ai"
def get_headers(content_type=False):
key = os.getenv("RAGIE_API_KEY")
if not key:
print("ERROR: RAGIE_API_KEY environment variable is not set.", file=sys.stderr)
sys.exit(1)
h = {"Authorization": f"Bearer {key}"}
if content_type:
h["Content-Type"] = "application/json"
return h
def list_documents(partition: str | None):
params = {}
if partition:
params["partition"] = partition
resp = requests.get(f"{API_BASE}/documents", headers=get_headers(), params=params)
resp.raise_for_status()
data = resp.json()
res = data.get("results", data) if isinstance(data, dict) else data
if not res:
print("No documents found.")
return
# print(docs)
docs = res.get('documents')
print(f"{'ID':<30} {'Name':<40} {'Status':<12} {'Created'}")
print("-" * 100)
for doc in docs:
print(f"{doc.get('id',''):<30} {doc.get('name',''):<40} {doc.get('status',''):<12} {doc.get('created_at','')}")
def get_status(doc_id: str):
resp = requests.get(f"{API_BASE}/documents/{doc_id}", headers=get_headers())
resp.raise_for_status()
print(json.dumps(resp.json().get("status", ""), indent=2))
def delete_document(doc_id: str):
confirm = input(f"Delete document {doc_id}? [y/N] ").strip().lower()
if confirm != "y":
print("Aborted.")
return
resp = requests.delete(f"{API_BASE}/documents/{doc_id}", headers=get_headers())
resp.raise_for_status()
print(f"✅ Document {doc_id} deleted.")
def main():
parser = argparse.ArgumentParser(description="Manage Ragie.ai documents")
subparsers = parser.add_subparsers(dest="command", required=True)
list_p = subparsers.add_parser("list", help="List all documents")
list_p.add_argument("--partition", default=None)
status_p = subparsers.add_parser("status", help="Get document status")
status_p.add_argument("--id", required=True)
delete_p = subparsers.add_parser("delete", help="Delete a document")
delete_p.add_argument("--id", required=True)
args = parser.parse_args()
if args.command == "list":
list_documents(args.partition)
elif args.command == "status":
get_status(args.id)
elif args.command == "delete":
delete_document(args.id)
if __name__ == "__main__":
main()
```
### scripts/retrieve.py
```python
#!/usr/bin/env python3
"""
retrieve.py — Retrieve relevant chunks from Ragie.ai
Usage:
python3 retrieve.py --query "What is the return policy?" --top-k 6
Env:
RAGIE_API_KEY (required)
Optional flags:
--top-k Number of chunks to retrieve (default: 6)
--partition Scope retrieval to a specific partition
--rerank Enable Ragie reranking for higher accuracy (adds latency)
--filter JSON metadata filter e.g. '{"source": "HR"}'
--raw Print raw JSON response instead of formatted output
"""
import argparse
import json
import os
import sys
import requests
from dotenv import load_dotenv
load_dotenv()
API_BASE = "https://api.ragie.ai"
def get_headers():
key = os.getenv("RAGIE_API_KEY")
if not key:
print("ERROR: RAGIE_API_KEY environment variable is not set.", file=sys.stderr)
sys.exit(1)
return {
"Authorization": f"Bearer {key}",
"Content-Type": "application/json",
}
def retrieve(query: str, top_k: int, partition: str | None, rerank: bool, filter_dict: dict) -> list:
payload = {
"query": query,
"top_k": top_k,
"rerank": rerank,
}
if partition:
payload["partition"] = partition
if filter_dict:
payload["filter"] = filter_dict
resp = requests.post(f"{API_BASE}/retrievals", headers=get_headers(), json=payload)
resp.raise_for_status()
data = resp.json()
# Ragie returns { "scored_chunks": [...] }
return data.get("scored_chunks", data)
def main():
parser = argparse.ArgumentParser(description="Retrieve chunks from Ragie.ai")
parser.add_argument("--query", required=True, help="The retrieval query")
parser.add_argument("--top-k", type=int, default=6, help="Number of chunks to retrieve")
parser.add_argument("--partition", default=None, help="Ragie partition to scope to")
parser.add_argument("--rerank", action="store_true", help="Enable Ragie reranking")
parser.add_argument("--filter", default="{}", help="JSON metadata filter")
parser.add_argument("--raw", action="store_true", help="Print raw JSON")
args = parser.parse_args()
filter_dict = json.loads(args.filter)
chunks = retrieve(args.query, args.top_k, args.partition, args.rerank, filter_dict)
if args.raw:
print(json.dumps(chunks, indent=2))
return
if not chunks:
print("No results found.")
return
print(f"Found {len(chunks)} chunk(s) for query: \"{args.query}\"\n")
print("=" * 70)
for i, chunk in enumerate(chunks, 1):
score = chunk.get("score", "N/A")
doc_name = chunk.get("document_metadata", {}).get("name", chunk.get("document_name", "Unknown"))
doc_id = chunk.get("document_id", "N/A")
text = chunk.get("text", "")
print(f"[{i}] Score: {score:.4f} | Document: {doc_name} | ID: {doc_id}")
print(f" {text[:300]}{'...' if len(text) > 300 else ''}")
print("-" * 70)
# Also output clean JSON for programmatic use
simplified = [
{
"text": c.get("text", ""),
"score": c.get("score"),
"document_name": c.get("document_metadata", {}).get("name", c.get("document_name")),
"document_id": c.get("document_id"),
}
for c in chunks
]
print("\n--- JSON Output ---")
print(json.dumps(simplified, indent=2))
if __name__ == "__main__":
main()
```