
ragie-rag

Imported from https://github.com/openclaw/skills.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 3,071.

Updated: March 20, 2026.

Overall rating: C (4.0).

Composite score: 4.0.

Best-practice grade: F (22.7).

Install command

npx @skill-hub/cli install openclaw-skills-ragie-rag

Repository

openclaw/skills

Skill path: skills/hatim-be/ragie-rag


Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

- Install ragie-rag into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows.
- Review https://github.com/openclaw/skills before adding ragie-rag to shared team environments.
- Use ragie-rag for development workflows.

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode.

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: ragie-rag
description: |
  Execute Retrieval-Augmented Generation (RAG) using Ragie.ai.
  Use this skill whenever the user wants to:
  - Search their knowledge base
  - Ask questions about uploaded documents
  - Upload documents to Ragie
  - Retrieve context from Ragie
  - Perform grounded answering using stored documents
  - List, check status, or delete Ragie documents

  This skill manages the full Ragie.ai API lifecycle including ingestion,
  retrieval, and grounded answer construction.
metadata:
{
    "openclaw":
      {
        "requires":
          {
            "bins": ["python3"],
            "env": ["RAGIE_API_KEY"],
            "python": ["requests", "python-dotenv"]
          },
        "credentials":
          {
            "primary": "RAGIE_API_KEY",
            "description": "API key from https://app.ragie.ai"
          }
      }
  }
---

# Ragie.ai RAG Skill (OpenClaw Optimized)

This skill enables grounded question answering using Ragie.ai as a RAG backend.

Ragie handles:
- Document chunking
- Embedding
- Vector indexing
- Retrieval
- Optional reranking

The agent handles:
- Deciding when to ingest
- Triggering retrieval
- Constructing grounded prompts
- Producing final answers

---

# Core Principles

1. Never answer without retrieval.
2. Never hallucinate information not present in retrieved chunks.
3. Always cite the `document_name` when referencing specific facts.
4. If retrieval returns zero relevant chunks, explicitly say:
   > "I don't have that information in the current knowledge base."
5. Do not expose API keys or raw API payloads in final answers.

---

# Deterministic Workflow

## Case A — User Provides a File or URL

IF the user provides:
- A file
- A document path
- A PDF/URL to ingest

THEN:

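1. Execute ingestion:
   ```bash
   python3 skills/scripts/ingest.py --file "<path>" --name "<document_name>"
   ```
   OR
   ```bash
   python3 skills/scripts/ingest.py --url "<url>" --name "<document_name>"
   ```
   Add `--wait` to either command to have the script poll until the document is ready; it exits with an error if processing fails.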

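2. Capture the returned document ID. The script prints `✅ Document created: <document_id>` and then the full JSON response, where the same value appears in the `id` field.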

3. Poll document status (skip this step if you used `--wait`):
   ```bash
   python3 skills/scripts/manage.py status --id <document_id>
   ```
   Repeat until the status is `ready`. If it becomes `failed`, report the failure and stop.

4. Proceed to Retrieval (Case C).
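Because `ingest.py` reports the new ID only in its console output, an agent driving it from Python has to pull the ID out of stdout. A minimal sketch, assuming the script path used in the commands above and the `✅ Document created: <id>` line the script prints; `parse_document_id` and `ingest_and_wait` are hypothetical helpers, not part of the skill.

```python
import re
import subprocess
import sys
from typing import Optional

# Assumption: same script location as in the commands above.
INGEST_SCRIPT = "skills/scripts/ingest.py"

# ingest.py prints "✅ Document created: <id>" right after the upload succeeds.
_CREATED_RE = re.compile(r"Document created: (\S+)")


def parse_document_id(stdout: str) -> Optional[str]:
    """Extract the document ID from ingest.py output, or None if absent."""
    match = _CREATED_RE.search(stdout)
    # The script prints "None" when the API response had no "id" field.
    if match is None or match.group(1) == "None":
        return None
    return match.group(1)


def ingest_and_wait(path: str, name: str) -> Optional[str]:
    """Run ingest.py with --wait (no shell) and return the new document's ID."""
    result = subprocess.run(
        [sys.executable, INGEST_SCRIPT, "--file", path, "--name", name, "--wait"],
        capture_output=True,
        text=True,
        check=True,  # non-zero exit (e.g. processing failed) raises CalledProcessError
    )
    return parse_document_id(result.stdout)
```

Passing `--wait` lets the script's own polling loop replace step 3, so the caller only needs the ID.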

---

## Case B — User Requests Document Management

### List documents
```bash
python3 skills/scripts/manage.py list
```
Add `--partition <name>` to list only one partition's documents.

### Check document status
```bash
python3 skills/scripts/manage.py status --id <document_id>
```

### Delete a document
```bash
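python3 skills/scripts/manage.py delete --id <document_id>
```
The script asks `Delete document <document_id>? [y/N]` on stdin and aborts on any answer other than `y`. Confirm only after the user has explicitly asked for the deletion.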

Return structured results to the user.

---

## Case C — Retrieval (Grounded Question Answering)

Execute:

```bash
python3 skills/scripts/retrieve.py \
  --query "<user_question>" \
  --top-k 6 \
  --rerank
```

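Optional flags:
- `--partition <name>`: scope retrieval to a single partition
- `--filter '{"key":"value"}'`: JSON metadata filter
- `--raw`: print only the chunk list as JSON instead of the formatted summary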
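Interpolating the user's question into a shell command breaks on quotes or `$`. A minimal sketch, assuming the same script path, that builds an argument list for `subprocess.run` so no shell is involved; `build_retrieve_argv` is a hypothetical helper whose defaults mirror the command above (`--top-k 6`, `--rerank`).

```python
import json
import sys
from typing import Optional

# Assumption: same script location as in the commands above.
RETRIEVE_SCRIPT = "skills/scripts/retrieve.py"


def build_retrieve_argv(
    question: str,
    top_k: int = 6,
    rerank: bool = True,
    partition: Optional[str] = None,
    metadata_filter: Optional[dict] = None,
) -> list:
    """Argument list for retrieve.py; the question is passed verbatim, never shell-parsed."""
    argv = [sys.executable, RETRIEVE_SCRIPT, "--query", question, "--top-k", str(top_k)]
    if rerank:
        argv.append("--rerank")
    if partition:
        argv += ["--partition", partition]
    if metadata_filter:
        # retrieve.py expects the filter as a JSON string.
        argv += ["--filter", json.dumps(metadata_filter)]
    return argv
```

Run it with `subprocess.run(build_retrieve_argv(question), capture_output=True, text=True, check=True)`.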

---

# Retrieval Output Format

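By default the script prints a readable summary of each chunk (score, document, and the first 300 characters of text), then a `--- JSON Output ---` marker followed by a JSON list in this shape: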

```json
[
  {
    "text": "...",
    "score": 0.87,
    "document_name": "Policy Handbook",
    "document_id": "doc_abc123"
  }
]
```
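Since `retrieve.py` mixes the readable summary and the JSON on stdout, a caller needs to split on the marker. A minimal sketch; `parse_retrieve_output` is a hypothetical helper and assumes the default (non-`--raw`) output. With `--raw`, the whole of stdout is already JSON.

```python
import json

# Marker retrieve.py prints before its machine-readable chunk list.
JSON_MARKER = "--- JSON Output ---"


def parse_retrieve_output(stdout: str) -> list:
    """Return the chunk list printed after the JSON marker.

    retrieve.py prints "No results found." and no JSON when nothing matches,
    so a missing marker is treated as zero chunks.
    """
    _, sep, tail = stdout.partition(JSON_MARKER)
    if not sep:
        return []
    return json.loads(tail)
```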

---

# Grounded Prompt Construction

After retrieval:

1. Extract the `text` of every chunk.
2. Join the chunks with `---` separator lines.
3. Construct this prompt:

```
SYSTEM:
You are a helpful assistant.
Answer using ONLY the context provided below.
If the context does not contain the answer, say:
"I don't have that information in the current knowledge base."

CONTEXT:
[chunk 1 text]
---
[chunk 2 text]
---
...

USER QUESTION:
{original user question}
```

4. Generate final answer.
5. Cite `document_name` when referencing information.
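The prompt steps can be sketched as a single function. One deliberate deviation from the template, which is an assumption rather than the skill's rule: each chunk is prefixed with its `document_name` in brackets, so the model has what it needs to satisfy the citation requirement. `build_grounded_prompt` is a hypothetical name.

```python
NO_ANSWER = "I don't have that information in the current knowledge base."
SEPARATOR = "\n---\n"


def build_grounded_prompt(chunks: list, question: str) -> str:
    """Fill the SYSTEM / CONTEXT / USER QUESTION template from retrieved chunks.

    Each chunk is labelled with its document_name; chunks with empty text are skipped.
    """
    blocks = [
        f"[{c.get('document_name') or 'Unknown'}]\n{c['text']}"
        for c in chunks
        if c.get("text")
    ]
    return (
        "SYSTEM:\n"
        "You are a helpful assistant.\n"
        "Answer using ONLY the context provided below.\n"
        "If the context does not contain the answer, say:\n"
        f'"{NO_ANSWER}"\n'
        "\n"
        "CONTEXT:\n"
        f"{SEPARATOR.join(blocks)}\n"
        "\n"
        "USER QUESTION:\n"
        f"{question}\n"
    )
```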

---

# Output Contract

The final response MUST:

- Be grounded only in retrieved chunks
- Cite `document_name` for factual claims
- Avoid hallucinations
- Avoid mentioning internal execution steps
- Avoid exposing API keys or raw responses
- Clearly state when information is missing

If no chunks are returned:
```
I don't have that information in the current knowledge base.
```
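A last check before replying can enforce the two mechanical parts of this contract: the fixed sentence when nothing was retrieved, and a list of cited documents otherwise. A minimal sketch; the trailing `Sources:` line is one possible citation convention chosen for illustration, not one the skill prescribes, and both function names are hypothetical.

```python
NO_ANSWER = "I don't have that information in the current knowledge base."


def cited_sources(chunks: list) -> list:
    """Unique document names in retrieval order."""
    seen = []
    for chunk in chunks:
        name = chunk.get("document_name")
        if name and name not in seen:
            seen.append(name)
    return seen


def finalize_answer(answer: str, chunks: list) -> str:
    """Return the fixed sentence on zero chunks; otherwise append the sources."""
    if not chunks:
        return NO_ANSWER
    sources = cited_sources(chunks)
    if not sources:
        return answer
    return f"{answer}\n\nSources: {', '.join(sources)}"
```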

---

# API Reference

Base URL:
```
https://api.ragie.ai
```

| Operation          | Method | Endpoint                 |
|--------------------|--------|--------------------------|
| Ingest file        | POST   | /documents               |
| Ingest URL         | POST   | /documents/url           |
| Retrieve chunks    | POST   | /retrievals              |
| List documents     | GET    | /documents               |
| Get document       | GET    | /documents/{id}          |
| Delete document    | DELETE | /documents/{id}          |
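Each row of the table maps to one HTTP call. As a dependency-free sketch (the skill's own scripts use `requests`), the helper below builds, but does not send, an authenticated `urllib.request.Request` for the JSON or body-less rows; multipart file ingestion stays with `ingest.py`. The operation keys and `build_request` are hypothetical names; the bearer-token header matches what the scripts send.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.ragie.ai"

# (method, path) for the JSON or body-less rows of the table above.
# File ingestion (POST /documents) is multipart/form-data and is left to ingest.py.
ENDPOINTS = {
    "ingest_url": ("POST", "/documents/url"),
    "retrieve": ("POST", "/retrievals"),
    "list": ("GET", "/documents"),
    "get": ("GET", "/documents/{id}"),
    "delete": ("DELETE", "/documents/{id}"),
}


def build_request(op: str, api_key: str, doc_id: Optional[str] = None,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for one endpoint."""
    method, path = ENDPOINTS[op]
    if "{id}" in path:
        if not doc_id:
            raise ValueError(f"operation {op!r} needs a document id")
        path = path.replace("{id}", doc_id)
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)
```

To send one, pass it to `urllib.request.urlopen(...)` and decode the body with `json.load`.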

---

# Error Handling

| HTTP Code | Meaning                | Action                          |
|-----------|------------------------|----------------------------------|
| 404       | Document not found     | Verify document_id              |
| 422       | Invalid payload        | Validate request schema         |
| 429       | Rate limited           | Retry with backoff              |
| 5xx       | Server error           | Retry or check Ragie status     |

If ingestion fails:
- Report failure clearly.
- Do not proceed to retrieval.

If retrieval fails:
- Retry once.
- If still failing, inform user.
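The scripts call `raise_for_status()` and stop on the first error, so the retry policy in the table has to live in the caller. A minimal sketch of exponential backoff with jitter that retries only 429 and 5xx and re-raises 404 or 422 immediately. `RagieHTTPError` is a hypothetical exception type standing in for whatever carries the status code (with `requests`, that is `err.response.status_code`); `with_backoff(call, retries=1)` matches the "retry once" rule above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RagieHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def with_backoff(call: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on 429 or 5xx wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RagieHTTPError as err:
            retryable = err.status == 429 or 500 <= err.status < 600
            if not retryable or attempt == retries:
                raise  # 404/422, or out of retries: surface the error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real delays.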

---

# Decision Rules Summary

1. If user uploads content → ingest → wait until ready → retrieve.
2. If user asks question → retrieve immediately.
3. If zero chunks → state knowledge gap.
4. Always use reranking unless explicitly disabled.
5. Never answer without retrieval.

---

# Advanced Usage

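- Use a metadata `filter` (`--filter`) to narrow retrieval scope.
- Use partitions (`--partition`) to separate tenant data.
- Use `recency_bias` only when time relevance matters; `retrieve.py` has no flag for it, so it has to be added to the request body directly.
- Adjust `top_k` (`--top-k`, default 6) depending on query complexity.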
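The request body that `retrieve.py` sends contains `query`, `top_k`, `rerank`, and optionally `partition` and `filter`. The sketch below builds the same body and adds `recency_bias`. Treating `recency_bias` as a boolean body field is an assumption to check against Ragie's retrieval API reference, since the script never sends it. `retrieval_payload` is a hypothetical helper.

```python
from typing import Optional


def retrieval_payload(query: str, top_k: int = 6, rerank: bool = True,
                      partition: Optional[str] = None,
                      metadata_filter: Optional[dict] = None,
                      recency_bias: bool = False) -> dict:
    """Request body for POST /retrievals with the optional scoping options."""
    payload = {"query": query, "top_k": top_k, "rerank": rerank}
    if partition:
        payload["partition"] = partition  # keeps one tenant's data separate
    if metadata_filter:
        payload["filter"] = metadata_filter  # e.g. {"source": "HR"}
    if recency_bias:
        # Assumption: boolean "recency_bias" body field; not sent by retrieve.py.
        payload["recency_bias"] = True
    return payload
```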

---

# Security

- API keys must be loaded from environment variables.
- `.env` must not be committed.
- Do not log sensitive headers.
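A minimal sketch of these rules: read the key only from the environment and mask sensitive headers before anything is logged. The set of header names treated as sensitive is an illustrative assumption; only `Authorization` is actually sent by the scripts.

```python
import os

# Header names whose values must never reach logs (compared case-insensitively).
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie"}


def load_api_key() -> str:
    """Read RAGIE_API_KEY from the environment, failing loudly if it is missing."""
    key = os.environ.get("RAGIE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("RAGIE_API_KEY is not set")
    return key


def redact_headers(headers: dict) -> dict:
    """Return a copy of headers with sensitive values masked, safe to log."""
    return {
        name: "***" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```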

---

# Summary

This skill provides:

- Deterministic ingestion
- Deterministic retrieval
- Strict grounded answering
- Complete Ragie lifecycle management
- Safe and hallucination-resistant RAG execution

End of Skill.

---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "hatim-be",
  "slug": "ragie-rag",
  "displayName": "Ragie.ai-RAG",
  "latest": {
    "version": "1.0.2",
    "publishedAt": 1771929006024,
    "commit": "https://github.com/openclaw/skills/commit/81f894984966ad4de7b18c237ae49066332d3431"
  },
  "history": []
}

```

### scripts/ingest.py

```python
#!/usr/bin/env python3
"""
ingest.py — Ingest a document into Ragie.ai

Usage:
  python3 ingest.py --file /path/to/doc.pdf --name "My Doc"
  python3 ingest.py --url https://example.com/doc.pdf --name "Remote Doc"

Env:
  RAGIE_API_KEY  (required)

Optional flags:
  --partition   Partition name to scope the document to (default: none)
  --metadata    JSON string of arbitrary metadata e.g. '{"source": "HR"}'
"""

from __future__ import annotations  # lets `str | None` hints work on Python < 3.10

import argparse
import json
import os
import sys
import time
from dotenv import load_dotenv
import requests

load_dotenv()

API_BASE = "https://api.ragie.ai"


def get_headers():
    key = os.getenv("RAGIE_API_KEY")
    if not key:
        print("ERROR: RAGIE_API_KEY environment variable is not set.", file=sys.stderr)
        sys.exit(1)
    return {"Authorization": f"Bearer {key}"}


def ingest_file(path, name, partition, metadata):
    headers = get_headers()
    with open(path, "rb") as f:
        files = {"file": (os.path.basename(path), f)}
        data = {"name": name}
        if partition:
            data["partition"] = partition
        if metadata:
            data["metadata"] = json.dumps(metadata)
        resp = requests.post(f"{API_BASE}/documents", headers=headers, files=files, data=data, timeout=120)
    resp.raise_for_status()
    return resp.json()


def ingest_url(url: str, name: str, partition: str | None, metadata: dict) -> dict:
    headers = get_headers()
    headers["Content-Type"] = "application/json"
    payload = {"url": url, "name": name}
    if partition:
        payload["partition"] = partition
    if metadata:
        payload["metadata"] = metadata
    resp = requests.post(f"{API_BASE}/documents/url", headers=headers, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()


def wait_for_ready(doc_id: str, timeout: int = 120):
    """Poll document status until ready or failed."""
    headers = get_headers()
    start = time.time()
    while time.time() - start < timeout:
        resp = requests.get(f"{API_BASE}/documents/{doc_id}", headers=headers, timeout=30)
        resp.raise_for_status()
        doc = resp.json()
        status = doc.get("status", "unknown")
        print(f"  Status: {status}")
        if status == "ready":
            return doc
        if status in ("failed", "error"):
            print(f"ERROR: Document processing failed: {doc}", file=sys.stderr)
            sys.exit(1)
        time.sleep(3)
    print(f"WARNING: Timed out waiting for document {doc_id} to become ready.", file=sys.stderr)
    return None


def main():
    parser = argparse.ArgumentParser(description="Ingest a document into Ragie.ai")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--file", help="Local file path to ingest")
    group.add_argument("--url", help="Remote URL of document to ingest")
    parser.add_argument("--name", required=True, help="Human-readable name for the document")
    parser.add_argument("--partition", default=None, help="Ragie partition name")
    parser.add_argument("--metadata", default="{}", help="JSON metadata string")
    parser.add_argument("--wait", action="store_true", help="Wait until document is ready")
    args = parser.parse_args()

    try:
        metadata = json.loads(args.metadata)
    except json.JSONDecodeError as exc:
        parser.error(f"--metadata is not valid JSON: {exc}")

    print(f"Ingesting '{args.name}'...")
    if args.file:
        result = ingest_file(args.file, args.name, args.partition, metadata)
    else:
        result = ingest_url(args.url, args.name, args.partition, metadata)

    doc_id = result.get("id")
    print(f"✅ Document created: {doc_id}")
    print(json.dumps(result, indent=2))

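    if args.wait and doc_id:
        print("Waiting for document to become ready...")
        # wait_for_ready returns None on timeout; don't report readiness in that case.
        if wait_for_ready(doc_id) is None:
            sys.exit(1)
        print("✅ Document is ready for retrieval.")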


if __name__ == "__main__":
    main()

```

### scripts/manage.py

```python
#!/usr/bin/env python3
"""
manage.py — Manage documents in Ragie.ai (list, status, delete)

Usage:
  python3 manage.py list
  python3 manage.py list --partition my-partition
  python3 manage.py status --id doc_abc123
  python3 manage.py delete --id doc_abc123

Env:
  RAGIE_API_KEY  (required)
"""

from __future__ import annotations  # lets `str | None` hints work on Python < 3.10

import argparse
import json
import os
import sys
from dotenv import load_dotenv 
import requests


load_dotenv()

API_BASE = "https://api.ragie.ai"


def get_headers(content_type=False):
    key = os.getenv("RAGIE_API_KEY")
    if not key:
        print("ERROR: RAGIE_API_KEY environment variable is not set.", file=sys.stderr)
        sys.exit(1)
    h = {"Authorization": f"Bearer {key}"}
    if content_type:
        h["Content-Type"] = "application/json"
    return h


def list_documents(partition: str | None):
    params = {}
    if partition:
        params["partition"] = partition
    resp = requests.get(f"{API_BASE}/documents", headers=get_headers(), params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    # The response is a dict with a "documents" list; also tolerate a bare list.
    docs = data.get("documents", []) if isinstance(data, dict) else data
    if not docs:
        print("No documents found.")
        return
    print(f"{'ID':<30}  {'Name':<40}  {'Status':<12}  {'Created'}")
    print("-" * 100)
    for doc in docs:
        print(f"{doc.get('id',''):<30}  {doc.get('name',''):<40}  {doc.get('status',''):<12}  {doc.get('created_at','')}")


def get_status(doc_id: str):
    resp = requests.get(f"{API_BASE}/documents/{doc_id}", headers=get_headers())
    resp.raise_for_status()
    print(json.dumps(resp.json().get("status", ""), indent=2))


def delete_document(doc_id: str):
    try:
        confirm = input(f"Delete document {doc_id}? [y/N] ").strip().lower()
    except EOFError:
        confirm = ""  # no interactive stdin (e.g. run by an agent): treat as "no"
    if confirm != "y":
        print("Aborted.")
        return
    resp = requests.delete(f"{API_BASE}/documents/{doc_id}", headers=get_headers())
    resp.raise_for_status()
    print(f"✅ Document {doc_id} deleted.")


def main():
    parser = argparse.ArgumentParser(description="Manage Ragie.ai documents")
    subparsers = parser.add_subparsers(dest="command", required=True)

    list_p = subparsers.add_parser("list", help="List all documents")
    list_p.add_argument("--partition", default=None)

    status_p = subparsers.add_parser("status", help="Get document status")
    status_p.add_argument("--id", required=True)

    delete_p = subparsers.add_parser("delete", help="Delete a document")
    delete_p.add_argument("--id", required=True)

    args = parser.parse_args()

    if args.command == "list":
        list_documents(args.partition)
    elif args.command == "status":
        get_status(args.id)
    elif args.command == "delete":
        delete_document(args.id)


if __name__ == "__main__":
    main()

```

### scripts/retrieve.py

```python
#!/usr/bin/env python3
"""
retrieve.py — Retrieve relevant chunks from Ragie.ai

Usage:
  python3 retrieve.py --query "What is the return policy?" --top-k 6

Env:
  RAGIE_API_KEY  (required)

Optional flags:
  --top-k       Number of chunks to retrieve (default: 6)
  --partition   Scope retrieval to a specific partition
  --rerank      Enable Ragie reranking for higher accuracy (adds latency)
  --filter      JSON metadata filter e.g. '{"source": "HR"}'
  --raw         Print raw JSON response instead of formatted output
"""

from __future__ import annotations  # lets `str | None` hints work on Python < 3.10

import argparse
import json
import os
import sys

import requests
from dotenv import load_dotenv


load_dotenv()

API_BASE = "https://api.ragie.ai"


def get_headers():
    key = os.getenv("RAGIE_API_KEY")
    if not key:
        print("ERROR: RAGIE_API_KEY environment variable is not set.", file=sys.stderr)
        sys.exit(1)
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


def retrieve(query: str, top_k: int, partition: str | None, rerank: bool, filter_dict: dict) -> list:
    payload = {
        "query": query,
        "top_k": top_k,
        "rerank": rerank,
    }
    if partition:
        payload["partition"] = partition
    if filter_dict:
        payload["filter"] = filter_dict

    resp = requests.post(f"{API_BASE}/retrievals", headers=get_headers(), json=payload, timeout=60)
    resp.raise_for_status()
    data = resp.json()
    # Ragie returns { "scored_chunks": [...] }; default to an empty list so callers can iterate safely.
    return data.get("scored_chunks", [])


def main():
    parser = argparse.ArgumentParser(description="Retrieve chunks from Ragie.ai")
    parser.add_argument("--query", required=True, help="The retrieval query")
    parser.add_argument("--top-k", type=int, default=6, help="Number of chunks to retrieve")
    parser.add_argument("--partition", default=None, help="Ragie partition to scope to")
    parser.add_argument("--rerank", action="store_true", help="Enable Ragie reranking")
    parser.add_argument("--filter", default="{}", help="JSON metadata filter")
    parser.add_argument("--raw", action="store_true", help="Print raw JSON")
    args = parser.parse_args()

    try:
        filter_dict = json.loads(args.filter)
    except json.JSONDecodeError as exc:
        parser.error(f"--filter is not valid JSON: {exc}")

    chunks = retrieve(args.query, args.top_k, args.partition, args.rerank, filter_dict)

    if args.raw:
        print(json.dumps(chunks, indent=2))
        return

    if not chunks:
        print("No results found.")
        return

    print(f"Found {len(chunks)} chunk(s) for query: \"{args.query}\"\n")
    print("=" * 70)
    for i, chunk in enumerate(chunks, 1):
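        score = chunk.get("score")
        # Format numeric scores only; ":.4f" on the "N/A" placeholder would raise ValueError.
        score_str = f"{score:.4f}" if isinstance(score, (int, float)) else "N/A"
        # document_metadata may be missing or null, so guard before calling .get on it.
        doc_name = (chunk.get("document_metadata") or {}).get("name") or chunk.get("document_name", "Unknown")
        doc_id = chunk.get("document_id", "N/A")
        text = chunk.get("text", "")
        print(f"[{i}] Score: {score_str}  |  Document: {doc_name}  |  ID: {doc_id}")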
        print(f"    {text[:300]}{'...' if len(text) > 300 else ''}")
        print("-" * 70)

    # Also output clean JSON for programmatic use
    simplified = [
        {
            "text": c.get("text", ""),
            "score": c.get("score"),
            "document_name": (c.get("document_metadata") or {}).get("name") or c.get("document_name"),
            "document_id": c.get("document_id"),
        }
        for c in chunks
    ]
    print("\n--- JSON Output ---")
    print(json.dumps(simplified, indent=2))


if __name__ == "__main__":
    main()

```