RAG Implementer
Implement retrieval-augmented generation systems. Use when building knowledge-intensive applications, document search, or Q&A systems, or when you need to ground LLM responses in external data. Covers embedding strategy, vector stores, retrieval pipelines, and evaluation.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install daffy0208-ai-dev-standards-rag-implementer
Repository
Skill path: skills/rag-implementer
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Backend, Data / AI, Testing.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: daffy0208.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install RAG Implementer into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/daffy0208/ai-dev-standards before adding RAG Implementer to shared team environments
- Use RAG Implementer for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: RAG Implementer
description: Implement retrieval-augmented generation systems. Use when building knowledge-intensive applications, document search, or Q&A systems, or when you need to ground LLM responses in external data. Covers embedding strategy, vector stores, retrieval pipelines, and evaluation.
version: 1.0.0
---
# RAG Implementer
Build production-ready retrieval-augmented generation systems.
## Core Principle
**RAG = Retrieval + Context Assembly + Generation**
Use RAG when you need LLMs to access fresh, domain-specific, or proprietary knowledge that wasn't in their training data.
---
## ⚠️ Prerequisites & Cost Reality Check
### STOP: Have You Validated the Need for RAG?
**Before implementing RAG, confirm:**
- [ ] **Problem validated** - Completed `product-strategist` Phase 1 (problem discovery)
- [ ] **Users need AI search** - Tested with simpler alternatives (see below)
- [ ] **ROI justified** - Calculated cost vs benefit of RAG vs alternatives
### Try These FIRST (Before RAG)
RAG is powerful but expensive. Try cheaper alternatives first:
**1. FAQ Page / Documentation (1 day, $0)**
- Create well-organized FAQ or docs
- Add search with Cmd+F
- **Works for:** <50 common questions, static content
- **Test:** Do users find answers? If yes, stop here.
**2. Simple Keyword Search (2-3 days, $0-20/month)**
- Use Algolia, Typesense, or PostgreSQL full-text search
- Good enough for 80% of use cases
- **Works for:** <100k documents, keyword matching sufficient
- **Test:** Do users get relevant results? If yes, stop here.
**3. Manual Curation (Concierge MVP) (1 week, $0)**
- Manually answer user questions
- Build FAQ from common questions
- **Works for:** <100 users, validating if users want AI
- **Test:** Do users value your answers enough to pay? If yes, consider RAG.
**4. Simple Semantic Search (1 week, $30-50/month)**
- Use OpenAI embeddings + Postgres pgvector
- Skip complex retrieval, re-ranking, etc.
- **Works for:** <50k documents, basic semantic search
- **Test:** Are embeddings better than keyword search? If no, stop here.
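Option 4 can be prototyped entirely in memory before committing to pgvector. A minimal sketch of cosine-similarity search, where the hand-written `docs` vectors stand in for real embeddings from an embeddings API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, doc_vecs, top_k=3):
    """Return (doc_id, score) pairs ranked by cosine similarity, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy 3-dimensional vectors for illustration; real embeddings are much wider.
docs = {
    "refunds": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.1],
    "returns": [0.8, 0.2, 0.1],
}
print(search([1.0, 0.0, 0.0], docs, top_k=2))
```

Swapping the in-memory dict for a pgvector table later changes only the `search` function, which makes this a cheap way to run the "are embeddings better than keyword search?" test.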
### Cost Reality Check
**Naive RAG (Prototype):**
- **Time:** 1-2 weeks
- **Cost:** $50-150/month (vector DB + embeddings + API calls)
- **When:** Prototype, <10k documents, proof of concept
**Advanced RAG (Production):**
- **Time:** 3-4 weeks
- **Cost:** $200-500/month (hybrid search, re-ranking, monitoring)
- **When:** Production, 10k-1M documents, validated demand
**Modular RAG (Enterprise):**
- **Time:** 6-8 weeks
- **Cost:** $500-2000+/month (multiple KBs, specialized modules)
- **When:** Enterprise, 1M+ documents, mission-critical
### Decision Tree: Do You Really Need RAG?
```
Do users need to search your content?
│
├─ No → Don't build RAG ❌
│
└─ Yes
   ├─ <50 items? → FAQ page ✅ ($0)
   │
   └─ >50 items?
      ├─ Keyword search enough? → Use Algolia ✅ ($0-20/mo)
      │
      └─ Need semantic understanding?
         ├─ <50k docs? → Simple semantic (pgvector) ✅ ($30/mo)
         │
         └─ >50k docs?
            ├─ Validated with users? → Build RAG ✅
            └─ Not validated? → Test with Concierge MVP first ⚠️
```

### Validation Checklist
Only proceed with RAG implementation if:
- [ ] Tested simpler alternatives (FAQ, keyword search, manual curation)
- [ ] Users confirmed they need AI-powered search (not just you think they do)
- [ ] Calculated ROI: cost of RAG < value users get
- [ ] Have >50k documents OR complex semantic search requirements
- [ ] Budget: $200-500/month for infrastructure
- [ ] Time: 3-4 weeks for production implementation
**If any checkbox is unchecked:** Go back to `product-strategist` or `mvp-builder` skills to validate first.
**See also:** `PLAYBOOKS/validation-first-development.md` for step-by-step validation process.
---
## 8-Phase RAG Implementation
### Phase 1: Knowledge Base Design
**Goal**: Create well-structured knowledge foundation
**Actions**:
- Map data sources (internal: docs, databases, APIs / external: web, feeds)
- Filter noise, select authoritative content (prevent "data dump fallacy")
- Define chunking strategy: semantic chunking based on structure
- Add metadata: tags, timestamps, source identifiers, categories
**Validation**:
- [ ] All data sources catalogued and prioritized
- [ ] Data quality assessed (accuracy, completeness, freshness)
- [ ] Chunking strategy tested with sample documents
- [ ] Metadata schema validated for search effectiveness
**Common Chunking Strategies**:
- Fixed-size: 500-1000 tokens, 50-100 token overlap
- Semantic: By paragraph, section headers, or topic boundaries
- Recursive: Split by structure (markdown headers, code blocks)
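The fixed-size strategy above can be sketched in a few lines. This version splits a pre-tokenized list into overlapping windows; a real pipeline would tokenize with the embedding model's own tokenizer (e.g. tiktoken for OpenAI models) rather than the whitespace placeholder used here:

```python
def chunk_fixed(tokens, size=500, overlap=50):
    """Split a token list into fixed-size chunks with overlapping windows.

    Consecutive chunks share `overlap` tokens so that sentences straddling
    a boundary appear intact in at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

# Placeholder tokens; in practice these come from a tokenizer.
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_fixed(tokens, size=500, overlap=50)
print(len(chunks), len(chunks[0]), len(chunks[-1]))
```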
---
### Phase 2: Embedding Strategy
**Goal**: Choose optimal embedding approach for semantic understanding
**Actions**:
- Select embedding model: `text-embedding-3-large` (3072 dim) for general use, domain-specific models for specialized content
- Plan multi-modal needs (text, code, images, tables)
- Decide on fine-tuning: use domain data if general embeddings underperform
- Establish similarity benchmarks
**Validation**:
- [ ] Embedding model benchmarked on domain data
- [ ] Retrieval accuracy tested with known query-document pairs
- [ ] Storage and compute costs validated
**Model Selection**:
- General: OpenAI `text-embedding-3-large`, `text-embedding-3-small`
- Code: `code-search-babbage-code-001` or StarEncoder
- Multilingual: `multilingual-e5-large`
---
### Phase 3: Vector Store Architecture
**Goal**: Implement scalable vector database
**Actions**:
- Choose vector DB (Pinecone, Weaviate, Qdrant, Chroma, pgvector)
- Configure index: HNSW for speed, IVF for scale
- Plan scalability: data growth and query volume
- Implement backup, recovery, security
**Validation**:
- [ ] Vector store benchmarked under expected load
- [ ] Index optimized for retrieval speed and accuracy
- [ ] Backup and recovery tested
- [ ] Security controls implemented
**Vector DB Decision**:
- Managed cloud → Pinecone
- Self-hosted, feature-rich → Weaviate
- Lightweight, local → Chroma
- Cost-conscious → pgvector (Postgres extension)
- High-performance → Qdrant
---
### Phase 4: Retrieval Pipeline
**Goal**: Build sophisticated retrieval beyond simple similarity search
**Actions**:
- Implement hybrid retrieval: semantic search + keyword (BM25)
- Add query enhancement: expansion, reformulation, multi-query
- Apply contextual filtering: metadata, temporal constraints, relevance ranking
- Design for query types: factual (precision), analytical (breadth), creative (diversity)
- Handle edge cases: no relevant results found
**Advanced Techniques**:
- **Re-ranking**: Use cross-encoder after initial retrieval (e.g., `cross-encoder/ms-marco-MiniLM-L-12-v2`)
- **Query routing**: Route different query types to specialized strategies
- **Ensemble methods**: Combine multiple retrieval approaches
- **Adaptive retrieval**: Adjust top-k based on query complexity
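One widely used way to combine the semantic and keyword (BM25) rankings from hybrid retrieval is Reciprocal Rank Fusion (RRF), which needs only the rank positions, not comparable scores. A minimal sketch, with illustrative doc ids; `k=60` is the damping constant from the original RRF paper:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists into one.

    Each ranking is a list of doc ids ordered best-first. A document's fused
    score is the sum of 1 / (k + rank) over every list it appears in, so
    documents ranked highly by multiple retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # keyword (BM25) ranking, best first
dense_hits = ["d1", "d5", "d3"]   # semantic (vector) ranking, best first
print(rrf([bm25_hits, dense_hits]))
```

Because `d1` and `d3` appear in both lists, they outrank documents found by only one retriever, which is exactly the behavior a hybrid pipeline wants.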
**Validation**:
- [ ] Retrieval accuracy tested across diverse query types
- [ ] Hybrid retrieval outperforms single-method baselines
- [ ] Query latency meets requirements (<500ms ideal)
- [ ] Edge cases and fallbacks tested
---
### Phase 5: Context Assembly
**Goal**: Transform retrieved chunks into optimal LLM context
**Actions**:
- Rank and select: prioritize by relevance score, recency, source authority
- Synthesize: merge related chunks, avoid redundancy
- Compress: use LLMLingua or similar for token optimization
- Mitigate "lost in the middle": place critical info at start/end
- Adapt dynamically: adjust context based on conversation history
**Context Engineering Integration**:
- Blend RAG results with system instructions and user prompts
- Maintain conversation coherence across multi-turn interactions
- Implement context persistence for follow-up queries
- Balance context size vs. information density
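The "lost in the middle" mitigation above can be sketched as a simple reordering: given chunks already sorted best-first, interleave them so the strongest evidence sits at the edges of the prompt and the weakest lands in the middle. Chunk ids here are illustrative:

```python
def order_for_context(chunks):
    """Arrange chunks (sorted best-first) so the top-ranked ones sit at the
    start and end of the assembled context, where LLMs attend most reliably.

    Odd-position ranks go to the front in order; even-position ranks go to
    the back in reverse, leaving the least relevant chunks in the middle.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

ranked = ["c1", "c2", "c3", "c4", "c5"]  # already sorted by relevance score
print(order_for_context(ranked))
```

The two highest-scoring chunks end up first and last, while no chunk is dropped, so token budgeting stays a separate concern.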
**Validation**:
- [ ] Context relevance validated against human judgments
- [ ] Token optimization maintains accuracy
- [ ] Multi-turn conversations maintain coherence
- [ ] Assembly latency <200ms
---
### Phase 6: Evaluation & Metrics
**Goal**: Measure RAG system performance comprehensively
**Retrieval Quality**:
- **Precision@K**: Fraction of top-K results that are relevant
- **Recall@K**: Fraction of relevant docs in top-K
- **MRR (Mean Reciprocal Rank)**: Average rank of first relevant result
- **NDCG**: Ranking quality with graded relevance
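The first three retrieval metrics are easy to compute directly from labeled query-document pairs; a minimal sketch with illustrative doc ids:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)

def mrr(queries):
    """Mean Reciprocal Rank over (retrieved, relevant) pairs: the average of
    1 / rank-of-first-relevant-result, or 0 when nothing relevant is found."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

retrieved = ["d2", "d5", "d1", "d9"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, 3))  # 2 of the top 3 are relevant
print(recall_at_k(retrieved, relevant, 3))     # both relevant docs retrieved
```

Running these over a held-out set of query-document judgments gives the baseline numbers the validation checklist below asks for.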
**Generation Quality**:
- **Faithfulness**: Generated content accuracy vs. sources
- **Answer Relevance**: Response relevance to query
- **Context Utilization**: How effectively LLM uses retrieved info
- **Hallucination Rate**: Frequency of unsupported claims
**System Performance**:
- **End-to-End Latency**: Query to answer (<3 seconds target)
- **Retrieval Latency**: Time to retrieve and rank (<500ms)
- **Token Efficiency**: Information density per token
- **Cost Per Query**: Combined retrieval + generation costs
**Validation**:
- [ ] Baseline metrics established
- [ ] A/B testing framework for config comparisons
- [ ] Automated evaluation pipeline deployed
- [ ] Human evaluation protocols for ground truth
---
### Phase 7: Production Deployment
**Goal**: Deploy with enterprise-grade reliability and security
**Deployment**:
- Containerize with Docker/Kubernetes
- Implement load balancing across RAG instances
- Add caching for frequent queries
- Graceful degradation: fallback to base model on component failure
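The caching step can be sketched as a TTL cache keyed on normalized query text. This in-process version is illustrative only; a production deployment would typically back it with a shared store such as Redis so cache hits survive across instances:

```python
import time

class QueryCache:
    """TTL cache keyed on normalized query text."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}

    @staticmethod
    def _key(query):
        # Lowercase and collapse whitespace so trivial variants share a slot.
        return " ".join(query.lower().split())

    def get(self, query):
        entry = self.store.get(self._key(query))
        if entry is None:
            return None
        answer, expires = entry
        if time.monotonic() > expires:
            del self.store[self._key(query)]  # evict stale entry
            return None
        return answer

    def put(self, query, answer):
        self.store[self._key(query)] = (answer, time.monotonic() + self.ttl)

cache = QueryCache(ttl_seconds=300)
cache.put("What is RAG?", "Retrieval-augmented generation ...")
print(cache.get("  what is  RAG? "))  # normalization makes this a hit
```

Keying on normalized text only catches exact rephrasings; some teams additionally cache on embedding similarity, at the cost of an extra lookup per miss.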
**Security**:
- Role-based access controls for knowledge base
- Data masking and PII protection
- Audit logging for compliance
- Prompt injection defense
**Monitoring**:
- Real-time metrics dashboard (latency, cost, accuracy)
- Query analysis for patterns and failure modes
- Cost tracking and optimization alerts
- Performance profiling for bottlenecks
**Validation**:
- [ ] Production handles expected traffic
- [ ] Security prevents unauthorized access
- [ ] Monitoring provides actionable insights
- [ ] Incident response procedures tested
---
### Phase 8: Continuous Improvement
**Goal**: Establish processes for ongoing enhancement
**Data Pipeline**:
- Automated knowledge base updates (real-time or scheduled)
- Quality monitoring: detect data drift and degradation
- Source diversification: add new data sources
- Feedback integration: user corrections and preferences
**Model Evolution**:
- Evaluate and migrate to improved embeddings
- Fine-tune on domain data regularly
- Upgrade architecture: Naive → Advanced → Modular RAG
- Expand multi-modal support (images, audio, video)
**Optimization**:
- Analyze query patterns, optimize for common needs
- Improve cache hit rates
- Tune vector indices regularly
- Balance performance vs. costs
**Validation**:
- [ ] Automated improvement pipelines functioning
- [ ] Performance trends show improvement
- [ ] User satisfaction increasing
- [ ] System adapts to changing needs
## Key RAG Principles
### 1. Relevance Over Volume
- Quality curation > massive datasets
- Remove outdated/low-quality content continuously
- Prioritize most relevant info to prevent "lost in the middle"
### 2. Semantic Understanding
- Use embeddings for true semantic matching, not just keywords
- Recognize query intent (factual, analytical, creative)
- Adapt retrieval strategy based on context
### 3. Multi-Modal Intelligence
- Handle text, images, code, tables, structured data
- Enable cross-modal retrieval (text query → image results)
- Preserve document structure and formatting
### 4. Temporal Awareness
- Prioritize recent info for time-sensitive topics
- Maintain historical access when relevant
- Integrate real-time data feeds for dynamic domains
### 5. Transparency & Trust
- Always provide source citations
- Indicate confidence levels
- Explain why specific information was selected
## Standard RAG Response Format
```json
{
"answer": "Generated response incorporating retrieved information",
"sources": [
{
"content": "Retrieved text chunk",
"source": "Document/URL identifier",
"relevance_score": 0.95,
"chunk_id": "unique_identifier"
}
],
"confidence": 0.87,
"retrieval_metadata": {
"chunks_retrieved": 5,
"retrieval_time_ms": 150,
"generation_time_ms": 800
}
}
```
## Critical Success Rules
**Non-Negotiable**:
1. ✅ Source attribution for every response
2. ✅ Validate generated content against sources (prevent hallucination)
3. ✅ Filter sensitive data before retrieval
4. ✅ Respond within latency thresholds (<3 seconds)
5. ✅ Monitor and optimize costs continuously
6. ✅ Comply with security policies
7. ✅ Graceful degradation on failures
8. ✅ Comprehensive testing before production
**Quality Gates**:
- Before Production: >85% accuracy on evaluation dataset
- Ongoing: User satisfaction >4.0/5.0
- Performance: 95th percentile <5 seconds
- Reliability: 99.5% uptime
- Cost: Within 10% of budget
## Advanced Patterns
### Modular RAG Architecture
- **Search Module**: Query understanding and reformulation
- **Memory Module**: Long-term conversation persistence
- **Routing Module**: Query routing to specialized knowledge bases
- **Predict Module**: Anticipatory pre-loading based on context
### Hybrid RAG + Fine-tuning
- RAG for dynamic, frequently changing knowledge
- Fine-tuning for domain-specific reasoning patterns
- Combine strengths for maximum effectiveness
## Related Resources
**Related Skills**:
- `multi-agent-architect` - For complex RAG orchestration
- `knowledge-graph-builder` - For structured knowledge integration
- `performance-optimizer` - For RAG system optimization
**Related Patterns**:
- `META/DECISION-FRAMEWORK.md` - Vector DB and embedding selection
- `STANDARDS/architecture-patterns/rag-pattern.md` - RAG architecture details (when created)
**Related Playbooks**:
- `PLAYBOOKS/deploy-rag-system.md` - RAG deployment procedure (when created)