faiss

Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 5,242.

Updated: March 20, 2026.

Overall rating: C (5.2).

Composite score: 5.2.

Best-practice grade: B (77.6).

Install command

npx @skill-hub/cli install orchestra-research-ai-research-skills-faiss

Tags: RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale

Repository

Orchestra-Research/AI-Research-SKILLs

Skill path: 15-rag/faiss

Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: high-performance applications.

License: MIT.

Original source

Catalog source: SkillHub Club.

Repository owner: Orchestra-Research.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install faiss into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/Orchestra-Research/AI-Research-SKILLs before adding faiss to shared team environments
Use faiss for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: faiss
description: Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale]
dependencies: [faiss-cpu, faiss-gpu, numpy]
---

# FAISS - Efficient Similarity Search

Facebook AI's library for billion-scale vector similarity search.

## When to use FAISS

**Use FAISS when:**
- Need fast similarity search on large vector datasets (millions/billions)
- GPU acceleration required
- Pure vector similarity (no metadata filtering needed)
- High throughput, low latency critical
- Offline/batch processing of embeddings

**Metrics**:
- **31,700+ GitHub stars**
- Meta/Facebook AI Research
- **Handles billions of vectors**
- **C++** with Python bindings

**Use alternatives instead**:
- **Chroma/Pinecone**: Need metadata filtering
- **Weaviate**: Need full database features
- **Annoy**: Simpler, fewer features

## Quick start

### Installation

```bash
# CPU only
pip install faiss-cpu

# GPU support (the PyPI wheel may lag behind; the FAISS project recommends its conda packages)
pip install faiss-gpu
```

### Basic usage

```python
import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)             # Add vectors

# Search
k = 5  # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)

print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```
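
To make the output of `index.search` concrete, here is a minimal pure-Python sketch of what an exact L2 search computes. It is an illustration only, not how FAISS is implemented (FAISS uses optimized C++), and the helper name `brute_force_l2` is hypothetical. Like `IndexFlatL2`, it reports squared Euclidean distances rather than their square roots.

```python
import heapq

def brute_force_l2(vectors, query, k):
    """Return (squared_distances, indices) of the k vectors closest to query."""
    scored = []
    for i, v in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query
        dist = sum((a - b) ** 2 for a, b in zip(v, query))
        scored.append((dist, i))
    nearest = heapq.nsmallest(k, scored)  # sorted by distance, closest first
    return [d for d, _ in nearest], [i for _, i in nearest]

# Tiny hand-made dataset (hypothetical values chosen for readability)
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_query = [0.9, 0.1]

distances, indices = brute_force_l2(toy_vectors, toy_query, k=2)
print(indices)    # row numbers of the two closest vectors
print(distances)  # their squared L2 distances
```

Every query touches every stored vector, which is why Flat indexes are exact but scale linearly with dataset size.
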

## Index types

### 1. Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if normalized)
index = faiss.IndexFlatIP(d)

# Exact results (100% recall), but every query scans all vectors, so search is the slowest
```
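
The remark that inner product becomes cosine similarity once vectors are normalized can be checked with a few lines of plain Python. This is a sketch of the identity only; the helper names are hypothetical, and in FAISS the normalization step is done on float32 arrays with `faiss.normalize_L2`.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity computed directly from the definition."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

a = [3.0, 4.0]
b = [4.0, 3.0]

raw_ip = inner_product(a, b)                               # depends on vector length
normalized_ip = inner_product(normalize(a), normalize(b))  # equals cosine(a, b)

print(raw_ip, normalized_ip, cosine(a, b))
```

Without normalization, longer vectors score higher regardless of direction, so both the stored vectors and the queries need to be normalized before using `IndexFlatIP` for cosine similarity.
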

### 2. IVF (inverted file) - Fast approximate

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data (k-means; FAISS warns when there are fewer than 39 training vectors per cluster)
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
```
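
The role of `nprobe` is easier to see in a toy model of an inverted file. The sketch below is plain Python, not FAISS code: the centroids are fixed by hand instead of being learned with k-means, and all names (`build_inverted_lists`, `ivf_search`, the toy data) are hypothetical. It shows how probing a single cluster can miss a true neighbor that sits just across a cell boundary.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_inverted_lists(vectors, centroids):
    """Assign each vector id to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(idx)
    return lists

def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in probe_order[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]

# Two hand-picked centroids stand in for the trained quantizer
toy_centroids = [[0.0, 0.0], [10.0, 0.0]]
toy_vectors = [[4.9, 0.0], [9.0, 0.0], [1.0, 0.0], [5.2, 0.0]]
toy_lists = build_inverted_lists(toy_vectors, toy_centroids)

toy_query = [5.1, 0.0]
one_probe = ivf_search(toy_vectors, toy_centroids, toy_lists, toy_query, k=2, nprobe=1)
all_probes = ivf_search(toy_vectors, toy_centroids, toy_lists, toy_query, k=2, nprobe=2)
print(one_probe)   # misses vector 0, which lives in the other cluster
print(all_probes)  # scanning every cluster recovers the exact answer
```

Raising `nprobe` trades speed for recall: more lists are scanned, so fewer boundary cases like this one are missed.
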

### 3. HNSW (Hierarchical NSW) - Best quality/speed

```python
# HNSW index
M = 32  # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```

### 4. Product Quantization - Memory efficient

```python
# PQ stores m * nbits / 8 bytes per vector instead of 4 * d (64× smaller for d = 128, m = 8)
m = 8   # Number of subquantizers
nbits = 8
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```
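
The memory effect of these parameters can be estimated with simple arithmetic. The sketch below assumes float32 storage for the uncompressed case and counts only the per-vector codes; it ignores the PQ codebooks and any ID storage, so treat the result as a lower bound. The function names are hypothetical.

```python
def flat_bytes_per_vector(d):
    """Uncompressed float32 vector: 4 bytes per dimension."""
    return 4 * d

def pq_bytes_per_vector(m, nbits):
    """PQ code: m sub-codes of nbits each, rounded up to whole bytes."""
    return (m * nbits + 7) // 8

d, m, nbits = 128, 8, 8
n_vectors = 1_000_000

flat_size = flat_bytes_per_vector(d)
pq_size = pq_bytes_per_vector(m, nbits)
ratio = flat_size / pq_size

print(f"Flat: {flat_size} bytes/vector, {flat_size * n_vectors / 1e6:.0f} MB total")
print(f"PQ:   {pq_size} bytes/vector, {pq_size * n_vectors / 1e6:.0f} MB total")
print(f"Compression ratio: {ratio:.0f}x")
```

Larger `m` gives finer codes and better accuracy at the cost of more bytes per vector, so the compression ratio is a tuning choice rather than a fixed property of PQ.
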

## Save and load

```python
# Save index
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)
```

## GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

# Reported speedups of 10-100× over CPU depend on index type, batch size, and hardware
```

## LangChain integration

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

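# Create FAISS vector store (docs: a list of LangChain Document objects prepared beforehand)
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load (the saved docstore is pickled, so only enable this flag for files you trust)
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True
)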

# Search
results = vectorstore.similarity_search("query", k=5)
```

## LlamaIndex integration

```python
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# Create FAISS index
d = 1536  # Must match the embedding dimension (1536 for OpenAI text-embedding-ada-002)
faiss_index = faiss.IndexFlatL2(d)

vector_store = FaissVectorStore(faiss_index=faiss_index)
```

## Best practices

1. **Choose the right index type** - Flat for <10K vectors, IVF for 10K-1M, HNSW for quality
2. **Normalize for cosine** - Use IndexFlatIP with L2-normalized vectors
3. **Use GPU for large datasets** - 10-100× faster
4. **Save trained indices** - Training is expensive
5. **Tune nprobe/efSearch** - Balance speed/accuracy
6. **Monitor memory** - Use PQ for large datasets
7. **Batch queries** - Better GPU utilization
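
The size-based rule of thumb in the first practice can be written down as a small helper. This is a sketch, not an official FAISS recommendation: the function name `suggest_index_factory` is hypothetical, the thresholds above 1M vectors come from the companion index guide collected with this skill, and the descriptor strings are the ones that guide uses with `faiss.index_factory`.

```python
def suggest_index_factory(n_vectors):
    """Map a dataset size to an index_factory descriptor (rule of thumb only)."""
    if n_vectors < 10_000:
        return "Flat"          # exact search is affordable
    if n_vectors <= 1_000_000:
        return "IVF100,Flat"   # cluster-based approximate search
    if n_vectors <= 10_000_000:
        return "HNSW32"        # graph-based, needs more memory
    return "IVF4096,PQ8"       # compressed codes for very large datasets

for n in (5_000, 200_000, 5_000_000, 50_000_000):
    print(n, suggest_index_factory(n))

# With FAISS installed, the descriptor would be used as:
#   index = faiss.index_factory(d, suggest_index_factory(len(vectors)))
```

Treat the output as a starting point and benchmark against a Flat baseline, since memory limits and recall targets can shift the best choice.
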

## Performance

| Index Type | Build Time | Search Time | Memory | Accuracy |
|------------|------------|-------------|--------|----------|
| Flat | Fast | Slow | High | 100% |
| IVF | Medium | Fast | Medium | 95-99% |
| HNSW | Slow | Fastest | High | 99% |
| PQ | Medium | Fast | Low | 90-95% |
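
The accuracy figures in such a table are typically recall values: the fraction of true nearest neighbors that an approximate index returns. A minimal way to measure this on your own data, assuming you have the neighbor IDs from a Flat index as ground truth, is sketched below. The function name `recall_at_k` and the sample IDs are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of exact neighbors that also appear in the approximate results.

    Both arguments are lists of per-query ID lists; for NumPy arrays
    returned by index.search, pass indices.tolist().
    """
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_ids, exact_ids):
        truth = set(exact_row)
        hits += sum(1 for i in approx_row if i in truth)
        total += len(exact_row)
    return hits / total

# Hypothetical results for two queries with k = 3
exact = [[3, 0, 7], [5, 2, 9]]   # from a Flat index
approx = [[3, 7, 1], [5, 2, 9]]  # from an approximate index

recall = recall_at_k(approx, exact)
print(f"recall@3 = {recall:.3f}")
```

Running this while sweeping `nprobe` or `efSearch` gives the speed/accuracy curve for a specific dataset, which is more reliable than generic percentages.
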

## Resources

- **GitHub**: https://github.com/facebookresearch/faiss ⭐ 31,700+
- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **License**: MIT




---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### references/index_types.md

```markdown
# FAISS Index Types Guide

Complete guide to choosing and using FAISS index types.

## Index selection guide

| Dataset Size | Index Type | Training | Accuracy | Speed |
|--------------|------------|----------|----------|-------|
| < 10K | Flat | No | 100% | Slow |
| 10K-1M | IVF | Yes | 95-99% | Fast |
| 1M-10M | HNSW | No | 99% | Fastest |
| > 10M | IVF+PQ | Yes | 90-95% | Fast, low memory |

## Flat indices (exact search)

### IndexFlatL2 - L2 (Euclidean) distance

```python
import faiss
import numpy as np

d = 128  # Dimension
index = faiss.IndexFlatL2(d)

# Add vectors
vectors = np.random.random((1000, d)).astype('float32')
index.add(vectors)

# Search
k = 5
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)
```

**Use when:**
- Dataset < 10,000 vectors
- Need 100% accuracy
- Serving as baseline

### IndexFlatIP - Inner product (cosine similarity)

```python
# For cosine similarity, normalize vectors first
import faiss

d = 128
index = faiss.IndexFlatIP(d)

# Normalize vectors (required for cosine similarity)
faiss.normalize_L2(vectors)
index.add(vectors)

# Search
faiss.normalize_L2(query)
distances, indices = index.search(query, k)
```

**Use when:**
- Need cosine similarity
- Recommendation systems
- Text embeddings

## IVF indices (inverted file)

### IndexIVFFlat - Cluster-based search

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# Create IVF index with 100 clusters
nlist = 100  # Number of clusters
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data (required!)
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = clusters to search)
index.nprobe = 10  # Search 10 closest clusters
distances, indices = index.search(query, k)
```

**Parameters:**
- `nlist`: Number of clusters (√N to 4√N recommended)
- `nprobe`: Clusters to search (1-nlist, higher = more accurate)

**Use when:**
- Dataset 10K-1M vectors
- Need fast approximate search
- Can afford training time

### Tuning nprobe

```python
# Test different nprobe values
for nprobe in [1, 5, 10, 20, 50]:
    index.nprobe = nprobe
    distances, indices = index.search(query, k)
    # Measure recall/speed trade-off
```

**Guidelines** (recall figures are rough and dataset-dependent):
- `nprobe=1`: Fastest, ~50% recall
- `nprobe=10`: Good balance, ~95% recall
- `nprobe=nlist`: Scans every cluster, so results match Flat (with no speed benefit)

## HNSW indices (graph-based)

### IndexHNSWFlat - Hierarchical NSW

```python
# HNSW index
M = 32  # Number of connections per layer (16-64)
index = faiss.IndexHNSWFlat(d, M)

# Optional: Set ef_construction (build time parameter)
index.hnsw.efConstruction = 40  # Higher = better quality, slower build

# Add vectors (no training needed!)
index.add(vectors)

# Search
index.hnsw.efSearch = 16  # Search time parameter
distances, indices = index.search(query, k)
```

**Parameters:**
- `M`: Connections per layer (16-64, default 32)
- `efConstruction`: Build quality (40-200, higher = better)
- `efSearch`: Search quality (16-512, higher = more accurate)

**Use when:**
- Need best quality approximate search
- Can afford higher memory (more connections)
- Dataset 1M-10M vectors

## PQ indices (product quantization)

### IndexPQ - Memory-efficient

```python
# PQ stores m * nbits / 8 bytes per vector instead of 4 * d (64× smaller for d = 128, m = 8)
m = 8   # Number of subquantizers (must divide d)
nbits = 8  # Bits per subquantizer

index = faiss.IndexPQ(d, m, nbits)

# Train (required!)
index.train(vectors)

# Add vectors
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```

**Parameters:**
- `m`: Subquantizers (d must be divisible by m)
- `nbits`: Bits per code (8 or 16)

**Memory savings (per vector):**
- Original: d × 4 bytes (float32)
- PQ: m × nbits / 8 bytes (m bytes when nbits = 8)
- Compression ratio: 4d/m when nbits = 8

**Use when:**
- Limited memory
- Large datasets (> 10M vectors)
- Can accept ~90-95% accuracy

### IndexIVFPQ - IVF + PQ combined

```python
# Best for very large datasets
nlist = 4096
m = 8
nbits = 8

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

# Train (k-means needs at least nlist training vectors; FAISS warns below 39 per cluster)
index.train(vectors)
index.add(vectors)

# Search
index.nprobe = 32
distances, indices = index.search(query, k)
```

**Use when:**
- Dataset > 10M vectors
- Need fast search + low memory
- Can accept 90-95% accuracy

## GPU indices

### Single GPU

```python
import faiss

# Create CPU index
index_cpu = faiss.IndexFlatL2(d)

# Move to GPU
res = faiss.StandardGpuResources()  # GPU resources
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Use normally
index_gpu.add(vectors)
distances, indices = index_gpu.search(query, k)
```

### Multi-GPU

```python
# Use all available GPUs
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

# Or specific GPUs
gpus = [0, 1, 2, 3]  # Use GPUs 0-3
index_gpu = faiss.index_cpu_to_gpus_list(index_cpu, gpus)
```

**Speedup:**
- Single GPU: 10-50× faster than CPU
- Multi-GPU: Near-linear scaling

## Index factory

```python
# Easy index creation with string descriptors (each line builds a different index)
index = faiss.index_factory(d, "IVF100,Flat")
index = faiss.index_factory(d, "HNSW32")
index = faiss.index_factory(d, "IVF4096,PQ8")

# Train and use (train() is required for IVF and PQ; HNSW32 needs no training)
index.train(vectors)
index.add(vectors)
```

**Common descriptors:**
- `"Flat"`: Exact search
- `"IVF100,Flat"`: IVF with 100 clusters
- `"HNSW32"`: HNSW with M=32
- `"IVF4096,PQ8"`: IVF + PQ compression

## Performance comparison

### Search speed (1M vectors, k=10)

| Index | Build Time | Search Time | Memory | Recall |
|-------|------------|-------------|--------|--------|
| Flat | 0s | 50ms | 512 MB | 100% |
| IVF100 | 5s | 2ms | 512 MB | 95% |
| HNSW32 | 60s | 1ms | 1 GB | 99% |
| IVF4096+PQ8 | 30s | 3ms | 32 MB | 90% |

*CPU (16 cores), 128-dim vectors*

## Best practices

1. **Start with Flat** - Baseline for comparison
2. **Use IVF for medium datasets** - Good balance
3. **Use HNSW for best quality** - If memory allows
4. **Add PQ for memory savings** - Large datasets
5. **GPU for > 100K vectors** - 10-50× speedup
6. **Tune nprobe/efSearch** - Trade-off speed/accuracy
7. **Train on representative data** - Better clustering
8. **Save trained indices** - Avoid retraining

## Resources

- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **Paper**: https://arxiv.org/abs/1702.08734

```