qdrant
Qdrant vector database: collections, points, payload filtering, indexing, quantization, snapshots, and Docker/Kubernetes deployment.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install itechmeat-llm-code-qdrant
Repository
Skill path: skills/qdrant
Best for
Primary workflow: Run DevOps.
Technical facets: Full Stack, Backend, DevOps.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: itechmeat.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install qdrant into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/itechmeat/llm-code before adding qdrant to shared team environments
- Use qdrant for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: qdrant
description: "Qdrant vector database: collections, points, payload filtering, indexing, quantization, snapshots, and Docker/Kubernetes deployment."
version: "1.16.3"
release_date: "2025-12-19"
---
# Qdrant (Skill Router)
This file is intentionally **introductory**.
It acts as a **router**: based on your situation, open the right note under `references/`.
## Start here (fast)
- New to Qdrant? Read: `references/concepts.md`.
- Want the fastest local validation? Read: `references/quickstart.md` + `references/deployment.md`.
- Integrating with Python? Read: `references/api-clients.md`.
## Choose by situation
### Data modeling
- What should go into vectors vs payload vs your main DB? Read: `references/modeling.md`.
- Working with IDs, upserts, and write semantics? Read: `references/points.md`.
- Need to understand payload types and update modes? Read: `references/payload.md`.
### Retrieval (search)
- One consolidated entry point (search + filtering + explore + hybrid): `references/retrieval.md`.
### Performance & indexing
- Index types and tradeoffs: `references/indexing.md`.
- Storage/optimizer internals that matter operationally: `references/storage.md` + `references/optimizer.md`.
- Practical tuning, monitoring, troubleshooting: `references/ops-checklist.md`.
### Deployment & ops
- Installation/Docker/Kubernetes: `references/deployment.md`.
- Configuration layering: `references/configuration.md`.
- Security/auth/TLS boundary: `references/security.md`.
- Backup/restore: `references/snapshots.md`.
### API interface choice
- REST vs gRPC, Python SDK: `references/api-clients.md`.
## How to maintain this skill
- Keep `SKILL.md` short (router + usage guidance).
- Put details into `references/*.md`.
- Merge or reorganize references when it improves discoverability.
## Critical prohibitions
- Do not ingest/quote large verbatim chunks of vendor docs; summarize in your own words.
- Do not invent defaults not explicitly grounded in documentation; record uncertainties as TODOs.
- Do not design backup/restore without testing a restore path.
- Do not use NFS as the primary persistence backend (installation docs explicitly warn against it).
- Do not expose internal cluster communication ports publicly; rely on private networking.
- Do not use API keys/JWT over untrusted networks without TLS.
- Do not rely on implicit runtime defaults for production; record effective configuration.
## Links
- Concepts: https://qdrant.tech/documentation/concepts/
- Installation: https://qdrant.tech/documentation/guides/installation/
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/concepts.md
```markdown
# Qdrant Concepts — overview (ingested: concepts landing page)
Source: https://qdrant.tech/documentation/concepts/
This note summarizes the **Concepts landing page** only. It is a navigation/terminology map rather than a deep technical spec.
## Concept map (from the docs TOC)
- **Collections**: named datasets that contain points.
- **Points**: the main record type; a point contains vector(s) and optional payload.
- **Payload**: metadata stored alongside vectors.
- **Search**: similarity search (retrieve nearest points in vector space).
- **Explore**: a set of APIs for exploring the collection beyond basic similarity search.
- **Hybrid Queries**: multi-stage or multi-query retrieval patterns.
- **Filtering**: database-style conditions and clauses.
- **Inference**: generating vectors (embeddings) from text or images.
- **Optimizer**: mechanisms to rebuild/optimize internal DB structures for faster search.
- **Storage**: segments, indexes, and ID mapping at a high level.
- **Indexing**: available index types (payload, vector, sparse vector, filterable).
- **Snapshots**: node-level backup/restore artifacts.
## What this page does NOT fully explain
The landing page itself does not provide detailed technical specifics on:
- vector distance metrics
- ANN index internals (e.g., HNSW)
- distributed topology (sharding/replication)
- security/auth/TLS
Those likely live in the linked sub-pages. Fetch them **one by one** and extend this skill.
## Next ingestion targets (sub-pages)
Recommended order:
1) Collections → 2) Points → 3) Payload → 4) Filtering → 5) Search → 6) Indexing → 7) Storage/Optimizer → 8) Snapshots
(Each should be ingested as a separate URL, with its own reference note if needed.)
```
### references/quickstart.md
```markdown
# Qdrant Quickstart — overview (ingested: local quickstart)
Source: https://qdrant.tech/documentation/quickstart/
This note summarizes the **Local Quickstart** page. The goal is not to mirror code samples, but to capture the practical workflow and gotchas.
## High-value warning (repeatable)
- Quickstart explicitly warns: by default Qdrant can start **without encryption or authentication**.
- Practical rule: treat a default quickstart instance as **local-only** unless you’ve applied the Security guidance (API keys + TLS + network isolation).
## Minimal local run (what matters)
- Run Qdrant in Docker with:
- REST endpoint (HTTP)
- gRPC endpoint
- persistent storage mounted to `/qdrant/storage`
The quickstart calls out that on some platforms (notably Windows setups) a named Docker volume may be safer than host folder mounts.
## Local endpoints you can rely on
- REST API is available on the HTTP port.
- Web UI dashboard is served on the same HTTP endpoint under `/dashboard`.
- gRPC API is exposed on its own port.
## Minimal sanity-check workflow (portable)
1) Create a collection
- Requires specifying vector dimensionality and a distance function.
2) Upsert points
- Points include an ID, vector values, and optional payload.
3) Query / search
- Basic similarity query returns scored point IDs.
- Payload is not always returned by default; request it explicitly if you need it.
4) Filtered search
- Filtering is applied over payload fields.
- Quickstart recommends: create payload indexes for performance on real datasets.
## Next ingestion targets (one URL at a time)
- Payload indexing page (to capture what “payload index” means and how to design it)
- Filtering page (operators, types, and performance implications)
```
### references/deployment.md
```markdown
# Deployment (Installation, Docker, Kubernetes)
Sources:
- https://qdrant.tech/documentation/guides/installation/
This note consolidates the practical deployment constraints and options.
## Recommended paths (high level)
- Production:
- Qdrant Cloud (managed)
- Kubernetes (Helm chart or enterprise operator, depending on requirements)
- Development/testing:
- Docker (single container) or Docker Compose
## Storage constraints (high value)
- Qdrant persistence expects **block-level access** with a **POSIX-compatible filesystem**.
- **NFS is not supported** for Qdrant storage.
- SSD/NVMe is recommended for vector-heavy workloads.
- Be careful with Windows Docker/WSL mounts (docs warn about filesystem issues / data loss).
## Networking / ports
- `6333`: HTTP API (and health/metrics endpoints)
- `6334`: gRPC API
- `6335`: distributed deployment / cluster communication
Operational rule of thumb:
- Clients typically need `6333`/`6334`.
- Cluster nodes must reach each other on all required ports.
## Docker quickstart (practical)
Pull:
```bash
docker pull qdrant/qdrant
```
Run with persistence:
```bash
docker run -p 6333:6333 \
    -v "$(pwd)/path/to/data":/qdrant/storage \
    qdrant/qdrant
```
Override config:
```bash
docker run -p 6333:6333 \
    -v "$(pwd)/path/to/data":/qdrant/storage \
    -v "$(pwd)/custom_config.yaml":/qdrant/config/production.yaml \
    qdrant/qdrant
```
## Kubernetes (Helm chart) notes
- Helm chart is community-supported.
- The docs highlight limitations compared to Qdrant Cloud/enterprise operator:
- no zero-downtime upgrades
- no automatic shard rebalancing
- no full backup/recovery automation
If you self-host on K8s, you must design:
- backup/restore
- upgrades
- monitoring/logging
- HA + load balancing
## Production checklist (minimum)
- Persistent storage is configured and compatible (no NFS).
- Network exposure is intentional (do not expose internal cluster comms publicly).
- Security boundary is defined (auth + TLS termination).
- Monitoring and backups are in place.
```
### references/api-clients.md
```markdown
# API Clients (REST, gRPC, Python SDK)
Qdrant API interfaces and client library patterns.
## Interfaces
| Protocol | Port | Use Case |
|----------|------|----------|
| REST | 6333 | Development, debugging, human-readable |
| gRPC | 6334 | Production, high throughput, lower latency |
**Recommendation**: Start with REST for prototyping, switch to gRPC for production performance.
## Python SDK
```bash
pip install qdrant-client
# Optional: local embeddings
pip install "qdrant-client[fastembed]"
```
### Sync Client
```python
from qdrant_client import QdrantClient
client = QdrantClient(url="http://localhost:6333")
```
### Async Client
```python
from qdrant_client import AsyncQdrantClient
async_client = AsyncQdrantClient(url="http://localhost:6333")
```
### Connection Options
- **Local/memory**: `QdrantClient(":memory:")`
- **Remote**: `QdrantClient(url="http://host:6333")`
- **Cloud**: `QdrantClient(url="https://your-cluster.qdrant.cloud", api_key="...")`
## Key Features
- Type definitions for all Qdrant API
- Sync and async requests
- Helper methods for common operations
- Supports REST and gRPC protocols
## Docker Port Exposure
```bash
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```
## gRPC + Multiprocessing Gotcha
**Error**: `sendmsg: Socket operation on non-socket (88)` when using multiprocessing with gRPC.
**Cause**: multiprocessing copies gRPC channels, sharing sockets; parent close breaks children.
**Fix**:
```python
import multiprocessing
multiprocessing.set_start_method("forkserver") # or "spawn"
```
**Alternative**: Use REST API, async client, or built-in parallelization (`qdrant.upload_points(...)`).
```
### references/modeling.md
```markdown
# Data Modeling & Inference
Best practices for structuring data in Qdrant and server-side embedding generation.
## What to Store Where
| Component | Purpose | Notes |
|-----------|---------|-------|
| **Vectors** | Similarity search | Dense, sparse, or multi-vector |
| **Payload** | Filtering/metadata | JSON-like, index for performance |
| **External DB** | Full content | Store IDs in payload for retrieval |
## Modeling Best Practices
- Keep payload lightweight; use for filtering, not full data storage
- Index payload fields used in filters (tags, timestamps, tenant IDs)
- Use named vectors for multiple embeddings per point (e.g., text + image)
- Balance vector dimensionality: higher for accuracy, lower for speed/memory
## Multi-Tenancy Patterns
1. **Separate collections**: One collection per tenant (simpler isolation)
2. **Shared collection + tenant ID**: Filter by tenant in payload (requires indexing)
## Common Patterns
| Pattern | Vectors | Payload |
|---------|---------|---------|
| **RAG** | Chunk embeddings | Source doc ID, chunk index |
| **Recommendations** | User/item vectors | Preferences, categories |
| **Hybrid Search** | Dense + sparse | Reranking scores |
---
## Inference (Server-Side Embeddings)
Qdrant can generate embeddings directly, avoiding external pipelines.
### Inference Objects
Replace raw vectors with inference objects in API calls:
```json
// Text embedding
{ "text": "search query", "model": "model-name" }
// Image embedding
{ "image": "https://example.com/image.jpg", "model": "clip-model" }
```
### BM25 Sparse Vectors
```json
{ "text": "document text", "model": "qdrant/bm25" }
```
### Inference Sources
| Source | Setup | Example Model |
|--------|-------|---------------|
| **Qdrant Cloud** | Built-in | Check console for models |
| **Local (fastembed)** | `cloud_inference=False` | Local models |
| **External (OpenAI/Cohere)** | Prepend provider, add API key | `openai/text-embedding-3-small` |
### Advanced Features
- **Multiple vectors**: Generate dense + sparse per point
- **Matryoshka reduction**: `"mrl": 64` for dimension reduction
- **Optimization**: Identical inference objects computed once per request
### Practical Notes
- Input text not stored unless explicitly added to payload
- For Cloud: Check model dimensionality/context window in console
```
### references/points.md
```markdown
# Qdrant Points — overview (ingested: points concept page)
Source: https://qdrant.tech/documentation/concepts/points/
This note summarizes the **Points** concept page, focusing on how point writes/updates behave in practice.
## What a point is
- A point is the central record in Qdrant.
- It contains:
- an ID
- one or more vector representations
- optional payload (metadata)
## IDs (design choice)
- The docs state Qdrant supports point IDs as:
- 64-bit unsigned integers
- UUIDs (multiple string formats are accepted)
Practical guidance:
- Prefer UUIDs if IDs come from outside your system or you need low collision risk.
- Prefer integers for compactness when you control ID assignment.
## Write path semantics (important)
- Point modification operations are described as **asynchronous** and written to a write-ahead log first.
- This implies a “durable but not immediately visible” window depending on whether you wait for completion.
### `wait` / eventual consistency (high value)
- If you do not request waiting, you can receive an acknowledgment before the update is fully applied.
- If you need the update to be searchable immediately after the call returns, you must use the “wait for completion” mode.
Practical rule:
- For ingestion pipelines that can tolerate lag, async is fine.
- For request/response flows where the user expects immediate retrieval, use wait mode.
## Upsert / idempotence
- The docs describe APIs as idempotent: re-sending the same upsert leads to the same final state.
- Points with the same ID are overwritten when re-uploaded.
Practical rule:
- Safe for “at-least-once” delivery pipelines (queues) as long as overwrites are acceptable.
## Vectors model
- A point can have multiple vectors, including different types; Qdrant supports:
- dense vectors
- sparse vectors
- multivectors
- Multiple vectors per point are referred to as named vectors.
### Named vectors replacement vs partial updates
- Uploading a point with an existing ID replaces the whole point (unspecified vectors can be removed).
- There is a dedicated “update vectors” operation to update only the specified vectors while keeping the others unchanged.
## Batch ingestion
- The page describes two batch formats:
- record-oriented (list of points)
- column-oriented (ids/payloads/vectors arrays)
Practical rule:
- Choose whichever fits your ETL shape; they’re equivalent internally.
## Python client ingestion helpers
- The page highlights Python client helpers that can:
- parallelize uploads
- retry
- batch lazily (useful for streaming from disk)
## Conditional updates (optimistic concurrency)
- Update operations can include a filter-based precondition.
- This can implement optimistic concurrency control (e.g., only update if payload `version` matches).
Practical rule:
- Use conditional updates for background re-embedding jobs to prevent overwriting fresh application writes.
## Retrieval patterns (useful for apps)
- Retrieve by IDs (selective fetch)
- Scroll (iterate by ID order; filterable)
- Ordering by payload key exists but requires an appropriate payload index; pagination changes when using order_by.
- Count by filter (useful for analytics and pagination sizing)
## Next ingestion targets (one URL at a time)
- Payload page (to connect “update payload / overwrite payload” semantics)
- Vectors page (to cover vector storage and optimization)
```
### references/payload.md
```markdown
# Qdrant Payload — overview (ingested: payload concept page)
Source: https://qdrant.tech/documentation/concepts/payload/
This note captures how payload (metadata) behaves, what types are filterable, and which update operations matter.
## What payload is
- Payload is JSON metadata stored alongside vectors.
- Payload is central to:
- filtering (constraints)
- faceting / aggregations (counts)
- application-level semantics (e.g., access control fields, timestamps, categories)
## Filterable payload types (what Qdrant expects)
The page documents payload types that participate in filtering:
- integer (64-bit)
- float (64-bit)
- bool
- keyword (string)
- geo (lon/lat object)
- datetime (RFC 3339 variants; UTC assumed if timezone missing)
- uuid (functionally similar to keyword, but stored as parsed UUID internally and can reduce RAM in payload-heavy setups)
Array semantics (high value):
- if a payload field is an array, a filter succeeds if **any element** satisfies the condition.
Practical rule:
- Keep payload types consistent per field; mismatched type means the condition is treated as not satisfied.
## Write patterns: attach payload at upsert
- Payload can be included during point upsert.
- Arrays are supported for multi-valued metadata.
## Updating payload: choose the right operation
The page distinguishes:
- **Set payload**: update only provided fields, keep others unchanged.
- **Overwrite payload**: replace the entire payload.
- **Clear payload**: remove all payload keys.
- **Delete payload keys**: remove only specific keys.
Selection patterns:
- by explicit point IDs
- by filter selector (bulk updates without knowing IDs)
Nested update convenience:
- the guide mentions a `key` parameter that allows modifying only a nested object under a particular top-level key.
## Payload indexing (practical guidance)
- For efficient filtered search, create indexes for payload fields (type-specific).
- The page recommends indexing fields that constrain results the most (often high-cardinality identifiers), and using the most restrictive index first in compound filters.
## Facet counts (useful for UX and query planning)
- Faceting is a GROUP BY-like counting aggregation over a field.
- The page states a field must have a compatible index (e.g., keyword index for MatchValue) to facet on it.
- Result size is limited by default; can be increased with a limit.
- Counts may be approximate by default; there is an `exact` option when you need precision.
```
### references/retrieval.md
```markdown
# Retrieval (Search, Filtering, Explore, Hybrid Queries)
Sources:
- https://qdrant.tech/documentation/concepts/search/
- https://qdrant.tech/documentation/concepts/filtering/
- https://qdrant.tech/documentation/concepts/explore/
- https://qdrant.tech/documentation/concepts/hybrid-queries/
This note consolidates the practical parts of Qdrant retrieval.
## Query API as the “front door”
- Qdrant’s universal retrieval interface is the Query API:
- `POST /collections/{collection_name}/points/query`
- Treat the `query` parameter as the thing that changes behavior (nearest, by-id, hybrid, etc.).
## Search-time knobs (recall vs latency)
Common parameters that matter operationally:
- `hnsw_ef`: higher often improves recall but increases latency.
- `exact`: disables ANN (can be very slow; full scan).
- `indexed_only`: can protect latency during indexing but may return partial results.
## Result projection
- Results do not necessarily include payload/vectors by default.
- Use `with_payload` / `with_vectors` and projection (include/exclude fields) when you need them.
## Filtering model (boolean logic)
Filters are composed with:
- `must` (AND)
- `should` (OR)
- `must_not` (NOT)
Field conditions include:
- equality / IN / NOT IN (keyword/int/bool)
- numeric ranges
- datetime ranges (RFC 3339)
- geo filters
- array length (“values count”)
- empty/null semantics
- `has_id` and `has_vector`
### Nested arrays: correctness gotcha
If you filter arrays of objects and need multiple conditions to apply to the **same element**, use nested filtering patterns; otherwise you may accidentally match across different array elements.
## Explore (recommendation / discovery)
Use Explore when you need:
- recommendations from multiple positives and/or negatives
- discovery / context constrained search
- dataset exploration (e.g., outliers)
Operational notes:
- performance often scales with number of examples
- accuracy may require increasing `ef` for constrained discovery/context searches
## Hybrid and multi-stage retrieval
Qdrant supports multi-stage retrieval via `prefetch`:
- prefetch generates candidate sets
- the main query re-scores/ranks candidates
Important gotcha:
- `offset` applies only to the main query; ensure prefetch limits are large enough.
### Fusion patterns
When combining multiple channels (dense + sparse, or multiple embeddings):
- RRF (rank fusion) is a common default.
- Distribution-based score fusion (DBSF) can help when score scales differ.
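A request body for `POST /collections/{collection_name}/points/query` combining two prefetch channels with RRF might look like this (a sketch: the named vectors `dense`/`sparse` and all values are assumptions, and the prefetch limits are deliberately larger than the final limit):

```json
{
  "prefetch": [
    { "query": [0.2, 0.8, 0.1, 0.9], "using": "dense", "limit": 50 },
    { "query": { "indices": [17, 42], "values": [0.5, 0.3] }, "using": "sparse", "limit": 50 }
  ],
  "query": { "fusion": "rrf" },
  "limit": 10
}
```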
### Diversity (MMR)
MMR helps reduce near-duplicate results; results may be ordered by selection process, not strictly by similarity score.
### Formula rescoring
Use formula-based rescoring to blend business signals (payload fields) with vector scores.
Rule of thumb: treat formula rescoring as a controlled, eval-driven feature (not a default).
## Practical rules of thumb
- Start simple: vector search + filter + payload projection.
- Add grouping/dedup only when needed (and index the group field).
- Add hybrid/multi-stage only when you can justify it with eval + latency budgets.
```
### references/indexing.md
```markdown
# Qdrant Indexing — overview (ingested: indexing concept page)
Source: https://qdrant.tech/documentation/concepts/indexing/
This note summarizes the indexing model and the handful of decisions that most teams actually need.
## Mental model (how to think about indexes)
- Qdrant combines **vector indexes** (for similarity search) with **payload indexes** (for filtering and query planning).
- Index configuration is applied at the **collection** level, but indexes may be built per-segment as data grows and optimizers decide it’s worthwhile.
## Payload indexes (what you need for fast filters)
- Payload indexes are created per field and type; they speed up filtering and help estimate filter selectivity.
- Index only fields you filter on frequently; indexes cost memory and build time.
- A practical heuristic from the guide: indexing fields with more distinct values often yields more benefit.
Supported payload index types mentioned include:
- keyword / integer / float / bool
- datetime
- uuid (doc notes this can be more memory-efficient than keyword for UUIDs)
- geo
- text (full-text)
### Parameterized integer index (performance trap)
- Integer indexes can be configured to support “lookup” (exact match) and/or “range”.
- The guide warns that enabling lookup in the wrong context can cause performance issues.
### On-disk payload indexes (memory vs latency)
- Default: payload-related structures are kept in memory for low latency.
- On-disk payload index exists for large/rarely used indexes to reduce memory pressure.
- Tradeoff: cold requests may be slower due to disk I/O.
### Tenant index / principal index (special-purpose)
- Tenant index: optimizes multi-tenant collections when most queries filter by tenant.
- Principal index: optimizes when most queries filter by a primary “timeline” field (e.g., timestamp).
## Full-text index (text filtering semantics)
- Full-text indexing enables token-based filtering on string payload.
- Key design choices:
- tokenizer (word/whitespace/prefix/multilingual)
- lowercasing / ASCII folding
- stemming / stopwords (language-specific)
- phrase matching (requires additional structure; enable explicitly)
Practical rule: text filter semantics depend on how you build the full-text index.
## Vector index (dense)
- The guide states dense vectors use an HNSW index.
- Parameters you’ll see:
- `m` (graph degree)
- `ef_construct` (build quality/speed)
- `ef` (search-time quality/latency)
- `full_scan_threshold` (when to skip HNSW)
Practical rule: don’t tune HNSW blindly — benchmark on your data.
## Sparse vector index
- Designed for sparse vectors (many zeros), conceptually closer to inverted-index style retrieval.
- Can be stored on disk to save memory, with expected latency tradeoffs.
- Supports dot-product similarity (as described in the guide).
## Filterable index / graph-filter interaction
- The guide describes additional mechanisms to keep graph traversal effective under filtering.
- Practical takeaway: the combination of vector search + filters has specific index support; strict multi-filter combinations may require special search algorithms.
## What to enforce in projects (portable)
- Treat payload indexes as mandatory for production filtered search.
- Prefer least number of indexed fields, chosen from actual query patterns.
- Decide early whether multi-tenancy will be “one collection per tenant” vs “shared collection + tenant index”.
- Document whether text filters require phrase semantics (and configure phrase matching accordingly).
```
### references/storage.md
```markdown
# Storage (Qdrant Concepts) — practical notes
Source: https://qdrant.tech/documentation/concepts/storage/
## Segment model (what to remember)
- A collection’s data is split into **segments**.
- Each segment has its own:
- vector storage
- payload storage
- vector + payload indexes
- ID mapper (internal ↔ external IDs)
- Segments usually do not overlap; if a point ends up in multiple segments, Qdrant has **deduplication** in search.
Appendable vs non-appendable:
- Segments can be **appendable** or **non-appendable** depending on storage/index choices.
- Appendable segments allow add/delete/query.
- Non-appendable segments allow read/delete only.
- A collection must have at least one appendable segment.
Why this matters operationally:
- Many performance behaviors (optimizer, indexing, memmap) are segment-scoped.
## Vector storage: In-memory vs Memmap (on-disk)
Qdrant provides two main vector storage modes:
- **In-memory**: vectors live in RAM; fastest for search; disk mostly used for persistence.
- **Memmap (on-disk)**: vectors live in memory-mapped files; OS page cache controls what is resident.
- With enough RAM, it can be close to in-memory performance.
- Typically preferred for large collections when RAM is limited and disks are fast.
### How to enable memmap
Two main approaches:
1) Collection creation: set `vectors.on_disk=true`.
- Recommended when you know upfront you want memmap for the whole collection.
2) Threshold-based conversion: set `memmap_threshold`.
- Can be configured globally and/or per collection.
- Segments above the threshold are converted to memmap storage.
Rule of thumb (from docs):
- Balanced workload: set `memmap_threshold` ≈ `indexing_threshold` (default mentioned as 20000 in docs).
- This helps avoid extra optimizer runs by aligning thresholds.
- High write load + low RAM: set `memmap_threshold` lower than `indexing_threshold` (e.g. 10000).
- Converts to memmap earlier; indexing happens later.
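As a sketch, the balanced-workload alignment might look like this in the server config (key names follow Qdrant's sample `config.yaml`; verify against your version before relying on them):

```yaml
storage:
  optimizers:
    # Balanced workload: align thresholds to avoid extra optimizer runs.
    memmap_threshold_kb: 20000
    indexing_threshold_kb: 20000
    # High write load + low RAM: convert to memmap earlier, e.g.
    # memmap_threshold_kb: 10000
```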
### HNSW index on disk
- You can also store the HNSW index on disk using `hnsw_config.on_disk=true` (per collection create/update).
Practical implication:
- “Vectors on disk” and “HNSW on disk” are separate knobs; decide per workload and disk speed.
## Payload storage: InMemory vs OnDisk
Payload storage types:
- **InMemory payload**: payload data loaded into RAM on startup; persistent backing on disk (and Gridstore per docs).
- Fast, but can consume a lot of RAM for large payload values (long text, images).
- **OnDisk payload**: payload read/write directly to RocksDB.
- Lower RAM usage, but higher access latency.
Critical performance rule:
- If you filter/search using payload conditions and payload is on disk, create **payload indexes** for the fields used in filters.
- Once a payload field is indexed, Qdrant keeps values of that indexed field in RAM **regardless** of payload storage type.
How to choose (practical):
- Large payload values that you don’t filter on → consider on-disk payload.
- Any payload fields used in filters/scoring → index them.
## Versioning + WAL (crash safety)
Qdrant uses a two-stage write path for integrity:
1) Write to **WAL** (write-ahead log): orders operations and assigns sequential numbers.
2) Apply changes to segments.
Each segment tracks:
- the last applied version
- per-point version
If an operation’s sequence number is older than the current point version, it is ignored.
Operational implication:
- WAL enables safe recovery after abnormal shutdown.
- Versioning prevents out-of-order updates from corrupting point state.
## Operational guidelines
- Prefer memmap vectors + (optional) on-disk HNSW when collections grow beyond RAM.
- Keep filter-critical payload fields indexed; avoid "disk payload + unindexed filters".
- Bulk ingestion workflows should align `memmap_threshold` and indexing thresholds.
```
### references/optimizer.md
```markdown
# Optimizer (Qdrant Concepts) — practical notes
Source: https://qdrant.tech/documentation/concepts/optimizer/
## Why optimizer exists (mental model)
Qdrant stores data in **segments**. Many changes are more efficient in **batches** than “in-place per point”, so Qdrant periodically **rebuilds** internal structures at segment level.
Key availability property:
- The segment being optimized remains **readable** during rebuild.
- Writes/updates during optimization go into a **copy-on-write** segment (proxy layer), which takes priority for reads and subsequent updates.
Practical implication:
- Optimization is expected background work. Plan for CPU/Disk IO spikes and don’t treat it as an outage.
## Vacuum optimizer (garbage of deleted points)
Deletion is logical first:
- Qdrant marks records as deleted and ignores them in queries.
- This minimizes disk IO, but over time deleted records accumulate → memory usage and performance can degrade.
Vacuum optimizer triggers when a segment accumulates “too many” deletions.
Relevant config knobs:
- `storage.optimizers.deleted_threshold`: minimal fraction of deleted vectors in a segment to start vacuum.
- `storage.optimizers.vacuum_min_vector_number`: minimal vectors in a segment before vacuum makes sense.
Operational guidance:
- If you do frequent deletes (e.g., reingestion, dedup), watch for vacuum activity and disk usage.
## Merge optimizer (too many small segments)
Too many small segments hurt search performance.
Merge optimizer tries to reduce segment count:
- Target segment count: `storage.optimizers.default_segment_number` (defaults to CPU count when 0).
- It merges (at least) the smallest segments.
- It avoids creating overly large segments via `storage.optimizers.max_segment_size_kb`.
Tradeoff note from docs:
- Lower `max_segment_size_kb` can prioritize faster indexation.
- Higher `max_segment_size_kb` can prioritize search speed (fewer segments), but risks long index build times per segment.
Practical guidance:
- Treat segment count as a performance lever: fewer segments typically helps search parallelism overhead, but “too large” segments make rebuilds expensive.
## Indexing optimizer (when to turn on indexes / memmap)
Qdrant can switch storage/index modes based on dataset size. Small datasets can be faster with brute-force scan.
Indexing optimizer enables:
- vector indexing
- memmap storage
…when thresholds are reached.
Relevant config knobs:
- `storage.optimizers.memmap_threshold` (kB per segment): above this, vectors become read-only **memmap**. Set to `0` to disable.
- `storage.optimizers.indexing_threshold_kb` (kB per segment): above this, enables vector indexing. Set to `0` to disable.
Practical implication:
- These thresholds strongly affect memory vs latency behavior; choose them intentionally for your workload.
## Per-collection optimizer overrides + dynamic tuning
In addition to global config, optimizer parameters can be set **per collection**.
Docs highlight a common production pattern:
- During bulk initial load, disable indexing / expensive rebuild behavior.
- After ingestion finishes, enable indexing so the index is built once (instead of rebuilding repeatedly during upload).
## Operational guidelines
- Collections can have different lifecycles (high-churn vs append-only).
- Bulk backfills / re-embeddings should use the "disable indexing during upload, re-enable after" pattern to save compute.
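The disable/re-enable pattern above can be sketched as two bodies for `PATCH /collections/{collection_name}`. This is a sketch, not a verified call: `my_collection` and the restored threshold value are placeholders, and the `optimizers_config` field shape should be checked against the current REST API reference.

```python
import json

COLLECTION = "my_collection"  # placeholder collection name

# Step 1: before the bulk upload, disable indexing so segments are not
# rebuilt repeatedly while points stream in.
disable_indexing = {"optimizers_config": {"indexing_threshold": 0}}

# Step 2: after ingestion finishes, restore a non-zero threshold so the
# index is built once over the final data (value is illustrative).
enable_indexing = {"optimizers_config": {"indexing_threshold": 20000}}

# Each body would be sent as:
#   PATCH /collections/my_collection
for body in (disable_indexing, enable_indexing):
    print(json.dumps(body))
```

The same two-step toggle works from client libraries via their collection-update call; the REST bodies above are the lowest-common-denominator form.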
```
### references/ops-checklist.md
```markdown
# Operations Checklist (Monitoring, Performance, Troubleshooting)
Operational guidance for Qdrant: monitoring, performance tuning, and common issues.
---
## Monitoring
### Key Endpoints
| Endpoint | Purpose | Notes |
|----------|---------|-------|
| `/metrics` | Prometheus metrics | Scrape per node |
| `/telemetry` | State info (vectors, shards) | Debugging |
| `/healthz`, `/livez`, `/readyz` | Kubernetes health | Always accessible |
### Essential Metrics
**Collections**:
- `collections_total`, `collection_points`, `collection_vectors`
**API Performance**:
- `rest_responses_total`, `rest_responses_fail_total`
- `rest_responses_duration_seconds` (histogram)
**Memory**:
- `memory_allocated_bytes`, `memory_resident_bytes`
**Process**:
- `process_open_fds`, `process_threads`
**Cluster** (distributed):
- `cluster_peers_total`, `cluster_pending_operations_total`
**Optimizations**:
- `collection_running_optimizations`
### Configuration
- Prefix metrics: `QDRANT__SERVICE__METRICS_PREFIX`
- Hardware IO: `service.hardware_reporting: true`
---
## Performance Checklist
### Scenario 1: High-Speed Search, Low Memory
- Vectors `on_disk: true`
- Scalar quantization `int8` with `always_ram: true`
- Optional: `quantization.rescore: false` (slight precision loss)
### Scenario 2: High Precision, Low Memory
- Vectors and HNSW `on_disk: true`
- Increase HNSW: `m: 64`, `ef_construct: 512`
- Use inline storage (v1.16+) with quantization
- Check disk IOPS
### Scenario 3: High Precision, High-Speed
- Keep vectors in RAM
- Scalar quantization with rescoring
- Tune search: higher `hnsw_ef`, `exact: true` for ground truth
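Scenario 1 can be sketched as a create-collection request body. The collection name, vector size, and threshold are placeholders, and the field names follow the REST API's vectors/quantization config as I understand it; verify against the current API reference before use.

```python
# Hypothetical body for PUT /collections/my_collection
create_body = {
    "vectors": {
        "size": 768,          # placeholder dimensionality
        "distance": "Cosine",
        "on_disk": True,      # original vectors live on disk (low memory)
    },
    "quantization_config": {
        "scalar": {
            "type": "int8",
            "always_ram": True,  # quantized vectors stay in RAM (fast search)
        }
    },
}
```

Optionally add `"quantization": {"rescore": False}` to search params for the precision/speed tradeoff noted above.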
### General Tuning
| Goal | Setting |
|------|---------|
| Minimize latency | `default_segment_number` = CPU cores |
| Maximize throughput | `default_segment_number: 2`, `max_segment_size: 5M` |
### Checklist
- [ ] Index payload fields used in filters
- [ ] Choose quantization (scalar/binary) based on precision needs
- [ ] Monitor memory/disk via `/metrics`
- [ ] Adjust HNSW params (m, ef_construct, on_disk)
- [ ] Use named vectors for multi-modal
- [ ] Run optimizer after bulk inserts
---
## Troubleshooting
### Too many files open (OS error 24)
**Cause**: Each collection segment requires open files.
**Fix**:
```bash
# Docker
docker run --ulimit nofile=10000:10000 qdrant/qdrant
# Shell
ulimit -n 10000
```
### Incompatible file system (data corruption risk)
**Cause**: Qdrant requires POSIX-compatible filesystem; non-POSIX (FUSE, HFS+, WSL mounts) can corrupt data.
**Symptoms**:
- `OutputTooSmall { expected: 4, actual: 0 }`
- Vectors zeroed after restart
**Fix**: Use Docker named volumes instead of bind mounts to Windows folders (WSL issue).
### Can't open Collections meta Wal (distributed)
**Error**: `Resource temporarily unavailable`
**Cause**: WAL files locked by another Qdrant instance (shared storage).
**Fix**: Each node must have its own storage directory. Cluster handles data sharing internally.
### gRPC + Multiprocessing Socket Error
**Error**: `sendmsg: Socket operation on non-socket (88)`
**Fix**:
```python
import multiprocessing
multiprocessing.set_start_method("forkserver") # or "spawn"
```
Or use REST API / async client.
---
## Quick Fixes Summary
| Issue | Fix |
|-------|-----|
| File limit errors | `--ulimit nofile=10000:10000` |
| Data corruption on WSL | Use Docker named volumes |
| Slow filtered search | Index payload fields |
| High memory usage | Enable `on_disk` for vectors/HNSW |
| Low recall | Increase `hnsw_ef`, `ef_construct` |
```
### references/configuration.md
```markdown
# Qdrant Configuration — overview (ingested: configuration guide)
Source: https://qdrant.tech/documentation/guides/configuration/
This note summarizes the **Configuration guide** with an emphasis on patterns you’ll actually use in real deployments.
## How configuration is supplied (practical)
- Qdrant supports file-based configuration and environment variable overrides.
- File formats: YAML (most commonly used), plus TOML, JSON, and INI.
- Environment variables have the **highest priority**.
### Env var mapping pattern (high value)
- Prefix: `QDRANT__`
- Nested keys are separated by double underscores.
- Example: `QDRANT__SERVICE__API_KEY=...`
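The mapping rule above is mechanical, so it can be captured in a tiny helper (a sketch for deriving env-var names, not anything Qdrant ships):

```python
def to_env_key(dotted: str) -> str:
    """Map a nested config key like 'service.api_key' to Qdrant's
    env-var override form: QDRANT__ prefix, '__' between levels."""
    return "QDRANT__" + "__".join(part.upper() for part in dotted.split("."))

print(to_env_key("service.api_key"))         # QDRANT__SERVICE__API_KEY
print(to_env_key("storage.snapshots_path"))  # QDRANT__STORAGE__SNAPSHOTS_PATH
```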
## Precedence / load order (high value)
The guide describes a layered override model (least → most significant):
1) embedded defaults
2) `config/config.yaml`
3) `config/{RUN_MODE}.yaml`
4) `config/local.yaml`
5) explicit `--config-path` file (overrides other files)
6) environment variables (override everything)
Practical pattern:
- keep stable defaults in file-based config
- keep secrets and env-specific overrides in env vars or the orchestrator (Kubernetes)
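The layered model can be illustrated with a generic deep-merge over the layers in load order (an illustration of the precedence semantics, not Qdrant's actual loader; the key names are placeholders):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Later layers win; nested dicts merge key-by-key."""
    out = dict(base)
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v
    return out

layers = [
    {"service": {"http_port": 6333, "api_key": None}},  # embedded defaults
    {"service": {"api_key": "file-key"}},               # config/config.yaml
    {"service": {"api_key": "env-key"}},                # environment variables
]
effective = {}
for layer in layers:
    effective = deep_merge(effective, layer)
# env vars override everything, untouched defaults survive:
# effective == {"service": {"http_port": 6333, "api_key": "env-key"}}
```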
## Settings that matter most in production
### Networking / service
- HTTP: typically `6333`; gRPC: typically `6334`.
- gRPC can be disabled if configured accordingly.
- Bind address/host is configurable.
### Security
- API keys and read-only API keys are config-driven.
- TLS can be enabled at the service level; optional mutual TLS is supported.
- TLS cert rotation can be handled via periodic reload (`tls.cert_ttl`).
### Storage / snapshots
- storage path and snapshots path are explicit configuration.
- snapshots can be stored locally or in S3 (requires S3 config).
- WAL has tunables (capacity/segments) that matter under write load.
### Performance (avoid premature tuning)
- Search/indexing thread controls exist; defaults are usually fine until measured.
- HNSW/index parameters are configurable; only tune with benchmarks.
### Distributed cluster
- Cluster enablement and peer-to-peer settings are configurable.
- Peer TLS can be enabled.
- Transfer limits and shard transfer methods can be configured.
## Operational recommendations (portable)
- Treat config as an explicit artifact: commit non-secret defaults, inject secrets at deploy time.
- Prefer a small number of well-understood knobs over changing many settings without measurement.
- Validate a restart path: invalid config should fail fast at startup.
```
### references/security.md
```markdown
# Qdrant Security — overview (ingested: security guide)
Source: https://qdrant.tech/documentation/guides/security/
This note summarizes the **Security guide** with an emphasis on actionable, production-relevant controls.
## Baseline warning (high value)
- Qdrant instances can be **unsecured by default**. Do not expose a node to untrusted networks without adding security controls.
## Network model & threat surfaces
- Qdrant exposes REST and gRPC APIs and can also run in distributed mode.
- In distributed mode, there is an **internal cluster port** (not meant for public exposure). The guide highlights that **internal channels are not protected by API keys/bearer tokens**, so network isolation is mandatory.
- Practical implication: treat the cluster network as a trusted private network segment.
## Authentication options
### Static API key
- Intended as a straightforward gate for API access.
- Provided via config (`service.api_key`) or env (`QDRANT__SERVICE__API_KEY`).
- Clients send it via an `api-key` header.
- Security guide stresses: use **TLS** to avoid leaking the API key.
### Read-only API key
- Separate key for read-only access.
- Config: `service.read_only_api_key` or env: `QDRANT__SERVICE__READ_ONLY_API_KEY`.
### JWT-based RBAC
- Provides finer-grained authorization (including per-collection access).
- Enabled via `service.jwt_rbac: true` and an API key used for signing/verifying tokens.
- Operationally important: anyone who knows the signing key can generate tokens offline; key rotation invalidates existing tokens.
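The "offline token minting" point can be demonstrated with a stdlib-only HS256 sketch. The signing mechanics are standard JWT; the claim body (`access: "r"`) is an assumption about the RBAC claim schema and must be checked against the security guide:

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(api_key: str, claims: dict) -> str:
    """Anyone holding the signing api_key can mint tokens like this,
    entirely offline — which is why key rotation invalidates all tokens."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(api_key.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

# Hypothetical read-only claim; verify the exact schema in the docs.
token = sign_jwt("my-signing-key", {"access": "r"})
```

In practice you would use a maintained JWT library; the point here is that no server round-trip is involved in token creation.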
## TLS
- The guide describes enabling TLS for REST/gRPC and (optionally) inter-node cluster connections.
- Operational notes:
- certificate rotation is supported by periodic reload (tunable via `tls.cert_ttl`).
- you can also terminate TLS via a reverse proxy, but still must isolate internal cluster ports.
## Hardening patterns (practical)
- Run as non-root (unprivileged image or explicit user IDs).
- Make container root filesystem read-only where feasible; mount persistent storage separately.
- Network isolation:
- Docker internal networks for “no public ingress/egress” patterns.
- Kubernetes NetworkPolicy to restrict ingress/egress (while allowing required inter-node traffic).
## Concrete guidance worth enforcing in projects
- Always keep internal cluster traffic private (never expose the internal cluster port publicly).
- If using API keys/JWT, do not run without TLS unless you have a trusted, private network boundary.
- Prefer least privilege (read-only key or collection-scoped JWT) for read-heavy workloads.
```
### references/snapshots.md
```markdown
# Snapshots (Qdrant Concepts) — practical notes
Source: https://qdrant.tech/documentation/concepts/snapshots/
## What a snapshot is (and what it is not)
- A snapshot is a **tar archive** containing the **data + collection configuration** for a specific collection **on a specific node** at a specific time.
- In a **distributed** deployment, you must create snapshots **per node** for the same collection (each node only has its local shard data).
- Collection-level snapshots **do not include aliases**; handle aliases separately.
- Qdrant Cloud has “Backups” as a disk-level alternative; snapshots are still useful for OSS/self-hosted workflows.
## Collection snapshots: create / list / delete / download
Core endpoints:
- Create: `POST /collections/{collection_name}/snapshots` (synchronous; generates a `.snapshot` file in `snapshots_path`).
- List: `GET /collections/{collection_name}/snapshots`
- Delete: `DELETE /collections/{collection_name}/snapshots/{snapshot_name}`
- Download: `GET /collections/{collection_name}/snapshots/{snapshot_name}` (REST-only per docs).
Practical implications:
- Treat snapshot creation as an **IO-heavy operation**; plan disk space and timing.
- In a cluster, coordinate per-node snapshot creation if you need a consistent point-in-time capture.
## Restore constraints (version + topology)
- A snapshot can only be restored into a cluster that shares the **same minor version**.
- Example from docs: `v1.4.1` → `v1.4.x` with `x >= 1`.
## Restore methods (and when to use which)
Qdrant supports three restoration paths:
1) **Recover from URL or local file** (`PUT /collections/{collection_name}/snapshots/recover`)
- `location` can be:
- an HTTP(S) URL reachable from the restoring node, or
- a `file:///...` URI to a local snapshot file.
- If the target collection does not exist, Qdrant will create it.
- Cloud note: restoring from a URL is not supported if outbound traffic is blocked; use file URI or upload.
2) **Recover from uploaded snapshot** (`POST /collections/{collection_name}/snapshots/upload?priority=...`)
- Upload snapshot bytes as multipart; recommended for migrations.
- Consider setting `priority=snapshot` for migration use-cases.
3) **Recover during start-up** (Qdrant CLI flags)
- Single-node only (not multi-node, not Cloud).
- Start Qdrant with repeated `--snapshot <path>:<target_collection>` pairs.
- The target collection must be **absent**, otherwise Qdrant exits with an error.
- `--force_snapshot` overwrites existing collections; treat as a dangerous operation.
## Snapshot recovery priority (critical gotcha)
When restoring onto a non-empty node, conflicts are resolved by `priority`:
- `replica` (default): prefer existing data over snapshot.
- `snapshot`: prefer snapshot over existing data.
- `no_sync`: restore without extra synchronization (advanced; easy to break the cluster).
Important gotcha:
- To recover a **new collection** from a snapshot, you typically need `priority=snapshot`.
- With the default `replica` priority, the docs note you can end up with an **empty collection** if the system prefers the “existing” (empty) state.
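Given the gotcha above, a restore-into-new-collection request should carry an explicit priority. A sketch of the recover body (URL and collection name are placeholders; confirm the field names against the snapshots API reference):

```python
# Body for: PUT /collections/my_collection/snapshots/recover
recover_body = {
    # Placeholder: HTTP(S) URL reachable from the restoring node,
    # or a file:///... URI to a local snapshot.
    "location": "http://backup-host/my_collection.snapshot",
    # Explicit priority avoids the empty-collection trap with the
    # default 'replica' behavior.
    "priority": "snapshot",
}
```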
## Full storage snapshots (single-node only)
- Full storage snapshots capture **whole storage**, including **collection aliases**.
- They are **not suitable for distributed mode**.
- They can be created/downloaded in Cloud, but Cloud cannot be restored from a full storage snapshot because that requires the CLI.
Endpoints:
- Create: `POST /snapshots`
- List: `GET /snapshots`
- Delete: `DELETE /snapshots/{snapshot_name}`
- Download: `GET /snapshots/{snapshot_name}` (REST-only per docs)
Restore:
- CLI at startup: `./qdrant --storage-snapshot /path/to/full.snapshot`
## Snapshot storage configuration (paths, temp, S3)
Local filesystem defaults:
- Default snapshot dir: `./snapshots` (or `/qdrant/snapshots` inside the Docker image).
Config knobs:
- `storage.snapshots_path` (env: `QDRANT__STORAGE__SNAPSHOTS_PATH`)
- `storage.temp_path` (optional separate temp dir for snapshot creation; useful if the storage disk is slow or space-constrained)
S3 support (S3-compatible):
- Configure `storage.snapshots_config` with `snapshots_storage: s3` and `s3_config` (bucket/region/access_key/secret_key/endpoint_url).
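The S3 configuration above can be sketched as the parsed form of the YAML fragment. Every value is a placeholder, and the nesting mirrors the keys named in this note; check the configuration guide for the authoritative shape:

```python
# Hypothetical storage.snapshots_config fragment (parsed YAML shown as a dict)
snapshots_s3 = {
    "storage": {
        "snapshots_config": {
            "snapshots_storage": "s3",
            "s3_config": {
                "bucket": "my-qdrant-snapshots",          # placeholder
                "region": "us-east-1",                    # placeholder
                "access_key": "INJECT-AT-DEPLOY-TIME",    # keep out of files
                "secret_key": "INJECT-AT-DEPLOY-TIME",
                "endpoint_url": "https://s3.example.com", # S3-compatible endpoint
            },
        }
    }
}
```

Per the configuration note earlier in this skill, the two keys would normally be injected as env vars rather than committed in the YAML file.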
## Operational guidelines
- For multi-tenant setups (one collection per tenant), snapshots are naturally scoped per collection.
- Choose between collection-level snapshot (per-collection backup/restore) vs full storage snapshot (single-node only).
- For self-hosted clusters, plan per-node snapshot creation/restore behavior.
```