qdrant
Qdrant vector database: collections, points, payload filtering, indexing, quantization, snapshots, and Docker/Kubernetes deployment.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install itechmeat-llm-code-qdrant
Repository
Skill path: skills/qdrant
Best for
Primary workflow: Run DevOps.
Technical facets: Full Stack, Backend, DevOps.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: itechmeat.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install qdrant into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/itechmeat/llm-code before adding qdrant to shared team environments
- Use qdrant for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: qdrant
description: "Qdrant vector database: collections, points, payload filtering, indexing, quantization, snapshots, and Docker/Kubernetes deployment."
version: "1.16.3"
release_date: "2025-12-19"
---
# Qdrant (Skill Router)
This file is intentionally **introductory**.
It acts as a **router**: based on your situation, open the right note under `references/`.
## Start here (fast)
- New to Qdrant? Read: `references/concepts.md`.
- Want the fastest local validation? Read: `references/quickstart.md` + `references/deployment.md`.
- Integrating with Python? Read: `references/api-clients.md`.
## Choose by situation
### Data modeling
- What should go into vectors vs payload vs your main DB? Read: `references/modeling.md`.
- Working with IDs, upserts, and write semantics? Read: `references/points.md`.
- Need to understand payload types and update modes? Read: `references/payload.md`.
### Retrieval (search)
- One consolidated entry point (search + filtering + explore + hybrid): `references/retrieval.md`.
### Performance & indexing
- Index types and tradeoffs: `references/indexing.md`.
- Storage/optimizer internals that matter operationally: `references/storage.md` + `references/optimizer.md`.
- Practical tuning, monitoring, troubleshooting: `references/ops-checklist.md`.
### Deployment & ops
- Installation/Docker/Kubernetes: `references/deployment.md`.
- Configuration layering: `references/configuration.md`.
- Security/auth/TLS boundary: `references/security.md`.
- Backup/restore: `references/snapshots.md`.
### API interface choice
- REST vs gRPC, Python SDK: `references/api-clients.md`.
## How to maintain this skill
- Keep `SKILL.md` short (router + usage guidance).
- Put details into `references/*.md`.
- Merge or reorganize references when it improves discoverability.
## Critical prohibitions
- Do not ingest/quote large verbatim chunks of vendor docs; summarize in your own words.
- Do not invent defaults not explicitly grounded in documentation; record uncertainties as TODOs.
- Do not design backup/restore without testing a restore path.
- Do not use NFS as the primary persistence backend (installation docs explicitly warn against it).
- Do not expose internal cluster communication ports publicly; rely on private networking.
- Do not use API keys/JWT over untrusted networks without TLS.
- Do not rely on implicit runtime defaults for production; record effective configuration.
## Links
- Concepts: https://qdrant.tech/documentation/concepts/
- Installation: https://qdrant.tech/documentation/guides/installation/
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/concepts.md
```markdown
# Qdrant Concepts — overview (ingested: concepts landing page)
Source: https://qdrant.tech/documentation/concepts/
This note summarizes the **Concepts landing page** only. It is a navigation/terminology map rather than a deep technical spec.
## Concept map (from the docs TOC)
- **Collections**: named datasets that contain points.
- **Points**: the main record type; a point contains vector(s) and optional payload.
- **Payload**: metadata stored alongside vectors.
- **Search**: similarity search (retrieve nearest points in vector space).
- **Explore**: a set of APIs for exploring the collection beyond basic similarity search.
- **Hybrid Queries**: multi-stage or multi-query retrieval patterns.
- **Filtering**: database-style conditions and clauses.
- **Inference**: generating vectors (embeddings) from text or images.
- **Optimizer**: mechanisms to rebuild/optimize internal DB structures for faster search.
- **Storage**: segments, indexes, and ID mapping at a high level.
- **Indexing**: available index types (payload, vector, sparse vector, filterable).
- **Snapshots**: node-level backup/restore artifacts.
## What this page does NOT fully explain
The landing page itself does not provide detailed technical specifics on:
- vector distance metrics
- ANN index internals (e.g., HNSW)
- distributed topology (sharding/replication)
- security/auth/TLS
Those likely live in the linked sub-pages. Fetch them **one by one** and extend this skill.
## Next ingestion targets (sub-pages)
Recommended order:
1) Collections → 2) Points → 3) Payload → 4) Filtering → 5) Search → 6) Indexing → 7) Storage/Optimizer → 8) Snapshots
(Each should be ingested as a separate URL, with its own reference note if needed.)
```
### references/quickstart.md
```markdown
# Qdrant Quickstart — overview (ingested: local quickstart)
Source: https://qdrant.tech/documentation/quickstart/
This note summarizes the **Local Quickstart** page. The goal is not to mirror code samples, but to capture the practical workflow and gotchas.
## High-value warning (repeatable)
- Quickstart explicitly warns: by default Qdrant can start **without encryption or authentication**.
- Practical rule: treat a default quickstart instance as **local-only** unless you’ve applied the Security guidance (API keys + TLS + network isolation).
## Minimal local run (what matters)
- Run Qdrant in Docker with:
- REST endpoint (HTTP)
- gRPC endpoint
- persistent storage mounted to `/qdrant/storage`
The quickstart calls out that on some platforms (notably Windows setups) a named Docker volume may be safer than host folder mounts.
## Local endpoints you can rely on
- REST API is available on the HTTP port.
- Web UI dashboard is served on the same HTTP endpoint under `/dashboard`.
- gRPC API is exposed on its own port.
## Minimal sanity-check workflow (portable)
1) Create a collection
- Requires specifying vector dimensionality and a distance function.
2) Upsert points
- Points include an ID, vector values, and optional payload.
3) Query / search
- Basic similarity query returns scored point IDs.
- Payload is not always returned by default; request it explicitly if you need it.
4) Filtered search
- Filtering is applied over payload fields.
- Quickstart recommends: create payload indexes for performance on real datasets.
## Next ingestion targets (one URL at a time)
- Payload indexing page (to capture what “payload index” means and how to design it)
- Filtering page (operators, types, and performance implications)
```
### references/deployment.md
```markdown
# Deployment (Installation, Docker, Kubernetes)
Sources:
- https://qdrant.tech/documentation/guides/installation/
This note consolidates the practical deployment constraints and options.
## Recommended paths (high level)
- Production:
- Qdrant Cloud (managed)
- Kubernetes (Helm chart or enterprise operator, depending on requirements)
- Development/testing:
- Docker (single container) or Docker Compose
## Storage constraints (high value)
- Qdrant persistence expects **block-level access** with a **POSIX-compatible filesystem**.
- **NFS is not supported** for Qdrant storage.
- SSD/NVMe is recommended for vector-heavy workloads.
- Be careful with Windows Docker/WSL mounts (docs warn about filesystem issues / data loss).
## Networking / ports
- `6333`: HTTP API (and health/metrics endpoints)
- `6334`: gRPC API
- `6335`: distributed deployment / cluster communication
Operational rule of thumb:
- Clients typically need `6333`/`6334`.
- Cluster nodes must reach each other on all required ports.
## Docker quickstart (practical)
Pull:
```bash
docker pull qdrant/qdrant
```
Run with persistence:
```bash
docker run -p 6333:6333 \
    -v "$(pwd)/path/to/data":/qdrant/storage \
    qdrant/qdrant
```
Override config:
```bash
docker run -p 6333:6333 \
    -v "$(pwd)/path/to/data":/qdrant/storage \
    -v "$(pwd)/custom_config.yaml":/qdrant/config/production.yaml \
    qdrant/qdrant
```
## Kubernetes (Helm chart) notes
- Helm chart is community-supported.
- The docs highlight limitations compared to Qdrant Cloud/enterprise operator:
- no zero-downtime upgrades
- no automatic shard rebalancing
- no full backup/recovery automation
If you self-host on K8s, you must design:
- backup/restore
- upgrades
- monitoring/logging
- HA + load balancing
## Production checklist (minimum)
- Persistent storage is configured and compatible (no NFS).
- Network exposure is intentional (do not expose internal cluster comms publicly).
- Security boundary is defined (auth + TLS termination).
- Monitoring and backups are in place.
```
### references/api-clients.md
```markdown
# API Clients (REST, gRPC, Python SDK)
Qdrant API interfaces and client library patterns.
## Interfaces
| Protocol | Port | Use Case |
|----------|------|----------|
| REST | 6333 | Development, debugging, human-readable |
| gRPC | 6334 | Production, high throughput, lower latency |
**Recommendation**: Start with REST for prototyping, switch to gRPC for production performance.
## Python SDK
```bash
pip install qdrant-client
# Optional: local embeddings
pip install "qdrant-client[fastembed]"
```
### Sync Client
```python
from qdrant_client import QdrantClient
client = QdrantClient(url="http://localhost:6333")
```
### Async Client
```python
from qdrant_client import AsyncQdrantClient
async_client = AsyncQdrantClient(url="http://localhost:6333")
```
### Connection Options
- **Local/memory**: `QdrantClient(":memory:")`
- **Remote**: `QdrantClient(url="http://host:6333")`
- **Cloud**: `QdrantClient(url="https://your-cluster.qdrant.cloud", api_key="...")`
## Key Features
- Type definitions for all Qdrant API
- Sync and async requests
- Helper methods for common operations
- Supports REST and gRPC protocols
## Docker Port Exposure
```bash
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```
## gRPC + Multiprocessing Gotcha
**Error**: `sendmsg: Socket operation on non-socket (88)` when using multiprocessing with gRPC.
**Cause**: multiprocessing copies gRPC channels, sharing sockets; parent close breaks children.
**Fix**:
```python
import multiprocessing
multiprocessing.set_start_method("forkserver") # or "spawn"
```
**Alternative**: Use REST API, async client, or built-in parallelization (`qdrant.upload_points(...)`).
```
### references/modeling.md
```markdown
# Data Modeling & Inference
Best practices for structuring data in Qdrant and server-side embedding generation.
## What to Store Where
| Component | Purpose | Notes |
|-----------|---------|-------|
| **Vectors** | Similarity search | Dense, sparse, or multi-vector |
| **Payload** | Filtering/metadata | JSON-like, index for performance |
| **External DB** | Full content | Store IDs in payload for retrieval |
## Modeling Best Practices
- Keep payload lightweight; use for filtering, not full data storage
- Index payload fields used in filters (tags, timestamps, tenant IDs)
- Use named vectors for multiple embeddings per point (e.g., text + image)
- Balance vector dimensionality: higher for accuracy, lower for speed/memory
## Multi-Tenancy Patterns
1. **Separate collections**: One collection per tenant (simpler isolation)
2. **Shared collection + tenant ID**: Filter by tenant in payload (requires indexing)
## Common Patterns
| Pattern | Vectors | Payload |
|---------|---------|---------|
| **RAG** | Chunk embeddings | Source doc ID, chunk index |
| **Recommendations** | User/item vectors | Preferences, categories |
| **Hybrid Search** | Dense + sparse | Reranking scores |
---
## Inference (Server-Side Embeddings)
Qdrant can generate embeddings directly, avoiding external pipelines.
### Inference Objects
Replace raw vectors with inference objects in API calls:
```json
// Text embedding
{ "text": "search query", "model": "model-name" }
// Image embedding
{ "image": "https://example.com/image.jpg", "model": "clip-model" }
```
### BM25 Sparse Vectors
```json
{ "text": "document text", "model": "qdrant/bm25" }
```
### Inference Sources
| Source | Setup | Example Model |
|--------|-------|---------------|
| **Qdrant Cloud** | Built-in | Check console for models |
| **Local (fastembed)** | `cloud_inference=False` | Local models |
| **External (OpenAI/Cohere)** | Prepend provider, add API key | `openai/text-embedding-3-small` |
### Advanced Features
- **Multiple vectors**: Generate dense + sparse per point
- **Matryoshka reduction**: `"mrl": 64` for dimension reduction
- **Optimization**: Identical inference objects computed once per request
### Practical Notes
- Input text not stored unless explicitly added to payload
- For Cloud: Check model dimensionality/context window in console
```
### references/points.md
```markdown
# Qdrant Points — overview (ingested: points concept page)
Source: https://qdrant.tech/documentation/concepts/points/
This note summarizes the **Points** concept page, focusing on how point writes/updates behave in practice.
## What a point is
- A point is the central record in Qdrant.
- It contains:
- an ID
- one or more vector representations
- optional payload (metadata)
## IDs (design choice)
- The docs state Qdrant supports point IDs as:
- 64-bit unsigned integers
- UUIDs (multiple string formats are accepted)
Practical guidance:
- Prefer UUIDs if IDs come from outside your system or you need low collision risk.
- Prefer integers for compactness when you control ID assignment.
## Write path semantics (important)
- Point modification operations are described as **asynchronous** and written to a write-ahead log first.
- This implies a “durable but not immediately visible” window depending on whether you wait for completion.
### `wait` / eventual consistency (high value)
- If you do not request waiting, you can receive an acknowledgment before the update is fully applied.
- If you need the update to be searchable immediately after the call returns, you must use the “wait for completion” mode.
Practical rule:
- For ingestion pipelines that can tolerate lag, async is fine.
- For request/response flows where the user expects immediate retrieval, use wait mode.
## Upsert / idempotence
- The docs describe APIs as idempotent: re-sending the same upsert leads to the same final state.
- Points with the same ID are overwritten when re-uploaded.
Practical rule:
- Safe for “at-least-once” delivery pipelines (queues) as long as overwrites are acceptable.
## Vectors model
- A point can have multiple vectors, including different types; Qdrant supports:
- dense vectors
- sparse vectors
- multivectors
- Multiple vectors per point are referred to as named vectors.
### Named vectors replacement vs partial updates
- Uploading a point with an existing ID replaces the whole point (unspecified vectors can be removed).
- There is a dedicated “update vectors” operation to update only the specified vectors while keeping the others unchanged.
## Batch ingestion
- The page describes two batch formats:
- record-oriented (list of points)
- column-oriented (ids/payloads/vectors arrays)
Practical rule:
- Choose whichever fits your ETL shape; they’re equivalent internally.
## Python client ingestion helpers
- The page highlights Python client helpers that can:
- parallelize uploads
- retry
- batch lazily (useful for streaming from disk)
## Conditional updates (optimistic concurrency)
- Update operations can include a filter-based precondition.
- This can implement optimistic concurrency control (e.g., only update if payload `version` matches).
Practical rule:
- Use conditional updates for background re-embedding jobs to prevent overwriting fresh application writes.
## Retrieval patterns (useful for apps)
- Retrieve by IDs (selective fetch)
- Scroll (iterate by ID order; filterable)
- Ordering by payload key exists but requires an appropriate payload index; pagination changes when using order_by.
- Count by filter (useful for analytics and pagination sizing)
## Next ingestion targets (one URL at a time)
- Payload page (to connect “update payload / overwrite payload” semantics)
- Vectors page (to cover vector storage and optimization)
```
### references/payload.md
```markdown
# Qdrant Payload — overview (ingested: payload concept page)
Source: https://qdrant.tech/documentation/concepts/payload/
This note captures how payload (metadata) behaves, what types are filterable, and which update operations matter.
## What payload is
- Payload is JSON metadata stored alongside vectors.
- Payload is central to:
- filtering (constraints)
- faceting / aggregations (counts)
- application-level semantics (e.g., access control fields, timestamps, categories)
## Filterable payload types (what Qdrant expects)
The page documents payload types that participate in filtering:
- integer (64-bit)
- float (64-bit)
- bool
- keyword (string)
- geo (lon/lat object)
- datetime (RFC 3339 variants; UTC assumed if timezone missing)
- uuid (functionally similar to keyword, but stored as parsed UUID internally and can reduce RAM in payload-heavy setups)
Array semantics (high value):
- if a payload field is an array, a filter succeeds if **any element** satisfies the condition.
Practical rule:
- Keep payload types consistent per field; mismatched type means the condition is treated as not satisfied.
## Write patterns: attach payload at upsert
- Payload can be included during point upsert.
- Arrays are supported for multi-valued metadata.
## Updating payload: choose the right operation
The page distinguishes:
- **Set payload**: update only provided fields, keep others unchanged.
- **Overwrite payload**: replace the entire payload.
- **Clear payload**: remove all payload keys.
- **Delete payload keys**: remove only specific keys.
Selection patterns:
- by explicit point IDs
- by filter selector (bulk updates without knowing IDs)
Nested update convenience:
- the guide mentions a `key` parameter that allows modifying only a nested object under a particular top-level key.
## Payload indexing (practical guidance)
- For efficient filtered search, create indexes for payload fields (type-specific).
- The page recommends indexing fields that constrain results the most (often high-cardinality identifiers), and using the most restrictive index first in compound filters.
## Facet counts (useful for UX and query planning)
- Faceting is a GROUP BY-like counting aggregation over a field.
- The page states a field must have a compatible index (e.g., keyword index for MatchValue) to facet on it.
- Result size is limited by default; can be increased with a limit.
- Counts may be approximate by default; there is an `exact` option when you need precision.
```
### references/retrieval.md
```markdown
# Retrieval (Search, Filtering, Explore, Hybrid Queries)
Sources:
- https://qdrant.tech/documentation/concepts/search/
- https://qdrant.tech/documentation/concepts/filtering/
- https://qdrant.tech/documentation/concepts/explore/
- https://qdrant.tech/documentation/concepts/hybrid-queries/
This note consolidates the practical parts of Qdrant retrieval.
## Query API as the “front door”
- Qdrant’s universal retrieval interface is the Query API:
- `POST /collections/{collection_name}/points/query`
- Treat the `query` parameter as the thing that changes behavior (nearest, by-id, hybrid, etc.).
## Search-time knobs (recall vs latency)
Common parameters that matter operationally:
- `hnsw_ef`: higher often improves recall but increases latency.
- `exact`: disables ANN (can be very slow; full scan).
- `indexed_only`: can protect latency during indexing but may return partial results.
## Result projection
- Results do not necessarily include payload/vectors by default.
- Use `with_payload` / `with_vectors` and projection (include/exclude fields) when you need them.
## Filtering model (boolean logic)
Filters are composed with:
- `must` (AND)
- `should` (OR)
- `must_not` (NOT)
Field conditions include:
- equality / IN / NOT IN (keyword/int/bool)
- numeric ranges
- datetime ranges (RFC 3339)
- geo filters
- array length (“values count”)
- empty/null semantics
- `has_id` and `has_vector`
### Nested arrays: correctness gotcha
If you filter arrays of objects and need multiple conditions to apply to the **same element**, use nested filtering patterns; otherwise you may accidentally match across different array elements.
## Explore (recommendation / discovery)
Use Explore when you need:
- recommendations from multiple positives and/or negatives
- discovery / context constrained search
- dataset exploration (e.g., outliers)
Operational notes:
- performance often scales with number of examples
- accuracy may require increasing `ef` for constrained discovery/context searches
## Hybrid and multi-stage retrieval
Qdrant supports multi-stage retrieval via `prefetch`:
- prefetch generates candidate sets
- the main query re-scores/ranks candidates
Important gotcha:
- `offset` applies only to the main query; ensure prefetch limits are large enough.
### Fusion patterns
When combining multiple channels (dense + sparse, or multiple embeddings):
- RRF (rank fusion) is a common default.
- Distribution-based score fusion (DBSF) can help when score scales differ.
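A request body for `POST /collections/{collection_name}/points/query` combining two prefetch channels with RRF might look like this (a sketch: the named vectors `dense`/`sparse` and all values are assumptions, and the prefetch limits are deliberately larger than the final limit):

```json
{
  "prefetch": [
    { "query": [0.2, 0.8, 0.1, 0.9], "using": "dense", "limit": 50 },
    { "query": { "indices": [17, 42], "values": [0.5, 0.3] }, "using": "sparse", "limit": 50 }
  ],
  "query": { "fusion": "rrf" },
  "limit": 10
}
```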
### Diversity (MMR)
MMR helps reduce near-duplicate results; results may be ordered by selection process, not strictly by similarity score.
### Formula rescoring
Use formula-based rescoring to blend business signals (payload fields) with vector scores.
Rule of thumb: treat formula rescoring as a controlled, eval-driven feature (not a default).
## Practical rules of thumb
- Start simple: vector search + filter + payload projection.
- Add grouping/dedup only when needed (and index the group field).
- Add hybrid/multi-stage only when you can justify it with eval + latency budgets.
```
### references/indexing.md
```markdown
# Qdrant Indexing — overview (ingested: indexing concept page)
Source: https://qdrant.tech/documentation/concepts/indexing/
This note summarizes the indexing model and the handful of decisions that most teams actually need.
## Mental model (how to think about indexes)
- Qdrant combines **vector indexes** (for similarity search) with **payload indexes** (for filtering and query planning).
- Index configuration is applied at the **collection** level, but indexes may be built per-segment as data grows and optimizers decide it’s worthwhile.
## Payload indexes (what you need for fast filters)
- Payload indexes are created per field and type; they speed up filtering and help estimate filter selectivity.
- Index only fields you filter on frequently; indexes cost memory and build time.
- A practical heuristic from the guide: indexing fields with more distinct values often yields more benefit.
Supported payload index types mentioned include:
- keyword / integer / float / bool
- datetime
- uuid (doc notes this can be more memory-efficient than keyword for UUIDs)
- geo
- text (full-text)
### Parameterized integer index (performance trap)
- Integer indexes can be configured to support “lookup” (exact match) and/or “range”.
- The guide warns that enabling lookup in the wrong context can cause performance issues.
### On-disk payload indexes (memory vs latency)
- Default: payload-related structures are kept in memory for low latency.
- On-disk payload index exists for large/rarely used indexes to reduce memory pressure.
- Tradeoff: cold requests may be slower due to disk I/O.
### Tenant index / principal index (special-purpose)
- Tenant index: optimizes multi-tenant collections when most queries filter by tenant.
- Principal index: optimizes when most queries filter by a primary “timeline” field (e.g., timestamp).
## Full-text index (text filtering semantics)
- Full-text indexing enables token-based filtering on string payload.
- Key design choices:
- tokenizer (word/whitespace/prefix/multilingual)
- lowercasing / ASCII folding
- stemming / stopwords (language-specific)
- phrase matching (requires additional structure; enable explicitly)
Practical rule: text filter semantics depend on how you build the full-text index.
## Vector index (dense)
- The guide states dense vectors use an HNSW index.
- Parameters you’ll see:
- `m` (graph degree)
- `ef_construct` (build quality/speed)
- `ef` (search-time quality/latency)
- `full_scan_threshold` (when to skip HNSW)
Practical rule: don’t tune HNSW blindly — benchmark on your data.
## Sparse vector index
- Designed for sparse vectors (many zeros), conceptually closer to inverted-index style retrieval.
- Can be stored on disk to save memory, with expected latency tradeoffs.
- Supports dot-product similarity (as described in the guide).
## Filterable index / graph-filter interaction
- The guide describes additional mechanisms to keep graph traversal effective under filtering.
- Practical takeaway: the combination of vector search + filters has specific index support; strict multi-filter combinations may require special search algorithms.
## What to enforce in projects (portable)
- Treat payload indexes as mandatory for production filtered search.
- Prefer least number of indexed fields, chosen from actual query patterns.
- Decide early whether multi-tenancy will be “one collection per tenant” vs “shared collection + tenant index”.
- Document whether text filters require phrase semantics (and configure phrase matching accordingly).
```
### references/storage.md
```markdown
# Storage (Qdrant Concepts) — practical notes
Source: https://qdrant.tech/documentation/concepts/storage/
## Segment model (what to remember)
- A collection’s data is split into **segments**.
- Each segment has its own:
- vector storage
- payload storage
- vector + payload indexes
- ID mapper (internal ↔ external IDs)
- Segments usually do not overlap; if a point ends up in multiple segments, Qdrant has **deduplication** in search.
Appendable vs non-appendable:
- Segments can be **appendable** or **non-appendable** depending on storage/index choices.
- Appendable segments allow add/delete/query.
- Non-appendable segments allow read/delete only.
- A collection must have at least one appendable segment.
Why this matters operationally:
- Many performance behaviors (optimizer, indexing, memmap) are segment-scoped.
## Vector storage: In-memory vs Memmap (on-disk)
Qdrant provides two main vector storage modes:
- **In-memory**: vectors live in RAM; fastest for search; disk mostly used for persistence.
- **Memmap (on-disk)**: vectors live in memory-mapped files; OS page cache controls what is resident.
- With enough RAM, it can be close to in-memory performance.
- Typically preferred for large collections when RAM is limited and disks are fast.
### How to enable memmap
Two main approaches:
1) Collection creation: set `vectors.on_disk=true`.
- Recommended when you know upfront you want memmap for the whole collection.
2) Threshold-based conversion: set `memmap_threshold`.
- Can be configured globally and/or per collection.
- Segments above the threshold are converted to memmap storage.
Rule of thumb (from docs):
- Balanced workload: set `memmap_threshold` ≈ `indexing_threshold` (default mentioned as 20000 in docs).
- This helps avoid extra optimizer runs by aligning thresholds.
- High write load + low RAM: set `memmap_threshold` lower than `indexing_threshold` (e.g. 10000).
- Converts to memmap earlier; indexing happens later.
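As a sketch, the balanced-workload alignment might look like this in the server config (key names follow Qdrant's sample `config.yaml`; verify against your version before relying on them):

```yaml
storage:
  optimizers:
    # Balanced workload: align thresholds to avoid extra optimizer runs.
    memmap_threshold_kb: 20000
    indexing_threshold_kb: 20000
    # High write load + low RAM: convert to memmap earlier, e.g.
    # memmap_threshold_kb: 10000
```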
### HNSW index on disk
- You can also store the HNSW index on disk using `hnsw_config.on_disk=true` (per collection create/update).
Practical implication:
- “Vectors on disk” and “HNSW on disk” are separate knobs; decide per workload and disk speed.
## Payload storage: InMemory vs OnDisk
Payload storage types:
- **InMemory payload**: payload data loaded into RAM on startup; persistent backing on disk (and Gridstore per docs).
- Fast, but can consume a lot of RAM for large payload values (long text, images).
- **OnDisk payload**: payload read/write directly to RocksDB.
- Lower RAM usage, but higher access latency.
Critical performance rule:
- If you filter/search using payload conditions and payload is on disk, create **payload indexes** for the fields used in filters.
- Once a payload field is indexed, Qdrant keeps values of that indexed field in RAM **regardless** of payload storage type.
How to choose (practical):
- Large payload values that you don’t filter on → consider on-disk payload.
- Any payload fields used in filters/scoring → index them.
## Versioning + WAL (crash safety)
Qdrant uses a two-stage write path for integrity:
1) Write to **WAL** (write-ahead log): orders operations and assigns sequential numbers.
2) Apply changes to segments.
Each segment tracks:
- the last applied version
- per-point version
If an operation’s sequence number is older than the current point version, it is ignored.
Operational implication:
- WAL enables safe recovery after abnormal shutdown.
- Versioning prevents out-of-order updates from corrupting point state.
## Operational guidelines
- Prefer memmap vectors + (optional) on-disk HNSW when collections grow beyond RAM.
- Keep filter-critical payload fields indexed; avoid "disk payload + unindexed filters".
- Bulk ingestion workflows should align `memmap_threshold` and indexing thresholds.
```
### references/optimizer.md
```markdown
# Optimizer (Qdrant Concepts) — practical notes
Source: https://qdrant.tech/documentation/concepts/optimizer/
## Why optimizer exists (mental model)
Qdrant stores data in **segments**. Many changes are more efficient in **batches** than “in-place per point”, so Qdrant periodically **rebuilds** internal structures at segment level.
Key availability property:
- The segment being optimized remains **readable** during rebuild.
- Writes/updates during optimization go into a **copy-on-write** segment (proxy layer), which takes priority for reads and subsequent updates.
Practical implication:
- Optimization is expected background work. Plan for CPU/Disk IO spikes and don’t treat it as an outage.
## Vacuum optimizer (garbage of deleted points)
Deletion is logical first:
- Qdrant marks records as deleted and ignores them in queries.
- This minimizes disk IO, but over time deleted records accumulate → memory usage and performance can degrade.
Vacuum optimizer triggers when a segment accumulates “too many” deletions.
Relevant config knobs:
- `storage.optimizers.deleted_threshold`: minimal fraction of deleted vectors in a segment to start vacuum.
- `storage.optimizers.vacuum_min_vector_number`: minimal vectors in a segment before vacuum makes sense.
Operational guidance:
- If you do frequent deletes (e.g., reingestion, dedup), watch for vacuum activity and disk usage.
## Merge optimizer (too many small segments)
Too many small segments hurt search performance.
Merge optimizer tries to reduce segment count:
- Target segment count: `storage.optimizers.default_segment_number` (defaults to CPU count when 0).
- It merges (at least) the smallest segments.
- It avoids creating overly large segments via `storage.optimizers.max_segment_size_kb`.
Tradeoff note from docs:
- Lower `max_segment_size_kb` can prioritize faster indexation.
- Higher `max_segment_size_kb` can prioritize search speed (fewer segments), but risks long index build times per segment.
Practical guidance:
- Treat segment count as a performance lever: fewer segments typically helps search parallelism overhead, but “too large” segments make rebuilds expensive.
## Indexing optimizer (when to turn on indexes / memmap)
Qdrant can switch storage/index modes based on dataset size. Small datasets can be faster with brute-force scan.
Indexing optimizer enables:
- vector indexing
- memmap storage
…when thresholds are reached.
Relevant config knobs:
- `storage.optimizers.memmap_threshold` (kB per segment): above this, vectors become read-only **memmap**. Set to `0` to disable.
- `storage.optimizers.indexing_threshold_kb` (kB per segment): above this, enables vector indexing. Set to `0` to disable.
Practical implication:
- These thresholds strongly affect memory vs latency behavior; choose them intentionally for your workload.
## Per-collection optimizer overrides + dynamic tuning
In addition to global config, optimizer parameters can be set **per collection**.
Docs highlight a common production pattern:
- During bulk initial load, disable indexing / expensive rebuild behavior.
- After ingestion finishes, enable indexing so the index is built once (instead of rebuilding repeatedly during upload).
## Operational guidelines
- Collections can have different lifecycles (high-churn vs append-only).
- Bulk backfills / re-embeddings should use the "disable indexing during upload, re-enable after" pattern to save compute.
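The disable/re-enable pattern above can be sketched as two bodies for `PATCH /collections/{collection_name}`. This is a sketch, not a verified call: `my_collection` and the restored threshold value are placeholders, and the `optimizers_config` field shape should be checked against the current REST API reference.

```python
import json

COLLECTION = "my_collection"  # placeholder collection name

# Step 1: before the bulk upload, disable indexing so segments are not
# rebuilt repeatedly while points stream in.
disable_indexing = {"optimizers_config": {"indexing_threshold": 0}}

# Step 2: after ingestion finishes, restore a non-zero threshold so the
# index is built once over the final data (value is illustrative).
enable_indexing = {"optimizers_config": {"indexing_threshold": 20000}}

# Each body would be sent as:
#   PATCH /collections/my_collection
for body in (disable_indexing, enable_indexing):
    print(json.dumps(body))
```

The same two-step toggle works from client libraries via their collection-update call; the REST bodies above are the lowest-common-denominator form.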
```
### references/ops-checklist.md
```markdown
# Operations Checklist (Monitoring, Performance, Troubleshooting)
Operational guidance for Qdrant: monitoring, performance tuning, and common issues.
---
## Monitoring
### Key Endpoints
| Endpoint | Purpose | Notes |
|----------|---------|-------|
| `/metrics` | Prometheus metrics | Scrape per node |
| `/telemetry` | State info (vectors, shards) | Debugging |
| `/healthz`, `/livez`, `/readyz` | Kubernetes health | Always accessible |
### Essential Metrics
**Collections**:
- `collections_total`, `collection_points`, `collection_vectors`
**API Performance**:
- `rest_responses_total`, `rest_responses_fail_total`
- `rest_responses_duration_seconds` (histogram)
**Memory**:
- `memory_allocated_bytes`, `memory_resident_bytes`
**Process**:
- `process_open_fds`, `process_threads`
**Cluster** (distributed):
- `cluster_peers_total`, `cluster_pending_operations_total`
**Optimizations**:
- `collection_running_optimizations`
### Configuration
- Prefix metrics: `QDRANT__SERVICE__METRICS_PREFIX`
- Hardware IO: `service.hardware_reporting: true`
---
## Performance Checklist
### Scenario 1: High-Speed Search, Low Memory
- Vectors `on_disk: true`
- Scalar quantization `int8` with `always_ram: true`
- Optional: `quantization.rescore: false` (slight precision loss)
### Scenario 2: High Precision, Low Memory
- Vectors and HNSW `on_disk: true`
- Increase HNSW: `m: 64`, `ef_construct: 512`
- Use inline storage (v1.16+) with quantization
- Check disk IOPS
### Scenario 3: High Precision, High-Speed
- Keep vectors in RAM
- Scalar quantization with rescoring
- Tune search: higher `hnsw_ef`, `exact: true` for ground truth
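Scenario 1 can be sketched as a create-collection request body. The collection name, vector size, and threshold are placeholders, and the field names follow the REST API's vectors/quantization config as I understand it; verify against the current API reference before use.

```python
# Hypothetical body for PUT /collections/my_collection
create_body = {
    "vectors": {
        "size": 768,          # placeholder dimensionality
        "distance": "Cosine",
        "on_disk": True,      # original vectors live on disk (low memory)
    },
    "quantization_config": {
        "scalar": {
            "type": "int8",
            "always_ram": True,  # quantized vectors stay in RAM (fast search)
        }
    },
}
```

Optionally add `"quantization": {"rescore": False}` to search params for the precision/speed tradeoff noted above.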
### General Tuning
| Goal | Setting |
|------|---------|
| Minimize latency | `default_segment_number` = CPU cores |
| Maximize throughput | `default_segment_number: 2`, `max_segment_size: 5M` |
### Checklist
- [ ] Index payload fields used in filters
- [ ] Choose quantization (scalar/binary) based on precision needs
- [ ] Monitor memory/disk via `/metrics`
- [ ] Adjust HNSW params (m, ef_construct, on_disk)
- [ ] Use named vectors for multi-modal
- [ ] Run optimizer after bulk inserts
---
## Troubleshooting
### Too many files open (OS error 24)
**Cause**: Each collection segment requires open files.
**Fix**:
```bash
# Docker
docker run --ulimit nofile=10000:10000 qdrant/qdrant
# Shell
ulimit -n 10000
```
### Incompatible file system (data corruption risk)
**Cause**: Qdrant requires POSIX-compatible filesystem; non-POSIX (FUSE, HFS+, WSL mounts) can corrupt data.
**Symptoms**:
- `OutputTooSmall { expected: 4, actual: 0 }`
- Vectors zeroed after restart
**Fix**: Use Docker named volumes instead of bind mounts to Windows folders (WSL issue).
### Can't open Collections meta Wal (distributed)
**Error**: `Resource temporarily unavailable`
**Cause**: WAL files locked by another Qdrant instance (shared storage).
**Fix**: Each node must have its own storage directory. Cluster handles data sharing internally.
### gRPC + Multiprocessing Socket Error
**Error**: `sendmsg: Socket operation on non-socket (88)`
**Fix**:
```python
import multiprocessing
multiprocessing.set_start_method("forkserver") # or "spawn"
```
Or use REST API / async client.
---
## Quick Fixes Summary
| Issue | Fix |
|-------|-----|
| File limit errors | `--ulimit nofile=10000:10000` |
| Data corruption on WSL | Use Docker named volumes |
| Slow filtered search | Index payload fields |
| High memory usage | Enable `on_disk` for vectors/HNSW |
| Low recall | Increase `hnsw_ef`, `ef_construct` |
```
### references/configuration.md
```markdown
# Qdrant Configuration — overview (ingested: configuration guide)
Source: https://qdrant.tech/documentation/guides/configuration/
This note summarizes the **Configuration guide** with an emphasis on patterns you’ll actually use in real deployments.
## How configuration is supplied (practical)
- Qdrant supports file-based configuration and environment variable overrides.
- File formats: YAML (most commonly used), plus TOML, JSON, and INI.
- Environment variables have the **highest priority**.
### Env var mapping pattern (high value)
- Prefix: `QDRANT__`
- Nested keys are separated by double underscores.
- Example: `QDRANT__SERVICE__API_KEY=...`
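The mapping rule above is mechanical, so it can be captured in a tiny helper (a sketch for deriving env-var names, not anything Qdrant ships):

```python
def to_env_key(dotted: str) -> str:
    """Map a nested config key like 'service.api_key' to Qdrant's
    env-var override form: QDRANT__ prefix, '__' between levels."""
    return "QDRANT__" + "__".join(part.upper() for part in dotted.split("."))

print(to_env_key("service.api_key"))         # QDRANT__SERVICE__API_KEY
print(to_env_key("storage.snapshots_path"))  # QDRANT__STORAGE__SNAPSHOTS_PATH
```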
## Precedence / load order (high value)
The guide describes a layered override model (least → most significant):
1) embedded defaults
2) `config/config.yaml`
3) `config/{RUN_MODE}.yaml`
4) `config/local.yaml`
5) explicit `--config-path` file (overrides other files)
6) environment variables (override everything)
Practical pattern:
- keep stable defaults in file-based config
- keep secrets and env-specific overrides in env vars or the orchestrator (Kubernetes)
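The layered model can be illustrated with a generic deep-merge over the layers in load order (an illustration of the precedence semantics, not Qdrant's actual loader; the key names are placeholders):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Later layers win; nested dicts merge key-by-key."""
    out = dict(base)
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v
    return out

layers = [
    {"service": {"http_port": 6333, "api_key": None}},  # embedded defaults
    {"service": {"api_key": "file-key"}},               # config/config.yaml
    {"service": {"api_key": "env-key"}},                # environment variables
]
effective = {}
for layer in layers:
    effective = deep_merge(effective, layer)
# env vars override everything, untouched defaults survive:
# effective == {"service": {"http_port": 6333, "api_key": "env-key"}}
```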
## Settings that matter most in production
### Networking / service
- HTTP: typically `6333`; gRPC: typically `6334`.
- gRPC can be disabled if configured accordingly.
- Bind address/host is configurable.
### Security
- API keys and read-only API keys are config-driven.
- TLS can be enabled at the service level; optional mutual TLS is supported.
- TLS cert rotation can be handled via periodic reload (`tls.cert_ttl`).
### Storage / snapshots
- storage path and snapshots path are explicit configuration.
- snapshots can be stored locally or in S3 (requires S3 config).
- WAL has tunables (capacity/segments) that matter under write load.
### Performance (avoid premature tuning)
- Search/indexing thread controls exist; defaults are usually fine until measured.
- HNSW/index parameters are configurable; only tune with benchmarks.
### Distributed cluster
- Cluster enablement and peer-to-peer settings are configurable.
- Peer TLS can be enabled.
- Transfer limits and shard transfer methods can be configured.
## Operational recommendations (portable)
- Treat config as an explicit artifact: commit non-secret defaults, inject secrets at deploy time.
- Prefer a small number of well-understood knobs over changing many settings without measurement.
- Validate a restart path: invalid config should fail fast at startup.
```
### references/security.md
```markdown
# Qdrant Security — overview (ingested: security guide)
Source: https://qdrant.tech/documentation/guides/security/
This note summarizes the **Security guide** with an emphasis on actionable, production-relevant controls.
## Baseline warning (high value)
- Qdrant instances can be **unsecured by default**. Do not expose a node to untrusted networks without adding security controls.
## Network model & threat surfaces
- Qdrant exposes REST and gRPC APIs and can also run in distributed mode.
- In distributed mode, there is an **internal cluster port** (not meant for public exposure). The guide highlights that **internal channels are not protected by API keys/bearer tokens**, so network isolation is mandatory.
- Practical implication: treat the cluster network as a trusted private network segment.
## Authentication options
### Static API key
- Intended as a straightforward gate for API access.
- Provided via config (`service.api_key`) or env (`QDRANT__SERVICE__API_KEY`).
- Clients send it via an `api-key` header.
- Security guide stresses: use **TLS** to avoid leaking the API key.
### Read-only API key
- Separate key for read-only access.
- Config: `service.read_only_api_key` or env: `QDRANT__SERVICE__READ_ONLY_API_KEY`.
### JWT-based RBAC
- Provides finer-grained authorization (including per-collection access).
- Enabled via `service.jwt_rbac: true` and an API key used for signing/verifying tokens.
- Operationally important: anyone who knows the signing key can generate tokens offline; key rotation invalidates existing tokens.
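The "offline token minting" point can be demonstrated with a stdlib-only HS256 sketch. The signing mechanics are standard JWT; the claim body (`access: "r"`) is an assumption about the RBAC claim schema and must be checked against the security guide:

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(api_key: str, claims: dict) -> str:
    """Anyone holding the signing api_key can mint tokens like this,
    entirely offline — which is why key rotation invalidates all tokens."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(api_key.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

# Hypothetical read-only claim; verify the exact schema in the docs.
token = sign_jwt("my-signing-key", {"access": "r"})
```

In practice you would use a maintained JWT library; the point here is that no server round-trip is involved in token creation.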
## TLS
- The guide describes enabling TLS for REST/gRPC and (optionally) inter-node cluster connections.
- Operational notes:
- certificate rotation is supported by periodic reload (tunable via `tls.cert_ttl`).
- you can also terminate TLS via a reverse proxy, but still must isolate internal cluster ports.
## Hardening patterns (practical)
- Run as non-root (unprivileged image or explicit user IDs).
- Make container root filesystem read-only where feasible; mount persistent storage separately.
- Network isolation:
- Docker internal networks for “no public ingress/egress” patterns.
- Kubernetes NetworkPolicy to restrict ingress/egress (while allowing required inter-node traffic).
## Concrete guidance worth enforcing in projects
- Always keep internal cluster traffic private (never expose the internal cluster port publicly).
- If using API keys/JWT, do not run without TLS unless you have a trusted, private network boundary.
- Prefer least privilege (read-only key or collection-scoped JWT) for read-heavy workloads.
```
### references/snapshots.md
```markdown
# Snapshots (Qdrant Concepts) — practical notes
Source: https://qdrant.tech/documentation/concepts/snapshots/
## What a snapshot is (and what it is not)
- A snapshot is a **tar archive** containing the **data + collection configuration** for a specific collection **on a specific node** at a specific time.
- In a **distributed** deployment, you must create snapshots **per node** for the same collection (each node only has its local shard data).
- Collection-level snapshots **do not include aliases**; handle aliases separately.
- Qdrant Cloud has “Backups” as a disk-level alternative; snapshots are still useful for OSS/self-hosted workflows.
## Collection snapshots: create / list / delete / download
Core endpoints:
- Create: `POST /collections/{collection_name}/snapshots` (synchronous; generates a `.snapshot` file in `snapshots_path`).
- List: `GET /collections/{collection_name}/snapshots`
- Delete: `DELETE /collections/{collection_name}/snapshots/{snapshot_name}`
- Download: `GET /collections/{collection_name}/snapshots/{snapshot_name}` (REST-only per docs).
Practical implications:
- Treat snapshot creation as an **IO-heavy operation**; plan disk space and timing.
- In a cluster, coordinate per-node snapshot creation if you need a consistent point-in-time capture.
## Restore constraints (version + topology)
- A snapshot can only be restored into a cluster that shares the **same minor version**.
- Example from docs: `v1.4.1` → `v1.4.x` with `x >= 1`.
## Restore methods (and when to use which)
Qdrant supports three restoration paths:
1) **Recover from URL or local file** (`PUT /collections/{collection_name}/snapshots/recover`)
- `location` can be:
- an HTTP(S) URL reachable from the restoring node, or
- a `file:///...` URI to a local snapshot file.
- If the target collection does not exist, Qdrant will create it.
- Cloud note: restoring from a URL is not supported if outbound traffic is blocked; use file URI or upload.
2) **Recover from uploaded snapshot** (`POST /collections/{collection_name}/snapshots/upload?priority=...`)
- Upload snapshot bytes as multipart; recommended for migrations.
- Consider setting `priority=snapshot` for migration use-cases.
3) **Recover during start-up** (Qdrant CLI flags)
- Single-node only (not multi-node, not Cloud).
- Start Qdrant with repeated `--snapshot <path>:<target_collection>` pairs.
- The target collection must be **absent**, otherwise Qdrant exits with an error.
- `--force_snapshot` overwrites existing collections; treat as a dangerous operation.
## Snapshot recovery priority (critical gotcha)
When restoring onto a non-empty node, conflicts are resolved by `priority`:
- `replica` (default): prefer existing data over snapshot.
- `snapshot`: prefer snapshot over existing data.
- `no_sync`: restore without extra synchronization (advanced; easy to break the cluster).
Important gotcha:
- To recover a **new collection** from a snapshot, you typically need `priority=snapshot`.
- With the default `replica` priority, the docs note you can end up with an **empty collection** if the system prefers the “existing” (empty) state.
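Given the gotcha above, a restore-into-new-collection request should carry an explicit priority. A sketch of the recover body (URL and collection name are placeholders; confirm the field names against the snapshots API reference):

```python
# Body for: PUT /collections/my_collection/snapshots/recover
recover_body = {
    # Placeholder: HTTP(S) URL reachable from the restoring node,
    # or a file:///... URI to a local snapshot.
    "location": "http://backup-host/my_collection.snapshot",
    # Explicit priority avoids the empty-collection trap with the
    # default 'replica' behavior.
    "priority": "snapshot",
}
```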
## Full storage snapshots (single-node only)
- Full storage snapshots capture **whole storage**, including **collection aliases**.
- They are **not suitable for distributed mode**.
- They can be created/downloaded in Cloud, but Cloud cannot be restored from a full storage snapshot because that requires the CLI.
Endpoints:
- Create: `POST /snapshots`
- List: `GET /snapshots`
- Delete: `DELETE /snapshots/{snapshot_name}`
- Download: `GET /snapshots/{snapshot_name}` (REST-only per docs)
Restore:
- CLI at startup: `./qdrant --storage-snapshot /path/to/full.snapshot`
## Snapshot storage configuration (paths, temp, S3)
Local filesystem defaults:
- Default snapshot dir: `./snapshots` (or `/qdrant/snapshots` inside the Docker image).
Config knobs:
- `storage.snapshots_path` (env: `QDRANT__STORAGE__SNAPSHOTS_PATH`)
- `storage.temp_path` (optional separate temp dir for snapshot creation; useful if the storage disk is slow or space-constrained)
S3 support (S3-compatible):
- Configure `storage.snapshots_config` with `snapshots_storage: s3` and `s3_config` (bucket/region/access_key/secret_key/endpoint_url).
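The S3 configuration above can be sketched as the parsed form of the YAML fragment. Every value is a placeholder, and the nesting mirrors the keys named in this note; check the configuration guide for the authoritative shape:

```python
# Hypothetical storage.snapshots_config fragment (parsed YAML shown as a dict)
snapshots_s3 = {
    "storage": {
        "snapshots_config": {
            "snapshots_storage": "s3",
            "s3_config": {
                "bucket": "my-qdrant-snapshots",          # placeholder
                "region": "us-east-1",                    # placeholder
                "access_key": "INJECT-AT-DEPLOY-TIME",    # keep out of files
                "secret_key": "INJECT-AT-DEPLOY-TIME",
                "endpoint_url": "https://s3.example.com", # S3-compatible endpoint
            },
        }
    }
}
```

Per the configuration note earlier in this skill, the two keys would normally be injected as env vars rather than committed in the YAML file.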
## Operational guidelines
- For multi-tenant setups (one collection per tenant), snapshots are naturally scoped per collection.
- Choose between collection-level snapshot (per-collection backup/restore) vs full storage snapshot (single-node only).
- For self-hosted clusters, plan per-node snapshot creation/restore behavior.
```