
using-graph-databases

Graph database implementation for relationship-heavy data models. Use when building social networks, recommendation engines, knowledge graphs, or fraud detection. Covers Neo4j (primary), ArangoDB, Amazon Neptune, Cypher query patterns, and graph data modeling.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 318.

Updated: March 20, 2026.

Overall rating: C (4.5).

Composite score: 4.5.

Best-practice grade: B (75.6).

Install command

npx @skill-hub/cli install ancoleman-ai-design-components-using-graph-databases

Repository

ancoleman/ai-design-components

Skill path: skills/using-graph-databases


Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Backend, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: ancoleman.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install using-graph-databases into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/ancoleman/ai-design-components before adding using-graph-databases to shared team environments
Use using-graph-databases for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: using-graph-databases
description: Graph database implementation for relationship-heavy data models. Use when building social networks, recommendation engines, knowledge graphs, or fraud detection. Covers Neo4j (primary), ArangoDB, Amazon Neptune, Cypher query patterns, and graph data modeling.
---

# Graph Databases

## Purpose

This skill guides the selection and implementation of graph databases for applications where relationships between entities are first-class citizens. Unlike relational databases, which model relationships through foreign keys and joins, graph databases store connections natively as relationships (edges), enabling efficient traversal-heavy queries.

## When to Use This Skill

Use graph databases when the workload involves:
- **Deep relationship traversals** (4+ hops): "Friends of friends of friends"
- **Variable/evolving relationships**: Schema changes don't break existing queries
- **Path finding**: Shortest route, network analysis, dependency chains
- **Pattern matching**: Fraud detection, recommendation engines, access control

**Do NOT use graph databases when**:
- Fixed schema with shallow joins (2-3 tables) → Use PostgreSQL
- Primarily aggregations/analytics → Use columnar databases
- Key-value lookups only → Use Redis/DynamoDB

## Quick Decision Framework

```
DATA CHARACTERISTICS?
├── Fixed schema, shallow joins (≤3 hops)
│   └─ PostgreSQL (relational)
│
├── Already on PostgreSQL + simple graphs
│   └─ Apache AGE (PostgreSQL extension)
│
├── Deep traversals (4+ hops) + general purpose
│   └─ Neo4j (battle-tested, largest ecosystem)
│
├── Multi-model (documents + graph)
│   └─ ArangoDB
│
├── AWS-native, serverless
│   └─ Amazon Neptune
│
└── Real-time streaming, in-memory
    └─ Memgraph
```
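
The decision tree above can also be read as an ordered list of checks. The following is a minimal sketch under that reading; the `Workload` fields and the order in which branches are tested are assumptions made for illustration, not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description; every field name is illustrative."""
    max_hops: int                       # deepest traversal the queries need
    on_postgres: bool = False           # data already lives in PostgreSQL
    simple_graph_queries: bool = False  # only light graph features are needed
    needs_documents: bool = False       # documents and graph in one store
    aws_native: bool = False            # managed AWS deployment required
    streaming_in_memory: bool = False   # real-time, in-memory workload


def recommend_database(w: Workload) -> str:
    """Return the first matching branch; the check order is an assumption."""
    if w.needs_documents:
        return "ArangoDB"
    if w.aws_native:
        return "Amazon Neptune"
    if w.streaming_in_memory:
        return "Memgraph"
    if w.max_hops >= 4:
        return "Neo4j"
    if w.on_postgres and w.simple_graph_queries:
        return "Apache AGE"
    return "PostgreSQL"


print(recommend_database(Workload(max_hops=5)))
print(recommend_database(Workload(max_hops=2, on_postgres=True, simple_graph_queries=True)))
```

Real selection usually weighs several of these factors at once, so treat the function as a starting point for discussion rather than a rule.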

## Core Concepts

### Property Graph Model

Graph databases store data as:
- **Nodes** (vertices): Entities with labels and properties
- **Relationships** (edges): Typed connections with properties
- **Properties**: Key-value pairs on nodes and relationships

```
(:Person {name: "Alice", age: 28})-[:FRIEND {since: "2020-01-15"}]->(:Person {name: "Bob"})
```
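
To make the three building blocks concrete, here is a small in-memory sketch of the same Alice and Bob example. The `Node` and `Relationship` classes are hypothetical illustrations of the model, not the storage format of any particular database.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """An entity with labels and key-value properties."""
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection that carries its own properties."""
    type: str
    start: Node
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice", "age": 28})
bob = Node({"Person"}, {"name": "Bob"})
friendship = Relationship("FRIEND", alice, bob, {"since": "2020-01-15"})

print(friendship.start.properties["name"], "->", friendship.end.properties["name"])
```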

### Query Languages

| Language | Databases | Readability | Best For |
|----------|-----------|-------------|----------|
| **Cypher** | Neo4j, Memgraph, AGE | ⭐⭐⭐⭐⭐ SQL-like | General purpose |
| **Gremlin** | Neptune, JanusGraph | ⭐⭐⭐ Functional | Cross-database |
| **AQL** | ArangoDB | ⭐⭐⭐⭐ SQL-like | Multi-model |
| **SPARQL** | Neptune, RDF stores | ⭐⭐⭐ W3C standard | Semantic web |

## Common Cypher Patterns

Reference `references/cypher-patterns.md` for comprehensive examples.

### Pattern 1: Basic Matching
```cypher
// Find all users at a company
MATCH (u:User)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN u.name, u.title
```

### Pattern 2: Variable-Length Paths
```cypher
// Find friends up to 3 degrees away
MATCH (u:User {name: 'Alice'})-[:FRIEND*1..3]->(friend)
WHERE u <> friend
RETURN DISTINCT friend.name
LIMIT 100
```
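
For intuition, the bounded pattern `*1..3` behaves like a breadth-first search that stops expanding after three hops. The sketch below reproduces that over a plain adjacency dictionary; the `FRIENDS` data and the `within_hops` helper are made up for illustration.

```python
from collections import deque

# Hypothetical directed FRIEND relationships
FRIENDS = {
    "Alice": ["Bob"],
    "Bob": ["Carol"],
    "Carol": ["Dave"],
    "Dave": ["Erin"],
    "Erin": [],
}


def within_hops(graph, start, max_hops):
    """Return the distinct nodes reachable from start in 1..max_hops hops."""
    seen = {start}          # excludes the start node, like WHERE u <> friend
    found = []
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_hops:
            continue        # depth limit reached, do not expand further
        for neighbor in graph.get(name, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found


print(within_hops(FRIENDS, "Alice", 3))
```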

### Pattern 3: Shortest Path
```cypher
// Find shortest connection between two users
MATCH path = shortestPath(
  (a:User {name: 'Alice'})-[*]-(b:User {name: 'Bob'})
)
RETURN path, length(path) AS distance
```
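
Conceptually, an unweighted shortest path can be found with a breadth-first search that records how each node was reached. The following sketch shows the idea on an undirected edge list; it is an illustration with made-up data, not a description of how any database implements `shortestPath`.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return one shortest path as a list of nodes, or None if unreachable."""
    # Undirected adjacency map, matching the direction-agnostic -[*]- pattern
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {source: None}   # node -> node it was first reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical connections between users
EDGES = [("Alice", "Carol"), ("Carol", "Bob"), ("Alice", "Dan"), ("Dan", "Eve"), ("Eve", "Bob")]
path = shortest_path(EDGES, "Alice", "Bob")
distance = len(path) - 1
print(path, distance)
```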

### Pattern 4: Recommendations
```cypher
// Collaborative filtering: Products liked by similar users
MATCH (u:User {id: $userId})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar)
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name, count(*) AS score
ORDER BY score DESC
LIMIT 10
```
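
The scoring in this query can be traced by hand with a small in-memory version. In the sketch below, the `PURCHASES` data and the `recommend` helper are hypothetical; the score for a product is the number of (shared product, similar user) pairs that lead to it, which mirrors `count(*)` in the query.

```python
from collections import Counter

# Hypothetical purchase history: user id -> set of purchased products
PURCHASES = {
    "u1": {"laptop", "mouse"},
    "u2": {"laptop", "keyboard", "monitor"},
    "u3": {"mouse", "keyboard"},
    "u4": {"desk"},
}


def recommend(purchases, user_id, limit=10):
    """Rank products bought by users who share purchases with user_id."""
    owned = purchases[user_id]
    scores = Counter()
    for other, items in purchases.items():
        if other == user_id:
            continue
        shared = owned & items
        if not shared:
            continue  # not a "similar" user
        # Every shared product is one path to each candidate product
        for product in items - owned:
            scores[product] += len(shared)
    return scores.most_common(limit)


print(recommend(PURCHASES, "u1"))
```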

### Pattern 5: Fraud Detection
```cypher
// Detect circular money flows
MATCH path = (a:Account)-[:SENT*3..6]->(a)
WHERE all(r IN relationships(path) WHERE r.amount > 1000)
RETURN path, [r IN relationships(path) | r.amount] AS amounts
```
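
The same check can be sketched outside the database as a depth-limited search over a transfer list. Everything here (the `TRANSFERS` data and the `find_cycles` helper) is illustrative; like the query, it reports a cycle once per starting account and never reuses a transfer within one path.

```python
def find_cycles(transfers, min_len=3, max_len=6, min_amount=1000):
    """Return cycles as lists of (sender, receiver, amount) transfers."""
    outgoing = {}
    for index, (sender, receiver, amount) in enumerate(transfers):
        if amount > min_amount:  # ignore small transfers up front
            outgoing.setdefault(sender, []).append((index, receiver))

    cycles = []

    def walk(start, current, used):
        for index, receiver in outgoing.get(current, []):
            if index in used:    # each transfer appears at most once per path
                continue
            path = used + [index]
            if receiver == start and len(path) >= min_len:
                cycles.append([transfers[i] for i in path])
            if len(path) < max_len:
                walk(start, receiver, path)

    for account in list(outgoing):
        walk(account, account, [])
    return cycles


# Hypothetical transfers: A -> B -> C -> A is a loop of large amounts
TRANSFERS = [("A", "B", 5000), ("B", "C", 4800), ("C", "A", 4700), ("C", "D", 200)]
cycles = find_cycles(TRANSFERS)
print(len(cycles), [t[2] for t in cycles[0]])
```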

## Database Selection Guide

### Neo4j (Primary Recommendation)

**Use for**: General-purpose graph applications

**Strengths**:
- Most mature (2007), largest community (2M+ developers)
- 65+ graph algorithms (GDS library): PageRank, Louvain, Dijkstra
- Best tooling: Neo4j Browser, Bloom visualization
- Comprehensive Cypher support

**Installation**:
```bash
# Python driver
pip install neo4j

# TypeScript driver
npm install neo4j-driver

# Rust driver
cargo add neo4rs
```

Reference: `references/neo4j.md`

### ArangoDB

**Use for**: Multi-model applications (documents + graph)

**Strengths**:
- Store documents AND graph in one database
- AQL combines document and graph queries
- Schema flexibility with relationships

Reference: `references/arangodb.md`

### Apache AGE

**Use for**: Adding graph capabilities to existing PostgreSQL

**Strengths**:
- Extend PostgreSQL with graph queries
- No new infrastructure needed
- Query both relational and graph data

Reference: Implementation details in examples/

### Amazon Neptune

**Use for**: AWS-native, serverless deployments

**Strengths**:
- Fully managed, auto-scaling
- Supports Gremlin, openCypher, and SPARQL
- AWS ecosystem integration

## Graph Data Modeling Patterns

Reference `references/graph-modeling.md` for comprehensive patterns.

### Best Practice 1: Relationships as First-Class Citizens

**Anti-pattern** (storing relationships in node properties):
```cypher
// BAD
(:Person {name: 'Alice', friend_ids: ['b123', 'c456']})
```

**Pattern** (explicit relationships):
```cypher
// GOOD
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'b123'})
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'c456'})
```

### Best Practice 2: Relationship Properties for Metadata

```cypher
// Track interaction details on relationships
(:Person)-[:FRIEND {
  since: '2020-01-15',
  strength: 0.85,
  last_interaction: datetime()
}]->(:Person)
```

### Best Practice 3: Bounded Traversals for Performance

```cypher
// SLOW: Unbounded traversal starting from every node
MATCH (a)-[:FRIEND*]->(distant)
RETURN distant

// FASTER: Anchored start node, bounded depth, capped result size
MATCH (a:User {id: $userId})-[:FRIEND*1..4]->(distant)
WHERE distant.active = true
RETURN DISTINCT distant
LIMIT 100
```
```

### Best Practice 4: Avoid Supernodes

**Problem**: Nodes with thousands of relationships slow traversals.

**Solution**: Intermediate aggregation nodes
```cypher
// Instead of: (:User)-[:POSTED]->(:Post) [1M relationships]

// Use time partitioning:
(:User)-[:POSTED_IN]->(:Year {year: 2025})
       -[:HAS_MONTH]->(:Month {month: 12})
       -[:HAS_POST]->(:Post)
```
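
The effect of time partitioning is easiest to see by grouping a user's posts into year and month buckets, so the user links to a handful of year nodes instead of every post. The sketch below does this in memory; the `POSTS` data and the `partition_by_month` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(posts):
    """Group (post_id, posted_on) pairs into year -> month -> [post_id]."""
    tree = defaultdict(lambda: defaultdict(list))
    for post_id, posted_on in posts:
        tree[posted_on.year][posted_on.month].append(post_id)
    return tree


# Hypothetical posts by one user
POSTS = [
    ("p1", date(2025, 12, 1)),
    ("p2", date(2025, 12, 15)),
    ("p3", date(2024, 3, 9)),
]
tree = partition_by_month(POSTS)

# The user node now fans out to one relationship per year, not one per post
print(len(tree), "year nodes for", len(POSTS), "posts")
```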

## Use Case Examples

### Social Network

Schema and implementation in `examples/social-graph/`

**Key features**:
- Friend recommendations (friends-of-friends)
- Mutual connections
- News feed generation
- Influence metrics

### Knowledge Graph for AI/RAG

Integration example in `examples/knowledge-graph/`

**Key features**:
- Hybrid vector + graph search
- Entity relationship mapping
- Context expansion for LLM prompts
- Semantic relationship traversal

**Integration with Vector Databases**:
```python
# Assumes `qdrant` is a qdrant_client.QdrantClient, `session` is an open
# neo4j session, and `embedding` is the query vector.

# Step 1: Vector search in Qdrant/pgvector
vector_results = qdrant.search(collection_name="concepts", query_vector=embedding)

# Step 2: Expand with graph relationships
concept_ids = [r.id for r in vector_results]
graph_context = session.run("""
  MATCH (c:Concept) WHERE c.id IN $ids
  MATCH path = (c)-[:RELATED_TO|IS_A*1..2]-(related)
  RETURN c, related, relationships(path) AS rels
""", ids=concept_ids)
```

### Recommendation Engine

Examples in `examples/social-graph/`

**Strategies**:
1. **Collaborative filtering**: "Users who bought X also bought Y"
2. **Content-based**: "Products similar to what you like"
3. **Session-based**: "Recently viewed items"

### Fraud Detection

Pattern detection in examples/

**Detection patterns**:
- Circular money flows
- Shared devices across accounts
- Rapid transaction chains
- Connection pattern anomalies

## Performance Optimization

Reference `references/cypher-patterns.md` for detailed optimization.

### Indexing
```cypher
// Single-property index
CREATE INDEX user_email FOR (u:User) ON (u.email)

// Composite index (multiple properties)
CREATE INDEX user_name_location FOR (u:User) ON (u.name, u.location)

// Full-text search
CREATE FULLTEXT INDEX product_search FOR (p:Product) ON EACH [p.name, p.description]
```

### Caching Expensive Aggregations
```cypher
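// Materialize friend count as a property (OPTIONAL MATCH keeps users with zero friends)
MATCH (u:User)
OPTIONAL MATCH (u)-[:FRIEND]->(f)
WITH u, count(f) AS friendCount
SET u.friend_count = friendCount

// Reads now filter on a stored property instead of counting relationships
MATCH (u:User) WHERE u.friend_count > 100
RETURN u.name, u.friend_count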
```
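
A materialized count is only useful if it stays in sync with the relationships it summarizes. As a rough sketch of that trade-off, the hypothetical `FriendGraph` class below updates a cached counter on every write so that reads never have to count.

```python
class FriendGraph:
    """In-memory illustration of a cached aggregate kept in sync on writes."""

    def __init__(self):
        self.friends = {}        # user -> set of friends (the relationships)
        self.friend_count = {}   # user -> cached count (the stored property)

    def add_friend(self, user, friend):
        targets = self.friends.setdefault(user, set())
        if friend not in targets:        # ignore duplicate relationships
            targets.add(friend)
            self.friend_count[user] = self.friend_count.get(user, 0) + 1

    def remove_friend(self, user, friend):
        targets = self.friends.get(user, set())
        if friend in targets:
            targets.remove(friend)
            self.friend_count[user] -= 1


graph = FriendGraph()
graph.add_friend("alice", "bob")
graph.add_friend("alice", "carol")
graph.add_friend("alice", "bob")   # duplicate, count stays at 2
graph.remove_friend("alice", "bob")
print(graph.friend_count["alice"])
```

In a database the same discipline applies: every write path that creates or deletes a `FRIEND` relationship has to update the property, or the count must be refreshed periodically.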

### Scaling Strategies

| Scale | Strategy | Implementation |
|-------|----------|----------------|
| **Vertical** | Add RAM/CPU | In-memory caching, larger instances |
| **Horizontal (Read)** | Read replicas | Neo4j Cluster, ArangoDB Cluster |
| **Horizontal (Write)** | Sharding | ArangoDB SmartGraphs, JanusGraph |
| **Caching** | App-level cache | Redis for hot paths |

## Language Integration

### Python (Neo4j)

Complete example in `examples/social-graph/python-neo4j/`

```python
from neo4j import GraphDatabase

class GraphDB:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def find_friends_of_friends(self, user_id: str, max_depth: int = 2):
        # Variable-length bounds cannot be passed as query parameters, so
        # validate the depth and interpolate it into the query text.
        # The cap of 5 is an illustrative choice.
        depth = int(max_depth)
        if not 1 <= depth <= 5:
            raise ValueError("max_depth must be between 1 and 5")
        query = f"""
        MATCH (u:User {{id: $userId}})-[:FRIEND*1..{depth}]->(fof)
        WHERE u <> fof
        RETURN DISTINCT fof.id AS id, fof.name AS name
        LIMIT 100
        """
        with self.driver.session() as session:
            result = session.run(query, userId=user_id)
            return [record.data() for record in result]

# Usage
db = GraphDB("bolt://localhost:7687", "neo4j", "password")
friends = db.find_friends_of_friends("u123", max_depth=3)
db.close()
```

### TypeScript (Neo4j)

Complete example in `examples/social-graph/typescript-neo4j/`

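```typescript
import neo4j, { Driver } from 'neo4j-driver'

class Neo4jService {
  private driver: Driver

  constructor(uri: string, username: string, password: string) {
    this.driver = neo4j.driver(uri, neo4j.auth.basic(username, password))
  }

  async findFriendsOfFriends(userId: string, maxDepth: number = 2) {
    // Variable-length bounds cannot be passed as query parameters, so
    // validate the depth and interpolate it into the query text.
    // The cap of 5 is an illustrative choice.
    if (!Number.isInteger(maxDepth) || maxDepth < 1 || maxDepth > 5) {
      throw new RangeError('maxDepth must be an integer between 1 and 5')
    }
    const session = this.driver.session()
    try {
      const result = await session.run(
        `MATCH (u:User {id: $userId})-[:FRIEND*1..${maxDepth}]->(fof)
         WHERE u <> fof
         RETURN DISTINCT fof.id AS id, fof.name AS name
         LIMIT 100`,
        { userId }
      )
      return result.records.map(r => r.toObject())
    } finally {
      await session.close()
    }
  }
}
```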

### Go (ArangoDB)

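```go
import (
    "context"

    driver "github.com/arangodb/go-driver"
)

// User is assumed to be a struct defined elsewhere that matches the vertex documents.
// userID must be a full document ID such as "users/u123".
func findFriendsOfFriends(ctx context.Context, db driver.Database, userID string, maxDepth int) ([]User, error) {
    // LIMIT must precede RETURN in AQL, so deduplicate in a subquery first
    query := `
        FOR friend IN (
            FOR vertex IN 1..@maxDepth OUTBOUND @startVertex GRAPH 'socialGraph'
                FILTER vertex._id != @startVertex
                RETURN DISTINCT vertex
        )
            LIMIT 100
            RETURN friend
    `

    cursor, err := db.Query(ctx, query, map[string]interface{}{
        "startVertex": userID,
        "maxDepth":    maxDepth,
    })
    if err != nil {
        return nil, err
    }
    defer cursor.Close()

    // Read each vertex document into a User value
    var users []User
    for cursor.HasMore() {
        var u User
        if _, err := cursor.ReadDocument(ctx, &u); err != nil {
            return nil, err
        }
        users = append(users, u)
    }
    return users, nil
}
```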

## Schema Validation

Use `scripts/validate_graph_schema.py` to check for:
- Supernodes (nodes with excessive relationships)
- Missing indexes on frequently queried properties
- Missing uniqueness constraints on `id` and `email` properties
- Orphaned nodes (nodes with no relationships)
- Relationship property consistency

Run validation:
```bash
python scripts/validate_graph_schema.py --uri bolt://localhost:7687 --user neo4j --password password
```
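
Unbounded variable-length patterns live in query text rather than in the stored graph, so they can be caught with a static check that needs no database connection. The following is a heuristic sketch only: the `find_unbounded` helper and its regular expression are assumptions for illustration, and they can misfire on a `*` inside property values or list arithmetic.

```python
import re

# Matches the variable-length part of a relationship pattern such as
# [:FRIEND*], [*2..], [:FRIEND*1..3] or [r:KNOWS*..5]
VAR_LENGTH = re.compile(r"\[[^\]\*]*\*\s*(\d+)?\s*(\.\.)?\s*(\d+)?[^\]]*\]")


def find_unbounded(query):
    """Return relationship patterns in a Cypher query that lack an upper bound."""
    hits = []
    for match in VAR_LENGTH.finditer(query):
        lower, dots, upper = match.groups()
        # "*" alone and "*N.." have no upper bound; "*N" is an exact length
        if upper is None and (dots is not None or lower is None):
            hits.append(match.group(0))
    return hits


print(find_unbounded("MATCH (a)-[:FRIEND*]->(b) RETURN b"))
print(find_unbounded("MATCH (a)-[:FRIEND*1..4]->(b) RETURN b"))
```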

## Integration with Other Skills

### With databases-vector (Hybrid Search)
Combine vector similarity with graph context for AI/RAG applications.
See `examples/knowledge-graph/`

### With search-filter
Implement relationship-based queries: "Find all users within 3 degrees of connection"

### With ai-chat
Use knowledge graphs to enrich LLM context with structured relationships.

### With auth-security (ReBAC)
Implement relationship-based access control: "Can user X access resource Y through relation Z?"

## Common Schema Patterns

### Star Schema (Hub and Spokes)
```cypher
(:User)-[:PURCHASED]->(:Product)
(:User)-[:VIEWED]->(:Product)
(:User)-[:RATED]->(:Product)
```

### Hierarchical Schema (Trees)
```cypher
(:CEO)-[:MANAGES]->(:VP)-[:MANAGES]->(:Director)
```

### Temporal Schema (Event Sequences)
```cypher
(:Event {timestamp})-[:NEXT]->(:Event {timestamp})
```

## Getting Started

1. **Choose database**: Use decision framework above
2. **Design schema**: Reference `references/graph-modeling.md`
3. **Implement queries**: Use patterns from `references/cypher-patterns.md`
4. **Validate**: Run `scripts/validate_graph_schema.py`
5. **Optimize**: Add indexes, bound traversals, cache aggregations

## Further Reading

- `references/neo4j.md` - Neo4j setup, drivers, GDS algorithms
- `references/arangodb.md` - ArangoDB multi-model patterns
- `references/cypher-patterns.md` - Comprehensive Cypher query library
- `references/graph-modeling.md` - Data modeling best practices
- `examples/social-graph/` - Complete social network implementation
- `examples/knowledge-graph/` - Hybrid vector + graph for AI/RAG


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/validate_graph_schema.py

```python
#!/usr/bin/env python3
"""
Graph Schema Validation Script

Validates Neo4j graph schemas for common anti-patterns and performance issues:
1. Supernodes (nodes with excessive relationships)
2. Missing indexes on frequently queried properties
3. Missing uniqueness constraints on id and email properties
4. Orphaned nodes (nodes with no relationships)
5. Relationship property consistency
"""

import argparse
import sys
from neo4j import GraphDatabase
from typing import List, Dict, Any
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Issue severity levels."""
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"
    CRITICAL = "CRITICAL"


@dataclass
class ValidationIssue:
    """Represents a validation issue found in the schema."""
    severity: Severity
    category: str
    message: str
    details: Dict[str, Any]
    recommendation: str


class GraphSchemaValidator:
    """Validates Neo4j graph schemas for best practices and performance."""

    def __init__(self, uri: str, user: str, password: str):
        """
        Initialize validator with Neo4j connection.

        Args:
            uri: Neo4j connection URI (e.g., 'bolt://localhost:7687')
            user: Database username
            password: Database password
        """
        self.driver = GraphDatabase.driver(uri, auth=(user, password))
        self.issues: List[ValidationIssue] = []

    def close(self):
        """Close database connection."""
        self.driver.close()

    def add_issue(
        self,
        severity: Severity,
        category: str,
        message: str,
        details: Dict[str, Any],
        recommendation: str
    ):
        """Add a validation issue to the results."""
        self.issues.append(ValidationIssue(
            severity=severity,
            category=category,
            message=message,
            details=details,
            recommendation=recommendation
        ))

    def check_supernodes(self, threshold: int = 10000):
        """
        Check for supernodes (nodes with too many relationships).

        Supernodes slow down graph traversals significantly.

        Args:
            threshold: Max relationships per node before flagging
        """
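        # size((n)--()) was removed in Neo4j 5; a pattern comprehension
        # counts relationships on both 4.x and 5.x.
        query = """
        MATCH (n)
        WITH n, size([(n)--() | 1]) AS degree
        WHERE degree > $threshold
        RETURN labels(n) AS labels, n, degree
        ORDER BY degree DESC
        LIMIT 50
        """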

        with self.driver.session() as session:
            result = session.run(query, threshold=threshold)
            supernodes = list(result)

            if supernodes:
                for record in supernodes:
                    labels = record['labels']
                    degree = record['degree']
                    node_props = dict(record['n'])

                    self.add_issue(
                        severity=Severity.WARNING if degree < 50000 else Severity.ERROR,
                        category="Performance",
                        message=f"Supernode detected: {labels} node with {degree:,} relationships",
                        details={
                            'labels': labels,
                            'degree': degree,
                            'properties': node_props
                        },
                        recommendation=(
                            "Consider partitioning relationships using intermediate nodes. "
                            "For example, use time-based partitioning: "
                            "(Node)-[:REL_IN]->(Year)-[:HAS_MONTH]->(Month)-[:CONTAINS]->(Target)"
                        )
                    )

    def check_indexes(self):
        """
        Check for missing indexes on frequently queried properties.

        Analyzes node labels and suggests indexes.
        """
        # Get all indexes
        with self.driver.session() as session:
            result = session.run("SHOW INDEXES")
            existing_indexes = set()

            for record in result:
                if record.get('labelsOrTypes') and record.get('properties'):
                    label = record['labelsOrTypes'][0] if record['labelsOrTypes'] else None
                    props = record['properties']
                    if label and props:
                        existing_indexes.add((label, tuple(props)))

            # Get node label statistics
            result = session.run("""
                CALL db.labels() YIELD label
                CALL {
                    WITH label
                    MATCH (n)
                    WHERE label IN labels(n)
                    RETURN count(n) AS count
                    LIMIT 1
                }
                RETURN label, count
                ORDER BY count DESC
            """)

            label_counts = {record['label']: record['count'] for record in result}

            # Check for common properties that should be indexed
            for label, count in label_counts.items():
                if count > 100:  # Only check labels with significant data
                    # Get property keys for this label
                    prop_result = session.run(f"""
                        MATCH (n:{label})
                        RETURN DISTINCT keys(n) AS props
                        LIMIT 1
                    """)

                    props_record = prop_result.single()
                    if props_record and props_record['props']:
                        props = props_record['props']

                        # Check for common filterable properties
                        filterable = ['id', 'email', 'name', 'created_at', 'date', 'timestamp']
                        for prop in props:
                            if prop in filterable:
                                if (label, (prop,)) not in existing_indexes:
                                    self.add_issue(
                                        severity=Severity.WARNING,
                                        category="Indexing",
                                        message=f"Missing index on {label}.{prop}",
                                        details={
                                            'label': label,
                                            'property': prop,
                                            'node_count': count
                                        },
                                        recommendation=(
                                            f"CREATE INDEX {label.lower()}_{prop} "
                                            f"FOR (n:{label}) ON (n.{prop})"
                                        )
                                    )

    def check_constraints(self):
        """
        Check for recommended constraints.

        Ensures data integrity through constraints on IDs and emails.
        """
        with self.driver.session() as session:
            # Get existing constraints
            result = session.run("SHOW CONSTRAINTS")
            existing_constraints = set()

            for record in result:
                if record.get('labelsOrTypes') and record.get('properties'):
                    label = record['labelsOrTypes'][0] if record['labelsOrTypes'] else None
                    props = record['properties']
                    constraint_type = record.get('type', '')
                    if label and props:
                        existing_constraints.add((label, tuple(props), constraint_type))

            # Get node labels
            result = session.run("CALL db.labels()")
            labels = [record['label'] for record in result]

            # Check for recommended unique constraints
            for label in labels:
                # Check for id uniqueness
                if not any(c[0] == label and 'id' in c[1] and 'UNIQUENESS' in c[2]
                          for c in existing_constraints):
                    # Check if id property exists
                    check = session.run(f"""
                        MATCH (n:{label})
                        WHERE n.id IS NOT NULL
                        RETURN count(n) AS count
                        LIMIT 1
                    """)
                    count_record = check.single()
                    if count_record and count_record['count'] > 0:
                        self.add_issue(
                            severity=Severity.WARNING,
                            category="Data Integrity",
                            message=f"Missing unique constraint on {label}.id",
                            details={'label': label, 'property': 'id'},
                            recommendation=(
                                f"CREATE CONSTRAINT {label.lower()}_id_unique "
                                f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
                            )
                        )

                # Check for email uniqueness
                if 'User' in label or 'Person' in label:
                    if not any(c[0] == label and 'email' in c[1] and 'UNIQUENESS' in c[2]
                              for c in existing_constraints):
                        check = session.run(f"""
                            MATCH (n:{label})
                            WHERE n.email IS NOT NULL
                            RETURN count(n) AS count
                            LIMIT 1
                        """)
                        count_record = check.single()
                        if count_record and count_record['count'] > 0:
                            self.add_issue(
                                severity=Severity.WARNING,
                                category="Data Integrity",
                                message=f"Missing unique constraint on {label}.email",
                                details={'label': label, 'property': 'email'},
                                recommendation=(
                                    f"CREATE CONSTRAINT {label.lower()}_email_unique "
                                    f"FOR (n:{label}) REQUIRE n.email IS UNIQUE"
                                )
                            )

    def check_orphaned_nodes(self):
        """
        Check for orphaned nodes (nodes with no relationships).

        Large numbers of orphaned nodes may indicate data quality issues.
        """
        query = """
        MATCH (n)
        WHERE NOT (n)--()
        WITH labels(n) AS labels, count(n) AS count
        RETURN labels, count
        ORDER BY count DESC
        """

        with self.driver.session() as session:
            result = session.run(query)
            orphaned = list(result)

            for record in orphaned:
                if record['count'] > 10:  # Flag if more than 10 orphaned nodes
                    self.add_issue(
                        severity=Severity.INFO,
                        category="Data Quality",
                        message=f"{record['count']} orphaned nodes found: {record['labels']}",
                        details={
                            'labels': record['labels'],
                            'count': record['count']
                        },
                        recommendation=(
                            "Review if these nodes should be connected or removed. "
                            "Orphaned nodes consume storage without providing graph value."
                        )
                    )

    def check_relationship_properties(self):
        """
        Check for inconsistent relationship properties.

        Ensures relationships of the same type have consistent properties.
        """
        query = """
        MATCH ()-[r]->()
        WITH type(r) AS rel_type, keys(r) AS props
        WITH rel_type, collect(DISTINCT props) AS prop_sets
        WHERE size(prop_sets) > 1
        RETURN rel_type, prop_sets
        """

        with self.driver.session() as session:
            result = session.run(query)
            inconsistencies = list(result)

            for record in inconsistencies:
                self.add_issue(
                    severity=Severity.WARNING,
                    category="Data Quality",
                    message=f"Inconsistent properties on {record['rel_type']} relationships",
                    details={
                        'relationship_type': record['rel_type'],
                        'property_sets': record['prop_sets']
                    },
                    recommendation=(
                        "Standardize relationship properties. All relationships of the same "
                        "type should have consistent property schemas."
                    )
                )

    def check_database_stats(self):
        """
        Display database statistics for context.
        """
        query = """
        MATCH (n)
        WITH count(n) AS node_count
        MATCH ()-[r]->()
        WITH node_count, count(r) AS rel_count
        CALL db.labels() YIELD label
        WITH node_count, rel_count, collect(label) AS labels
        CALL db.relationshipTypes() YIELD relationshipType
        RETURN
            node_count,
            rel_count,
            size(labels) AS label_count,
            collect(relationshipType) AS rel_types
        """

        with self.driver.session() as session:
            result = session.run(query)
            stats = result.single()

            if stats is None:
                # No row comes back when the graph has no relationships or no labels
                print("\nNo database statistics available (empty or unlabeled graph).\n")
                return

            print("\n" + "="*60)
            print("DATABASE STATISTICS")
            print("="*60)
            print(f"Total Nodes: {stats['node_count']:,}")
            print(f"Total Relationships: {stats['rel_count']:,}")
            print(f"Node Labels: {stats['label_count']}")
            print(f"Relationship Types: {len(stats['rel_types'])}")
            print("="*60 + "\n")

    def validate(self, supernode_threshold: int = 10000):
        """
        Run all validation checks.

        Args:
            supernode_threshold: Max relationships per node before flagging

        Returns:
            Number of issues found
        """
        print("Starting graph schema validation...")

        self.check_database_stats()
        self.check_supernodes(threshold=supernode_threshold)
        self.check_indexes()
        self.check_constraints()
        self.check_orphaned_nodes()
        self.check_relationship_properties()

        return len(self.issues)

    def print_report(self):
        """Print validation report."""
        if not self.issues:
            print("\n✅ No issues found! Schema looks good.\n")
            return

        # Group by severity
        by_severity = {
            Severity.CRITICAL: [],
            Severity.ERROR: [],
            Severity.WARNING: [],
            Severity.INFO: []
        }

        for issue in self.issues:
            by_severity[issue.severity].append(issue)

        # Print summary
        print("\n" + "="*60)
        print("VALIDATION SUMMARY")
        print("="*60)
        print(f"Critical: {len(by_severity[Severity.CRITICAL])}")
        print(f"Errors:   {len(by_severity[Severity.ERROR])}")
        print(f"Warnings: {len(by_severity[Severity.WARNING])}")
        print(f"Info:     {len(by_severity[Severity.INFO])}")
        print("="*60 + "\n")

        # Print issues by severity
        for severity in [Severity.CRITICAL, Severity.ERROR, Severity.WARNING, Severity.INFO]:
            issues = by_severity[severity]
            if issues:
                print(f"\n{severity.value} ({len(issues)} issues)")
                print("-" * 60)

                for i, issue in enumerate(issues, 1):
                    print(f"\n{i}. [{issue.category}] {issue.message}")
                    if issue.recommendation:
                        print(f"   💡 Recommendation: {issue.recommendation}")


def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(
        description="Validate Neo4j graph schema for best practices"
    )
    parser.add_argument(
        '--uri',
        default='bolt://localhost:7687',
        help='Neo4j connection URI (default: bolt://localhost:7687)'
    )
    parser.add_argument(
        '--user',
        default='neo4j',
        help='Neo4j username (default: neo4j)'
    )
    parser.add_argument(
        '--password',
        default='password',
        help='Neo4j password (default: password)'
    )
    parser.add_argument(
        '--supernode-threshold',
        type=int,
        default=10000,
        help='Relationship count threshold for supernode detection (default: 10000)'
    )

    args = parser.parse_args()

    validator = GraphSchemaValidator(args.uri, args.user, args.password)

    try:
        # Apply the --supernode-threshold option to the supernode check
        validator.validate(supernode_threshold=args.supernode_threshold)
        validator.print_report()

        # Exit with error code if critical/error issues found
        if any(i.severity in [Severity.CRITICAL, Severity.ERROR] for i in validator.issues):
            sys.exit(1)
        else:
            sys.exit(0)

    except Exception as e:
        print(f"\n❌ Validation failed: {e}", file=sys.stderr)
        sys.exit(2)
    finally:
        validator.close()


if __name__ == "__main__":
    main()

```

### references/cypher-patterns.md

```markdown
# Cypher Query Patterns Reference

Comprehensive collection of common Cypher query patterns for Neo4j, Memgraph, and Apache AGE.


## Table of Contents

- [Pattern Matching Basics](#pattern-matching-basics)
  - [Simple Pattern Matching](#simple-pattern-matching)
  - [Relationship Patterns](#relationship-patterns)
  - [Relationship Properties](#relationship-properties)
- [Variable-Length Paths](#variable-length-paths)
  - [Fixed-Depth Traversal](#fixed-depth-traversal)
  - [Variable-Depth Traversal](#variable-depth-traversal)
- [Path Finding](#path-finding)
  - [Shortest Path](#shortest-path)
  - [All Shortest Paths](#all-shortest-paths)
  - [Weighted Shortest Path (with GDS)](#weighted-shortest-path-with-gds)
- [Aggregations](#aggregations)
  - [Count and Group](#count-and-group)
  - [Collect and Unwind](#collect-and-unwind)
- [Filtering](#filtering)
  - [WHERE Clauses](#where-clauses)
  - [EXISTS and NOT EXISTS](#exists-and-not-exists)
- [Write Operations](#write-operations)
  - [Create Nodes](#create-nodes)
  - [Create Relationships](#create-relationships)
  - [MERGE (Upsert)](#merge-upsert)
  - [Update Properties](#update-properties)
  - [Delete](#delete)
- [Recommendations](#recommendations)
  - [Collaborative Filtering](#collaborative-filtering)
  - [Content-Based Filtering](#content-based-filtering)
  - [Hybrid Recommendations](#hybrid-recommendations)
- [Social Graph Patterns](#social-graph-patterns)
  - [Friend Suggestions](#friend-suggestions)
  - [Mutual Connections](#mutual-connections)
  - [Influence Metrics](#influence-metrics)
- [Fraud Detection Patterns](#fraud-detection-patterns)
  - [Circular Money Flows](#circular-money-flows)
  - [Shared Devices](#shared-devices)
  - [Rapid Transaction Chains](#rapid-transaction-chains)
- [Performance Patterns](#performance-patterns)
  - [Use Indexes](#use-indexes)
  - [Limit Early](#limit-early)
  - [Use WITH for Intermediate Results](#use-with-for-intermediate-results)
  - [Avoid Cartesian Products](#avoid-cartesian-products)
- [Temporal Queries](#temporal-queries)
  - [Date Filtering](#date-filtering)
  - [Time-Based Aggregations](#time-based-aggregations)
- [Graph Algorithms (Neo4j GDS)](#graph-algorithms-neo4j-gds)
  - [PageRank](#pagerank)
  - [Community Detection (Louvain)](#community-detection-louvain)
  - [Centrality Metrics](#centrality-metrics)
- [Further Resources](#further-resources)

## Pattern Matching Basics

### Simple Pattern Matching

```cypher
// Find all users
MATCH (u:User)
RETURN u

// Find users with specific property
MATCH (u:User {email: 'alice@example.com'})
RETURN u

// Find users with WHERE clause
MATCH (u:User)
WHERE u.age >= 25 AND u.city = 'San Francisco'
RETURN u.name, u.age
```

### Relationship Patterns

```cypher
// Outgoing relationship
MATCH (u:User)-[:FRIEND]->(friend)
WHERE u.name = 'Alice'
RETURN friend.name

// Incoming relationship
MATCH (u:User)<-[:FRIEND]-(friend)
WHERE u.name = 'Alice'
RETURN friend.name

// Bidirectional (undirected)
MATCH (u:User)-[:FRIEND]-(friend)
WHERE u.name = 'Alice'
RETURN friend.name

// Multiple relationships
MATCH (u:User)-[:FRIEND]->(f)-[:WORKS_AT]->(c:Company)
WHERE u.name = 'Alice'
RETURN f.name, c.name
```

### Relationship Properties

```cypher
// Filter by relationship property
MATCH (u:User)-[r:FRIEND]->(friend)
WHERE u.name = 'Alice' AND r.since >= date('2020-01-01')
RETURN friend.name, r.since

// Return relationship properties
MATCH (u:User)-[r:FRIEND]->(friend)
WHERE u.name = 'Alice'
RETURN friend.name, r.since, r.strength
```

## Variable-Length Paths

### Fixed-Depth Traversal

```cypher
// Friends of friends (exactly 2 hops)
MATCH (u:User {name: 'Alice'})-[:FRIEND*2]->(fof)
RETURN DISTINCT fof.name

// Up to 3 hops (the path must be bound to a variable before length(path) can be used)
MATCH path = (u:User {name: 'Alice'})-[:FRIEND*1..3]->(connection)
RETURN DISTINCT connection.name, length(path) AS depth
LIMIT 100
```

### Variable-Depth Traversal

```cypher
// Find all connections (bounded to prevent runaway queries)
MATCH path = (u:User {name: 'Alice'})-[:FRIEND*1..5]->(connection)
WHERE u <> connection
RETURN connection.name, length(path) AS degrees_of_separation
ORDER BY degrees_of_separation
LIMIT 100

// Traverse with relationship type variations
MATCH (u:User {name: 'Alice'})-[:FRIEND|COLLEAGUE*1..3]->(connection)
RETURN DISTINCT connection.name
```
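
To make the effect of the `*1..3` bound concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j executes the query: Cypher returns one row per matching path, while this sketch keeps only the smallest hop count per node, roughly what `DISTINCT` plus `min(length(path))` would report. The `friends` adjacency data and the `within_hops` helper are hypothetical.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: min_hops} for nodes reachable from start in 1..max_hops hops."""
    depths = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # the upper bound is what stops the search from running away
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                depths[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return depths


# Hypothetical FRIEND edges, stored as outgoing adjacency lists
friends = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave"],
    "Dave": ["Erin"],
    "Erin": ["Frank"],
}

print(within_hops(friends, "Alice", 3))
```

Frank sits four hops from Alice, so a bound of 3 leaves him out; raising the bound widens the result and, on a real graph, the cost of the query.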

## Path Finding

### Shortest Path

```cypher
// Single shortest path
MATCH path = shortestPath(
  (a:User {name: 'Alice'})-[*]-(b:User {name: 'Bob'})
)
RETURN path, length(path) AS distance

// Shortest path with relationship filter
MATCH path = shortestPath(
  (a:User {name: 'Alice'})-[:FRIEND*]-(b:User {name: 'Bob'})
)
RETURN [node IN nodes(path) | node.name] AS route, length(path)
```

### All Shortest Paths

```cypher
// Find all paths with minimum length
MATCH path = allShortestPaths(
  (a:User {name: 'Alice'})-[*]-(b:User {name: 'Bob'})
)
RETURN path
```

### Weighted Shortest Path (with GDS)

```cypher
// Using Graph Data Science library
MATCH (source:Location {name: 'New York'}), (target:Location {name: 'Los Angeles'})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
  sourceNode: source,
  targetNode: target,
  relationshipWeightProperty: 'distance'
})
YIELD path, totalCost
RETURN
  [node IN nodes(path) | node.name] AS route,
  totalCost AS distance
```

## Aggregations

### Count and Group

```cypher
// Count relationships
MATCH (u:User)-[r:FRIEND]->()
RETURN u.name, count(r) AS friend_count
ORDER BY friend_count DESC

// Group by property
MATCH (u:User)-[:WORKS_AT]->(c:Company)
RETURN c.name AS company, count(u) AS employee_count
ORDER BY employee_count DESC

// Multiple aggregations
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN
  u.name,
  count(p) AS total_purchases,
  count(DISTINCT p.category) AS categories_purchased,
  avg(p.price) AS avg_price,
  sum(p.price) AS total_spent
```

### Collect and Unwind

```cypher
// Collect related items
MATCH (u:User {name: 'Alice'})-[:FRIEND]->(friend)
RETURN u.name, collect(friend.name) AS friends

// Collect with properties
MATCH (u:User {name: 'Alice'})-[r:FRIEND]->(friend)
RETURN u.name, collect({name: friend.name, since: r.since}) AS friends

// Unwind list to rows
UNWIND ['Alice', 'Bob', 'Charlie'] AS name
MATCH (u:User {name: name})
RETURN u
```

## Filtering

### WHERE Clauses

```cypher
// Multiple conditions
MATCH (u:User)-[:FRIEND]->(friend)
WHERE u.name = 'Alice'
  AND friend.age >= 25
  AND friend.age <= 35
  AND friend.city = 'San Francisco'
RETURN friend.name, friend.age

// String matching
MATCH (u:User)
WHERE u.email ENDS WITH '@example.com'
  AND u.name STARTS WITH 'A'
RETURN u.name, u.email

// Regular expressions
MATCH (u:User)
WHERE u.email =~ '.*@(gmail|yahoo)\\.com'
RETURN u.email

// IN operator
MATCH (u:User)
WHERE u.city IN ['San Francisco', 'New York', 'Boston']
RETURN u.name, u.city
```

### EXISTS and NOT EXISTS

```cypher
// Users who have purchased something
MATCH (u:User)
WHERE exists((u)-[:PURCHASED]->(:Product))
RETURN u.name

// Users who have NOT purchased anything
MATCH (u:User)
WHERE NOT exists((u)-[:PURCHASED]->(:Product))
RETURN u.name

// Complex existence check
MATCH (u:User)
WHERE exists {
  MATCH (u)-[:FRIEND]->(f:User)-[:WORKS_AT]->(:Company {name: 'Google'})
}
RETURN u.name
```

## Write Operations

### Create Nodes

```cypher
// Create single node
CREATE (u:User {id: 'u123', name: 'Alice', email: 'alice@example.com'})
RETURN u

// Create multiple nodes
CREATE
  (u1:User {id: 'u1', name: 'Alice'}),
  (u2:User {id: 'u2', name: 'Bob'}),
  (c:Company {id: 'c1', name: 'Acme Corp'})

// Create with timestamp
CREATE (u:User {
  id: 'u123',
  name: 'Alice',
  created_at: datetime(),
  updated_at: datetime()
})
```

### Create Relationships

```cypher
// Create relationship between existing nodes
MATCH (u1:User {id: 'u1'}), (u2:User {id: 'u2'})
CREATE (u1)-[:FRIEND {since: date('2020-01-15')}]->(u2)

// Create nodes and relationships together
CREATE (u:User {name: 'Alice'})-[:WORKS_AT {since: date('2020-01-01')}]->(c:Company {name: 'Acme'})

// Bidirectional friendship
MATCH (u1:User {id: 'u1'}), (u2:User {id: 'u2'})
CREATE (u1)-[:FRIEND {since: datetime()}]->(u2),
       (u2)-[:FRIEND {since: datetime()}]->(u1)
```

### MERGE (Upsert)

```cypher
// Create or match node
MERGE (u:User {email: 'alice@example.com'})
ON CREATE SET u.created = datetime(), u.name = 'Alice'
ON MATCH SET u.updated = datetime()
RETURN u

// Create unique relationships
MATCH (u1:User {id: 'u1'}), (u2:User {id: 'u2'})
MERGE (u1)-[r:FRIEND]-(u2)
ON CREATE SET r.since = datetime()
RETURN r

// Merge with complex logic
MERGE (u:User {email: $email})
ON CREATE SET
  u.id = randomUUID(),
  u.name = $name,
  u.created_at = datetime()
ON MATCH SET
  u.last_login = datetime()
RETURN u
```
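
The `$email` and `$name` placeholders in the last query are bound by the client at run time. The sketch below shows one way that could look from Python. The `RecordingTx` class is a hypothetical stand-in for a driver transaction so the example runs without a database; the only thing assumed about the real object is a `run(query, **params)` method, as used in the driver examples elsewhere in this skill.

```python
UPSERT_USER = """
MERGE (u:User {email: $email})
ON CREATE SET u.id = randomUUID(), u.name = $name, u.created_at = datetime()
ON MATCH SET u.last_login = datetime()
RETURN u.email AS email
"""


def upsert_user(tx, email, name):
    """Run the MERGE with bound parameters instead of string formatting."""
    return tx.run(UPSERT_USER, email=email, name=name)


class RecordingTx:
    """Hypothetical stand-in for a driver transaction; it only records calls."""

    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        self.calls.append((query, params))
        return params


tx = RecordingTx()
upsert_user(tx, "alice@example.com", "Alice")
print(tx.calls[0][1])
```

With the official Python driver, the same function could be passed to `session.execute_write(upsert_user, email, name)`. Keeping values out of the query text avoids injection problems and lets the server reuse the query plan.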

### Update Properties

```cypher
// SET to update/add properties
MATCH (u:User {id: 'u123'})
SET u.age = 29, u.updated_at = datetime()
RETURN u

// SET = replaces ALL properties with the map (properties not in the map are removed)
MATCH (u:User {id: 'u123'})
SET u = {id: 'u123', name: 'Alice Smith', age: 29}

// SET += to add properties without removing existing
MATCH (u:User {id: 'u123'})
SET u += {age: 29, city: 'San Francisco'}

// REMOVE property
MATCH (u:User {id: 'u123'})
REMOVE u.temporary_field
```

### Delete

```cypher
// Delete node (DETACH DELETE also removes its relationships; plain DELETE fails if any remain)
MATCH (u:User {id: 'u123'})
DETACH DELETE u

// Delete relationships only
MATCH (u:User {id: 'u123'})-[r:FRIEND]-()
DELETE r

// Conditional delete
MATCH (u:User)
WHERE u.inactive = true AND u.last_login < datetime() - duration('P365D')
DETACH DELETE u
```

## Recommendations

### Collaborative Filtering

```cypher
// Products purchased by similar users
MATCH (u:User {id: $userId})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar:User)
WITH u, similar, count(p) AS similarity  // keep u in scope for the exclusion check below
ORDER BY similarity DESC
LIMIT 100
MATCH (similar)-[r:PURCHASED]->(rec:Product)
WHERE NOT exists((u)-[:PURCHASED]->(rec))
  AND r.rating >= 4
RETURN rec.name, avg(r.rating) AS avg_rating, count(*) AS purchase_count
ORDER BY purchase_count DESC, avg_rating DESC
LIMIT 10
```

### Content-Based Filtering

```cypher
// Products in same categories as user likes
MATCH (u:User {id: $userId})-[r:PURCHASED]->(p:Product)-[:IN_CATEGORY]->(c:Category)
WHERE r.rating >= 4
WITH u, c, count(p) AS category_score  // keep u in scope for the exclusion check below
ORDER BY category_score DESC
MATCH (c)<-[:IN_CATEGORY]-(rec:Product)
WHERE NOT exists((u)-[:PURCHASED]->(rec))
RETURN rec.name, sum(category_score) AS relevance
ORDER BY relevance DESC
LIMIT 10
```

### Hybrid Recommendations

```cypher
// Combine collaborative + content-based
MATCH (u:User {id: $userId})

// Collaborative component
OPTIONAL MATCH (u)-[:PURCHASED]->(p1:Product)<-[:PURCHASED]-(similar)
OPTIONAL MATCH (similar)-[:PURCHASED]->(collab_rec:Product)
WHERE NOT exists((u)-[:PURCHASED]->(collab_rec))
WITH u, collab_rec, count(*) AS collab_score

// Content-based component
OPTIONAL MATCH (u)-[:PURCHASED]->(p2)-[:IN_CATEGORY]->(c)
OPTIONAL MATCH (c)<-[:IN_CATEGORY]-(content_rec)
WHERE NOT exists((u)-[:PURCHASED]->(content_rec))
WITH collab_rec, collab_score, content_rec, count(*) AS content_score

// Combine scores
WITH coalesce(collab_rec, content_rec) AS recommendation,
     coalesce(collab_score, 0) * 2 + coalesce(content_score, 0) AS total_score
WHERE total_score > 0
RETURN recommendation.name, total_score
ORDER BY total_score DESC
LIMIT 10
```

## Social Graph Patterns

### Friend Suggestions

```cypher
// Friends of friends who aren't already friends
MATCH (u:User {id: $userId})-[:FRIEND]->()-[:FRIEND]->(suggestion)
WHERE NOT exists((u)-[:FRIEND]-(suggestion))
  AND u <> suggestion
WITH suggestion, count(*) AS mutual_friends
WHERE mutual_friends >= 2
RETURN suggestion.name, mutual_friends
ORDER BY mutual_friends DESC
LIMIT 10
```
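
The counting logic of the query above can be checked against a small in-memory version. This is only a sketch for reasoning about expected results, not a replacement for the graph query. It assumes friendships are stored symmetrically (each user lists the other), and the `suggest_friends` helper and sample data are hypothetical.

```python
from collections import Counter


def suggest_friends(friends, user, min_mutual=2, limit=10):
    """Rank friends-of-friends by how many direct friends they share with user."""
    direct = friends.get(user, set())
    mutual = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user and anyone who is already a direct friend
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    ranked = [(name, n) for name, n in mutual.most_common() if n >= min_mutual]
    return ranked[:limit]


# Hypothetical symmetric friendship data
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}

print(suggest_friends(friends, "alice"))
```

Dave is reachable through both Bob and Carol, so he passes the `mutual_friends >= 2` threshold; Erin is reachable only through Bob and is filtered out.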

### Mutual Connections

```cypher
// Find mutual friends between two users
MATCH (u1:User {id: $user1Id})-[:FRIEND]->(mutual)<-[:FRIEND]-(u2:User {id: $user2Id})
RETURN collect(mutual.name) AS mutual_friends, count(mutual) AS count
```

### Influence Metrics

```cypher
// Count followers and following
MATCH (u:User)
OPTIONAL MATCH (u)-[:FOLLOWS]->(following)
OPTIONAL MATCH (u)<-[:FOLLOWS]-(follower)
RETURN
  u.name,
  count(DISTINCT following) AS following_count,
  count(DISTINCT follower) AS follower_count,
  count(DISTINCT follower) * 1.0 / nullif(count(DISTINCT following), 0) AS influence_ratio
ORDER BY follower_count DESC
LIMIT 50
```

## Fraud Detection Patterns

### Circular Money Flows

```cypher
// Detect circular transactions
MATCH path = (a:Account)-[:SENT*3..6]->(a)
WHERE all(r IN relationships(path) WHERE r.amount > 1000)
RETURN
  [n IN nodes(path) | n.id] AS account_chain,
  [r IN relationships(path) | r.amount] AS amounts,
  reduce(total = 0, r IN relationships(path) | total + r.amount) AS total_amount
```
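
For unit-testing fraud rules without a database, the same idea can be sketched in plain Python. The version below is an approximation under stated assumptions: it searches for simple cycles (no account visited twice), whereas Cypher only requires that no relationship is reused. Like the query, it reports each loop once per starting account. The `find_cycles` helper and the transfer tuples are hypothetical.

```python
def find_cycles(transfers, min_len=3, max_len=6, min_amount=1000):
    """Return (accounts, amounts) pairs for money loops of min_len..max_len transfers."""
    # Keep only transfers above the threshold, like the WHERE all(...) filter
    out = {}
    for src, dst, amount in transfers:
        if amount > min_amount:
            out.setdefault(src, []).append((dst, amount))

    cycles = []

    def walk(start, node, accounts, amounts):
        for dst, amount in out.get(node, []):
            length = len(amounts) + 1
            if dst == start and length >= min_len:
                cycles.append((accounts + [start], amounts + [amount]))
            elif dst not in accounts and length < max_len:
                walk(start, dst, accounts + [dst], amounts + [amount])

    for start in out:
        walk(start, start, [start], [])
    return cycles


# Hypothetical transfers: (sender, receiver, amount)
transfers = [
    ("A", "B", 5000),
    ("B", "C", 4800),
    ("C", "A", 4700),
    ("C", "D", 200),
]

cycles = find_cycles(transfers)
print(cycles[0], sum(cycles[0][1]))
```

The upper bound on length plays the same role as `*3..6` in the query: without it, the search space grows quickly on dense transfer graphs.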

### Shared Devices

```cypher
// Accounts using same device (suspicious)
MATCH (d:Device)<-[:USED_DEVICE]-(t:Transaction)<-[:MADE]-(a:Account)
WITH d, collect(DISTINCT a) AS accounts
WHERE size(accounts) > 5
RETURN
  d.fingerprint,
  [a IN accounts | a.id] AS suspicious_accounts,
  size(accounts) AS account_count
ORDER BY account_count DESC
```

### Rapid Transaction Chains

```cypher
// Fast succession of transfers
MATCH (a1:Account)-[:SENT]->(t1:Transaction)-[:TO]->(a2:Account)
     -[:SENT]->(t2:Transaction)-[:TO]->(a3:Account)
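// Compare instants rather than durations: Duration values are not orderable in Cypher,
// so `duration.between(...) < duration(...)` evaluates to null and filters out every row
WHERE t2.timestamp >= t1.timestamp
  AND t2.timestamp < t1.timestamp + duration('PT5M')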
  AND t1.amount > 5000
  AND t2.amount > 5000
RETURN a1.id, a2.id, a3.id, t1.amount, t2.amount,
       duration.between(t1.timestamp, t2.timestamp) AS time_between
```

## Performance Patterns

### Use Indexes

```cypher
// Create indexes first
CREATE INDEX user_email FOR (u:User) ON (u.email);
CREATE INDEX product_category FOR (p:Product) ON (p.category);

// Queries benefit from indexes
MATCH (u:User {email: 'alice@example.com'})
RETURN u
```

### Limit Early

```cypher
// GOOD: Limit early in traversal
MATCH (u:User {name: 'Alice'})-[:FRIEND*1..3]->(connection)
RETURN connection.name
LIMIT 100

// BAD: Limit after collecting everything
MATCH (u:User {name: 'Alice'})-[:FRIEND*1..3]->(connection)
WITH collect(connection) AS all_connections
RETURN all_connections[0..100]
```

### Use WITH for Intermediate Results

```cypher
// Materialize expensive computations
MATCH (u:User)-[:FRIEND]->(f)
WITH u, count(f) AS friend_count
WHERE friend_count > 10
MATCH (u)-[:PURCHASED]->(p:Product)
RETURN u.name, friend_count, count(p) AS purchase_count
```

### Avoid Cartesian Products

```cypher
// BAD: Two separate MATCH clauses create cartesian product
MATCH (u:User)
MATCH (p:Product)
RETURN u, p  // Returns every combination!

// GOOD: Connect with relationship
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN u, p
```

## Temporal Queries

### Date Filtering

```cypher
// Recent activity
MATCH (u:User)-[r:POSTED]->(post:Post)
WHERE r.timestamp >= datetime() - duration('P7D')
RETURN u.name, post.title, r.timestamp
ORDER BY r.timestamp DESC

// Date range
MATCH (u:User)-[r:PURCHASED]->(p:Product)
WHERE r.date >= date('2025-01-01') AND r.date < date('2025-02-01')
RETURN u.name, p.name, r.date
```

### Time-Based Aggregations

```cypher
// Group by month
MATCH (u:User)-[r:PURCHASED]->(p:Product)
RETURN
  r.date.year AS year,
  r.date.month AS month,
  count(p) AS purchases,
  sum(p.price) AS revenue
ORDER BY year DESC, month DESC
```

## Graph Algorithms (Neo4j GDS)

### PageRank

```cypher
// Find influential nodes
CALL gds.pageRank.stream('socialGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
```
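
As a rough intuition for what the `score` column means, here is a small power-iteration sketch of PageRank in Python. It is a textbook formulation, not the GDS implementation: scores here are normalized to sum to 1, while GDS scales its scores differently, so only the relative ordering should be compared. The `pagerank` helper and the edge list are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical FOLLOWS-style edges
edges = [("alice", "carol"), ("bob", "carol"), ("carol", "dave"), ("dave", "carol")]
scores = pagerank(edges)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

Carol receives links from three nodes and ranks first; Alice and Bob receive none and tie for last.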

### Community Detection (Louvain)

```cypher
// Find communities
CALL gds.louvain.stream('socialGraph')
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) AS members
ORDER BY size(members) DESC
```

### Centrality Metrics

```cypher
// Betweenness centrality (bridge nodes)
CALL gds.betweenness.stream('socialGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10

// Degree centrality (most connected)
CALL gds.degree.stream('socialGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
```

## Further Resources

- Neo4j Cypher Manual: https://neo4j.com/docs/cypher-manual/current/
- Cypher Refcard: https://neo4j.com/docs/cypher-refcard/current/
- Graph Data Science: https://neo4j.com/docs/graph-data-science/current/

```

### references/neo4j.md

```markdown
# Neo4j Reference Guide


## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
  - [Docker (Recommended for Development)](#docker-recommended-for-development)
  - [Language Drivers](#language-drivers)
- [Connection Setup](#connection-setup)
  - [Python](#python)
  - [TypeScript](#typescript)
- [Graph Data Science (GDS) Library](#graph-data-science-gds-library)
  - [Installation](#installation)
  - [Common Graph Algorithms](#common-graph-algorithms)
- [APOC Procedures](#apoc-procedures)
  - [Installation](#installation)
  - [Common APOC Procedures](#common-apoc-procedures)
- [Indexing and Constraints](#indexing-and-constraints)
  - [Indexes](#indexes)
  - [Constraints](#constraints)
- [Transaction Management](#transaction-management)
  - [Python](#python)
  - [TypeScript](#typescript)
- [Performance Optimization](#performance-optimization)
  - [Query Profiling](#query-profiling)
  - [Performance Best Practices](#performance-best-practices)
- [Neo4j Aura (Managed Cloud)](#neo4j-aura-managed-cloud)
  - [Connection](#connection)
  - [Features](#features)
- [Schema Design Patterns](#schema-design-patterns)
  - [Time-Based Partitioning](#time-based-partitioning)
  - [Intermediate Nodes for Filtering](#intermediate-nodes-for-filtering)
- [Backup and Restore](#backup-and-restore)
  - [Dump Database](#dump-database)
  - [Load Database](#load-database)
  - [Export to Cypher Script](#export-to-cypher-script)
- [Monitoring](#monitoring)
  - [Check Database Stats](#check-database-stats)
  - [Kill Long-Running Queries](#kill-long-running-queries)
- [Common Cypher Functions](#common-cypher-functions)
  - [String Functions](#string-functions)
  - [List Functions](#list-functions)
  - [Aggregation Functions](#aggregation-functions)
- [Further Resources](#further-resources)

## Overview

Neo4j, first released in 2007, is one of the most mature and widely used graph databases. It uses the Cypher query language and provides 65+ graph algorithms through the Graph Data Science (GDS) library.

## Installation

### Docker (Recommended for Development)
```bash
docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:latest
```

Access Neo4j Browser at: http://localhost:7474

### Language Drivers

**Python**:
```bash
pip install neo4j
```

**TypeScript/JavaScript**:
```bash
npm install neo4j-driver
```

**Rust**:
```bash
cargo add neo4rs
```

**Go**:
```bash
go get github.com/neo4j/neo4j-go-driver/v5/neo4j
```

## Connection Setup

### Python
```python
from neo4j import GraphDatabase

class Neo4jConnection:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def verify_connectivity(self):
        with self.driver.session() as session:
            result = session.run("RETURN 1 AS num")
            return result.single()["num"] == 1

# Usage
db = Neo4jConnection("bolt://localhost:7687", "neo4j", "password")
print(f"Connected: {db.verify_connectivity()}")
```
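
The usage line above hard-codes the URI and credentials, which is fine for a local container but not for shared environments. One possible approach, sketched below, reads them from environment variables. The variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` and the `neo4j_settings` helper are assumptions made for this example, not names the driver reads on its own.

```python
import os


def neo4j_settings(env=os.environ):
    """Read connection settings from the environment, falling back to local defaults."""
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "user": env.get("NEO4J_USER", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "password"),
    }


# A plain dict stands in for os.environ so the example is reproducible
settings = neo4j_settings({
    "NEO4J_URI": "neo4j+s://example.invalid",
    "NEO4J_PASSWORD": "s3cret",
})
print(settings["uri"], settings["user"])
```

The returned keys match the constructor arguments of the `Neo4jConnection` class above, so it could be created with `Neo4jConnection(**neo4j_settings())`.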

### TypeScript
```typescript
import neo4j, { Driver, Session } from 'neo4j-driver'

class Neo4jConnection {
  private driver: Driver

  constructor(uri: string, username: string, password: string) {
    this.driver = neo4j.driver(uri, neo4j.auth.basic(username, password))
  }

  async close(): Promise<void> {
    await this.driver.close()
  }

  async verifyConnectivity(): Promise<boolean> {
    const session: Session = this.driver.session()
    try {
      const result = await session.run('RETURN 1 AS num')
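      // The driver returns Cypher integers as neo4j Integer objects by default,
      // so convert before comparing with a JavaScript number
      return result.records[0].get('num').toNumber() === 1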
    } finally {
      await session.close()
    }
  }
}

// Usage
const db = new Neo4jConnection('bolt://localhost:7687', 'neo4j', 'password')
console.log(`Connected: ${await db.verifyConnectivity()}`)
```

## Graph Data Science (GDS) Library

Neo4j's GDS library provides 65+ production-quality graph algorithms.

### Installation
```cypher
// Check if GDS is installed
CALL gds.list()

// For Neo4j Desktop or self-hosted, install GDS plugin
// Docker: Include GDS-enabled image
```

### Common Graph Algorithms

#### PageRank (Centrality)
Find influential nodes based on their connections.

```cypher
// 1. Create graph projection
CALL gds.graph.project(
  'socialGraph',
  'User',
  'FRIEND'
)

// 2. Run PageRank
CALL gds.pageRank.stream('socialGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
```

#### Community Detection (Louvain)
Find clusters of densely connected nodes.

```cypher
CALL gds.louvain.stream('socialGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId
ORDER BY communityId
```

#### Shortest Path (Dijkstra)
Find optimal path between nodes considering weighted relationships.

```cypher
MATCH (source:Location {name: 'New York'}), (target:Location {name: 'Los Angeles'})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
  sourceNode: source,
  targetNode: target,
  relationshipWeightProperty: 'distance'
})
YIELD path, totalCost
RETURN [node IN nodes(path) | node.name] AS route, totalCost
```

#### Node Similarity
Find similar nodes based on their neighborhoods.

```cypher
CALL gds.nodeSimilarity.stream('socialGraph')
YIELD node1, node2, similarity
RETURN
  gds.util.asNode(node1).name AS person1,
  gds.util.asNode(node2).name AS person2,
  similarity
ORDER BY similarity DESC
LIMIT 20
```

## APOC Procedures

APOC (Awesome Procedures On Cypher) extends Neo4j with utility functions.

### Installation
```bash
# Docker: Include APOC-enabled image
docker run -d \
  --name neo4j \
  -e NEO4J_PLUGINS='["apoc"]' \
  neo4j:latest
```

### Common APOC Procedures

#### Batch Operations
```cypher
// Batch create nodes from list
CALL apoc.periodic.iterate(
  "UNWIND $users AS user RETURN user",
  "CREATE (u:User {id: user.id, name: user.name})",
  {batchSize: 1000, params: {users: $userList}}
)
```
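
When APOC is not installed, a similar effect can be approximated on the client by splitting the input into batches and sending one `UNWIND` statement per batch. The sketch below shows the batching logic only. The `chunked` and `load_users` helpers are hypothetical, and `run` is assumed to be any callable with the shape of `tx.run(query, **params)`; a lambda stands in for it so the example runs without a database.

```python
def chunked(rows, size):
    """Yield consecutive slices of rows, each with at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


CREATE_USERS = """
UNWIND $rows AS row
CREATE (u:User {id: row.id, name: row.name})
"""


def load_users(run, users, batch_size=1000):
    """Send one UNWIND statement per batch and return the number of batches."""
    batches = 0
    for batch in chunked(users, batch_size):
        run(CREATE_USERS, rows=batch)
        batches += 1
    return batches


users = [{"id": f"u{i}", "name": f"User {i}"} for i in range(2500)]
sent = []  # records the size of each batch the stand-in receives
print(load_users(lambda query, rows: sent.append(len(rows)), users))
```

In a real loader each batch would typically run in its own transaction, so that a failure rolls back one batch rather than the whole import.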

#### JSON Import/Export
```cypher
// Import JSON
CALL apoc.load.json('file:///path/to/data.json')
YIELD value
CREATE (u:User {id: value.id, name: value.name})

// Export to JSON
CALL apoc.export.json.query(
  "MATCH (u:User) RETURN u",
  "users.json",
  {}
)
```

#### Graph Algorithms (APOC)
```cypher
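// Note: centrality procedures such as apoc.algo.betweenness were removed from APOC
// (4.0 and later); use the GDS library (e.g. gds.betweenness.stream) for those.
// APOC still ships path-finding helpers such as Dijkstra.
// The ROAD relationship type below is an assumed name for illustration.
MATCH (source:Location {name: 'New York'}), (target:Location {name: 'Los Angeles'})
CALL apoc.algo.dijkstra(source, target, 'ROAD', 'distance')
YIELD path, weight
RETURN [node IN nodes(path) | node.name] AS route, weight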
```

## Indexing and Constraints

### Indexes
```cypher
// Single-property index
CREATE INDEX user_email FOR (u:User) ON (u.email)

// Composite index (multiple properties)
CREATE INDEX user_name_location FOR (u:User) ON (u.name, u.location)

// Full-text search index
CREATE FULLTEXT INDEX product_search FOR (p:Product) ON EACH [p.name, p.description]

// List all indexes
SHOW INDEXES
```

### Constraints
```cypher
// Unique constraint
CREATE CONSTRAINT user_email_unique FOR (u:User) REQUIRE u.email IS UNIQUE

// Existence constraint (Enterprise only)
CREATE CONSTRAINT user_name_exists FOR (u:User) REQUIRE u.name IS NOT NULL

// Node key (composite uniqueness plus existence, Enterprise only)
CREATE CONSTRAINT user_key FOR (u:User) REQUIRE (u.id, u.email) IS NODE KEY

// List all constraints
SHOW CONSTRAINTS
```

## Transaction Management

### Python
```python
def create_friendship_transaction(tx, user1_id, user2_id):
    query = """
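    MATCH (u1:User {id: $user1Id}), (u2:User {id: $user2Id})
    // MERGE on the pattern only: a datetime() inside the MERGE pattern never
    // matches an existing relationship, so every call would add a duplicate
    MERGE (u1)-[r1:FRIEND]->(u2)
      ON CREATE SET r1.since = datetime()
    MERGE (u2)-[r2:FRIEND]->(u1)
      ON CREATE SET r2.since = datetime()
    RETURN u1.name, u2.name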
    """
    result = tx.run(query, user1Id=user1_id, user2Id=user2_id)
    return result.single()

# Execute in transaction
with driver.session() as session:
    result = session.execute_write(create_friendship_transaction, "u123", "u456")
    print(f"Created friendship between {result[0]} and {result[1]}")
```

### TypeScript
```typescript
async createFriendship(user1Id: string, user2Id: string): Promise<void> {
  const session = this.driver.session()
  try {
    await session.executeWrite(async tx => {
      const query = `
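        MATCH (u1:User {id: $user1Id}), (u2:User {id: $user2Id})
        // MERGE on the pattern only: a datetime() inside the MERGE pattern never
        // matches an existing relationship, so every call would add a duplicate
        MERGE (u1)-[r1:FRIEND]->(u2)
          ON CREATE SET r1.since = datetime()
        MERGE (u2)-[r2:FRIEND]->(u1)
          ON CREATE SET r2.since = datetime()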
      `
      await tx.run(query, { user1Id, user2Id })
    })
  } finally {
    await session.close()
  }
}
```

## Performance Optimization

### Query Profiling
```cypher
// EXPLAIN: Shows query plan without executing
EXPLAIN
MATCH (u:User {email: 'alice@example.com'})-[:FRIEND*1..3]->(friend)
RETURN friend.name

// PROFILE: Executes and shows detailed statistics
PROFILE
MATCH (u:User {email: 'alice@example.com'})-[:FRIEND*1..3]->(friend)
RETURN friend.name
LIMIT 100
```

### Performance Best Practices

**1. Bounded Variable-Length Paths**
```cypher
// SLOW: unbounded depth
MATCH (u:User)-[:FRIEND*]->(distant)
RETURN distant

// FAST: bounded depth with a cap on results
MATCH (u:User)-[:FRIEND*1..4]->(distant)
RETURN distant
LIMIT 100
```

**2. Use Indexes**
```cypher
// Create index before querying
CREATE INDEX user_email FOR (u:User) ON (u.email)

// Query benefits from index
MATCH (u:User {email: 'alice@example.com'})
RETURN u
```

**3. Aggregate and Filter Early with WITH**
```cypher
// Materialize intermediate results
MATCH (u:User)-[:FRIEND]->(f)
WITH u, count(f) AS friendCount
WHERE friendCount > 10
RETURN u.name, friendCount
```

## Neo4j Aura (Managed Cloud)

Neo4j Aura is the fully managed cloud service.

### Connection
```python
# Aura uses neo4j+s:// or neo4j+ssc:// protocols
driver = GraphDatabase.driver(
    "neo4j+s://xxxxx.databases.neo4j.io",
    auth=("neo4j", "password")
)
```

### Features
- Auto-scaling
- Automated backups
- Security (encryption at rest and in transit)
- Monitoring dashboards
- Free tier available (with caps on node and relationship counts; check the Aura pricing page for current limits)

## Schema Design Patterns

### Time-Based Partitioning
```cypher
// Avoid: User directly connected to millions of posts
(:User)-[:POSTED]->(:Post) // 1M+ relationships

// Better: Partition by time
(:User)-[:POSTED_IN]->(:Year {year: 2025})
  -[:HAS_MONTH]->(:Month {month: 12})
  -[:HAS_DAY]->(:Day {day: 15})
  -[:CONTAINS]->(:Post)
```
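
One way to populate such a time tree from application code is to derive the year, month, and day from the post timestamp and pass them as query parameters. The sketch below assumes `User` and `Post` nodes that already exist and carry an `id` property; the `LINK_POST` query and the `partition_params` helper are hypothetical and only illustrate the partitioning pattern above.

```python
from datetime import datetime

# Hypothetical write query: creates the per-user Year/Month/Day chain on demand
LINK_POST = """
MATCH (u:User {id: $userId}), (p:Post {id: $postId})
MERGE (u)-[:POSTED_IN]->(y:Year {year: $year})
MERGE (y)-[:HAS_MONTH]->(m:Month {month: $month})
MERGE (m)-[:HAS_DAY]->(d:Day {day: $day})
MERGE (d)-[:CONTAINS]->(p)
"""


def partition_params(user_id, post_id, posted_at):
    """Derive the time-tree parameters for one post from its timestamp."""
    return {
        "userId": user_id,
        "postId": post_id,
        "year": posted_at.year,
        "month": posted_at.month,
        "day": posted_at.day,
    }


params = partition_params("u123", "p42", datetime(2025, 12, 15, 9, 30))
print(params)
```

Because each `MERGE` starts from an already bound node, the Year, Month, and Day nodes are scoped to a single user, which keeps any one node from accumulating millions of relationships.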

### Intermediate Nodes for Filtering
```cypher
// Complex filtering on categories
(:Product)-[:IN_CATEGORY]->(:Category)
(:Product)-[:HAS_TAG]->(:Tag)

// Query becomes simpler
MATCH (c:Category {name: 'Electronics'})<-[:IN_CATEGORY]-(p:Product)-[:HAS_TAG]->(t:Tag {name: 'Sale'})
RETURN p
```

## Backup and Restore

### Dump Database
```bash
neo4j-admin database dump neo4j --to-path=/backups
```

### Load Database
```bash
neo4j-admin database load neo4j --from-path=/backups
```

### Export to Cypher Script
```cypher
CALL apoc.export.cypher.all("backup.cypher", {
  format: "cypher-shell",
  useOptimizations: {type: "UNWIND_BATCH", unwindBatchSize: 20}
})
```

## Monitoring

### Check Database Stats
```cypher
// Database info
CALL dbms.queryJmx('org.neo4j:instance=kernel#0,name=Store sizes') YIELD attributes
RETURN attributes

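// Transaction stats (Neo4j 5.x; replaces the removed dbms.listTransactions())
SHOW TRANSACTIONS

// Active queries (Neo4j 5.x; replaces the removed dbms.listQueries())
SHOW TRANSACTIONS YIELD transactionId, currentQuery, elapsedTime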
```

### Kill Long-Running Queries
```cypher
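// List long-running transactions (Neo4j 5.x; elapsedTime is a Duration)
SHOW TRANSACTIONS YIELD transactionId, currentQuery, elapsedTime
WHERE elapsedTime.milliseconds > 30000
RETURN transactionId, currentQuery

// Terminate by transaction ID (Neo4j 5.x; replaces the removed dbms.killQuery())
TERMINATE TRANSACTIONS 'neo4j-transaction-123'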
```

## Common Cypher Functions

### String Functions
```cypher
RETURN toLower('HELLO') // 'hello'
RETURN toUpper('hello') // 'HELLO'
RETURN substring('Hello World', 0, 5) // 'Hello'
RETURN replace('Hello World', 'World', 'Neo4j') // 'Hello Neo4j'
```

### List Functions
```cypher
RETURN size([1,2,3,4,5]) // 5
RETURN head([1,2,3,4,5]) // 1
RETURN tail([1,2,3,4,5]) // [2,3,4,5]
RETURN range(0, 10, 2) // [0,2,4,6,8,10]
```

### Aggregation Functions
```cypher
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN
  count(p) AS totalPurchases,
  count(DISTINCT p) AS uniqueProducts,
  avg(p.price) AS avgPrice,
  sum(p.price) AS totalSpent,
  collect(p.name) AS productNames
```

## Further Resources

- Official Neo4j Documentation: https://neo4j.com/docs/
- Graph Data Science Manual: https://neo4j.com/docs/graph-data-science/current/
- APOC Documentation: https://neo4j.com/labs/apoc/
- Cypher Manual: https://neo4j.com/docs/cypher-manual/current/
- Neo4j GraphAcademy (Free Training): https://graphacademy.neo4j.com/

```

### references/arangodb.md

```markdown
# ArangoDB Reference Guide


## Table of Contents

- [Overview](#overview)
- [Key Advantages](#key-advantages)
- [Installation](#installation)
  - [Docker](#docker)
  - [Language Drivers](#language-drivers)
- [Connection Setup](#connection-setup)
  - [Python](#python)
  - [TypeScript](#typescript)
- [Multi-Model Architecture](#multi-model-architecture)
  - [Collections](#collections)
  - [Graph Definitions](#graph-definitions)
- [AQL Query Language](#aql-query-language)
  - [Document Queries](#document-queries)
  - [Graph Traversals](#graph-traversals)
- [Multi-Model Queries](#multi-model-queries)
- [Recommendation Patterns](#recommendation-patterns)
  - [Collaborative Filtering](#collaborative-filtering)
  - [Content-Based Filtering](#content-based-filtering)
- [SmartGraphs (Enterprise)](#smartgraphs-enterprise)
- [Indexing](#indexing)
  - [Hash Index](#hash-index)
  - [Skiplist Index](#skiplist-index)
  - [Full-Text Index](#full-text-index)
  - [Geo Index](#geo-index)
- [Performance Optimization](#performance-optimization)
  - [Query Profiling](#query-profiling)
  - [Traversal Optimization](#traversal-optimization)
- [TypeScript Integration](#typescript-integration)
- [Go Integration](#go-integration)
- [Data Migration](#data-migration)
  - [Import from JSON](#import-from-json)
  - [Export to JSON](#export-to-json)
- [Graph Algorithms](#graph-algorithms)
  - [Shortest Path](#shortest-path)
  - [All Shortest Paths](#all-shortest-paths)
  - [K Shortest Paths](#k-shortest-paths)
- [Backup and Restore](#backup-and-restore)
  - [Backup (arangodump)](#backup-arangodump)
  - [Restore (arangorestore)](#restore-arangorestore)
- [ArangoDB Oasis (Managed Cloud)](#arangodb-oasis-managed-cloud)
- [When to Use ArangoDB](#when-to-use-arangodb)
- [Further Resources](#further-resources)

## Overview

ArangoDB is a multi-model database supporting documents, graphs, and key-value storage in a single unified query language (AQL). This makes it ideal for applications that need both flexible document storage AND graph relationships.

## Key Advantages

1. **Multi-model**: Store documents and graphs together
2. **Single query language**: AQL combines document and graph operations
3. **Schema flexibility**: Documents can have varying structures
4. **Distributed**: Built-in sharding and clustering (SmartGraphs)
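5. **Open source heritage**: Apache 2.0 license through version 3.11; version 3.12 and later are released under the Business Source License 1.1, so check the terms for your use case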

## Installation

### Docker
```bash
docker run -d \
  --name arangodb \
  -p 8529:8529 \
  -e ARANGO_ROOT_PASSWORD=password \
  arangodb/arangodb:latest
```

Access Web UI at: http://localhost:8529

### Language Drivers

**Python**:
```bash
pip install python-arango
```

**TypeScript/JavaScript**:
```bash
npm install arangojs
```

**Go**:
```bash
go get github.com/arangodb/go-driver
```

## Connection Setup

### Python
```python
from arango import ArangoClient

# Initialize client
client = ArangoClient(hosts='http://localhost:8529')

# Connect to system database
sys_db = client.db('_system', username='root', password='password')

# Create or access custom database
if not sys_db.has_database('mydb'):
    sys_db.create_database('mydb')

db = client.db('mydb', username='root', password='password')
```

### TypeScript
```typescript
import { Database } from 'arangojs'

const db = new Database({
  url: 'http://localhost:8529',
  databaseName: 'mydb',
  auth: { username: 'root', password: 'password' }
})

// Verify connection
const version = await db.version()
console.log(`ArangoDB ${version.version}`)
```

## Multi-Model Architecture

### Collections

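ArangoDB has two physical collection types, and graphs assign them roles:

1. **Document Collections**: Standard NoSQL documents
2. **Edge Collections**: Store relationships (graph edges) with `_from` and `_to` attributes
3. **Vertex Collections**: Not a separate type; a document collection acts as a vertex collection when a graph references it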

```python
# Create document collection
users = db.create_collection('users')

# Create edge collection (for relationships)
friends = db.create_collection('friends', edge=True)

# Insert document
users.insert({'_key': 'alice', 'name': 'Alice', 'age': 28})
users.insert({'_key': 'bob', 'name': 'Bob', 'age': 30})

# Insert edge (relationship)
friends.insert({
    '_from': 'users/alice',
    '_to': 'users/bob',
    'since': '2020-01-15'
})
```
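
Edge documents point at their endpoints through document handles of the form `collection/key` in `_from` and `_to`. The sketch below builds such a document and rejects malformed keys before they reach `friends.insert(...)`. `make_edge` and the key pattern are hypothetical helpers for illustration, not part of python-arango, and the pattern is deliberately stricter than what ArangoDB itself accepts.

```python
import re
from typing import Any, Dict

# Conservative key pattern (an assumption of this sketch, not ArangoDB's full rule set).
_KEY_RE = re.compile(r"^[A-Za-z0-9_\-:.@]+$")


def make_edge(from_coll: str, from_key: str, to_coll: str, to_key: str, **attrs: Any) -> Dict[str, Any]:
    """Build an edge document whose _from/_to are 'collection/key' handles."""
    for key in (from_key, to_key):
        if not _KEY_RE.match(key):
            raise ValueError(f"invalid document key: {key!r}")
    edge: Dict[str, Any] = {
        "_from": f"{from_coll}/{from_key}",
        "_to": f"{to_coll}/{to_key}",
    }
    edge.update(attrs)  # extra keyword arguments become edge attributes
    return edge


edge = make_edge("users", "alice", "users", "bob", since="2020-01-15")
print(edge)
```

The resulting dictionary can be passed to `friends.insert(edge)` as in the example above.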

### Graph Definitions

Named graphs provide schema-like definitions for relationships:

```python
# Define graph
graph = db.create_graph('social')

# Add edge definition
graph.create_edge_definition(
    edge_collection='friends',
    from_vertex_collections=['users'],
    to_vertex_collections=['users']
)

# Now you can traverse using graph name
```

## AQL Query Language

### Document Queries

```aql
// Find all users
FOR user IN users
  RETURN user

// Filter by property
FOR user IN users
  FILTER user.age >= 25
  RETURN {name: user.name, age: user.age}

// Join documents
FOR user IN users
  FOR post IN posts
    FILTER post.author_id == user._key
    RETURN {user: user.name, post: post.title}
```

### Graph Traversals

#### Basic Traversal
```aql
// Find friends (1 hop)
FOR vertex IN 1..1 OUTBOUND 'users/alice' GRAPH 'social'
  RETURN vertex.name

// Friends of friends (2 hops)
FOR vertex IN 2..2 OUTBOUND 'users/alice' GRAPH 'social'
  RETURN DISTINCT vertex.name
```
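
To see why the two-hop query uses `RETURN DISTINCT`, it can help to model the traversal in plain Python. The sketch below is an in-memory approximation, assuming the default traversal behavior in which an edge may not repeat within one path but a vertex can be reached through several paths. `traverse_outbound` is a hypothetical helper, not an ArangoDB API.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

Edge = Tuple[str, str]  # (_from, _to)


def traverse_outbound(
    edges: List[Edge], start: str, min_depth: int, max_depth: int
) -> Iterator[Tuple[str, int]]:
    """Yield (vertex, depth) once per path of length min_depth..max_depth."""
    outgoing: Dict[str, List[int]] = {}
    for idx, (src, _dst) in enumerate(edges):
        outgoing.setdefault(src, []).append(idx)

    def walk(vertex: str, depth: int, used: FrozenSet[int]) -> Iterator[Tuple[str, int]]:
        if depth >= min_depth:
            yield vertex, depth
        if depth == max_depth:
            return
        for idx in outgoing.get(vertex, []):
            if idx not in used:  # an edge is used at most once per path
                yield from walk(edges[idx][1], depth + 1, used | {idx})

    yield from walk(start, 0, frozenset())


edges = [
    ("users/alice", "users/bob"),
    ("users/alice", "users/dave"),
    ("users/bob", "users/carol"),
    ("users/dave", "users/carol"),
]
two_hops = [v for v, _ in traverse_outbound(edges, "users/alice", 2, 2)]
print(two_hops)               # carol appears once per path that reaches her
print(sorted(set(two_hops)))  # what RETURN DISTINCT would keep
```

Carol is reachable through both Bob and Dave, so she is emitted twice; de-duplicating the result is what `DISTINCT` does in the AQL version.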

#### Variable-Depth Traversal
```aql
// Find connections up to 3 levels deep
FOR vertex, edge, path IN 1..3 OUTBOUND 'users/alice' GRAPH 'social'
  RETURN {
    name: vertex.name,
    depth: LENGTH(path.edges),
    relationship: edge.type
  }
```

#### Shortest Path
```aql
// Find shortest path between two users
// SHORTEST_PATH emits one vertex (and the edge leading to it) per step
FOR vertex, edge IN OUTBOUND SHORTEST_PATH
  'users/alice' TO 'users/charlie' GRAPH 'social'
  RETURN vertex.name
```

#### Pattern Matching
```aql
// Find users who like similar products
FOR user IN users
  FILTER user._key == 'alice'
  FOR liked IN OUTBOUND user GRAPH 'shopping'
    FOR similar_user IN INBOUND liked GRAPH 'shopping'
      FILTER similar_user._key != user._key
      COLLECT similar = similar_user WITH COUNT INTO likes
      SORT likes DESC
      LIMIT 10
      RETURN {user: similar.name, common_likes: likes}
```

## Multi-Model Queries

Combine document and graph operations:

```aql
// Find friends in a specific city
FOR user IN users
  FILTER user._key == 'alice'
  FOR friend IN 1..2 OUTBOUND user GRAPH 'social'
    FILTER friend.city == 'San Francisco'
    FILTER friend.age >= 25 AND friend.age <= 35
    RETURN {
      name: friend.name,
      age: friend.age,
      city: friend.city
    }
```

## Recommendation Patterns

### Collaborative Filtering
```aql
// Products purchased by similar users
FOR user IN users
  FILTER user._key == @userId
  FOR product IN OUTBOUND user purchases
    FOR similar_user IN INBOUND product purchases
      FILTER similar_user._key != user._key
      FOR recommendation IN OUTBOUND similar_user purchases
        FILTER recommendation._key NOT IN (
          FOR p IN OUTBOUND user purchases RETURN p._key
        )
        COLLECT rec = recommendation WITH COUNT INTO score
        SORT score DESC
        LIMIT 10
        RETURN {product: rec, score: score}
```
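
The scoring in the query above counts one point per path `user -> shared product -> similar user -> new product`. A minimal in-memory sketch of the same logic, assuming purchases are available as a plain dictionary (the `recommend` function and sample data are illustrative only):

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def recommend(purchases: Dict[str, List[str]], user: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Score products bought by users who share at least one purchase with `user`."""
    # Invert user -> products into product -> buyers (the INBOUND direction).
    buyers: Dict[str, List[str]] = defaultdict(list)
    for buyer, products in purchases.items():
        for product in products:
            buyers[product].append(buyer)

    owned = purchases.get(user, [])
    owned_set = set(owned)
    scores: Counter = Counter()
    for product in owned:                     # FOR product IN OUTBOUND user purchases
        for similar in buyers[product]:       # FOR similar_user IN INBOUND product purchases
            if similar == user:
                continue
            for rec in purchases[similar]:    # FOR recommendation IN OUTBOUND similar_user purchases
                if rec not in owned_set:      # skip what the user already bought
                    scores[rec] += 1          # COLLECT ... WITH COUNT INTO score
    return scores.most_common(limit)          # SORT score DESC, LIMIT


purchases = {
    "alice": ["p1", "p2"],
    "bob": ["p1", "p2", "p3"],
    "carol": ["p1", "p4"],
}
print(recommend(purchases, "alice"))
```

Here `p3` scores 2 because Bob shares two products with Alice, while `p4` scores 1 through Carol.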

### Content-Based Filtering
```aql
// Products in same categories as user's purchases
FOR user IN users
  FILTER user._key == @userId
  FOR product IN OUTBOUND user purchases
    FOR category IN OUTBOUND product in_category
      FOR recommendation IN INBOUND category in_category
        FILTER recommendation._key NOT IN (
          FOR p IN OUTBOUND user purchases RETURN p._key
        )
        COLLECT rec = recommendation WITH COUNT INTO relevance
        SORT relevance DESC
        LIMIT 10
        RETURN {product: rec.name, relevance: relevance}
```
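
`@userId` in the queries above is a bind parameter, so the value is supplied separately from the query text. A minimal sketch of running the content-based query from Python, assuming `db` is a python-arango database handle as created in Connection Setup; the function name and the extra `@limit` parameter are choices made for this example.

```python
from typing import Any, Dict, List

RECOMMEND_QUERY = """
FOR user IN users
  FILTER user._key == @userId
  FOR product IN OUTBOUND user purchases
    FOR category IN OUTBOUND product in_category
      FOR recommendation IN INBOUND category in_category
        FILTER recommendation._key NOT IN (
          FOR p IN OUTBOUND user purchases RETURN p._key
        )
        COLLECT rec = recommendation WITH COUNT INTO relevance
        SORT relevance DESC
        LIMIT @limit
        RETURN {product: rec.name, relevance: relevance}
"""


def recommend_by_category(db: Any, user_id: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Execute the query with bind variables and return all result rows."""
    cursor = db.aql.execute(
        RECOMMEND_QUERY,
        bind_vars={"userId": user_id, "limit": limit},
    )
    return list(cursor)
```

Because the user ID never becomes part of the query string, a malicious value cannot change the structure of the query.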

## SmartGraphs (Enterprise)

SmartGraphs enable horizontal sharding for massive graphs.

```python
# Create smart graph (Enterprise only)
graph = db.create_graph(
    'social',
    edge_definitions=[{
        'edge_collection': 'friends',
        'from_vertex_collections': ['users'],
        'to_vertex_collections': ['users']
    }],
    smart=True,
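    smart_field='region'  # python-arango's name for the smartGraphAttribute
)

# Documents with the same region value are co-located on one shard.
# A custom _key must carry the region value as a prefix.
users.insert({'_key': 'US-West:alice', 'region': 'US-West', 'name': 'Alice'})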
```

## Indexing

### Hash Index
```javascript
// arangosh: persistent index for exact matches ('hash' is a legacy alias)
db._collection('users').ensureIndex({
  type: 'persistent',
  fields: ['email'],
  unique: true
})
```

### Skiplist Index
```javascript
// arangosh: persistent index for range queries ('skiplist' is a legacy alias)
db._collection('users').ensureIndex({
  type: 'persistent',
  fields: ['age', 'city']
})
```

### Full-Text Index
```javascript
// arangosh: full-text index (deprecated since 3.10 in favor of ArangoSearch;
// it indexes exactly one attribute)
db._collection('products').ensureIndex({
  type: 'fulltext',
  fields: ['description'],
  minLength: 3
})

// Query full-text index (search words are comma-separated)
db._query(`
  FOR doc IN FULLTEXT(products, 'description', 'laptop,computer')
    RETURN doc
`)
```

### Geo Index
```javascript
// arangosh: geo index for location queries
db._collection('stores').ensureIndex({
  type: 'geo',
  fields: ['location'],
  geoJson: true
})

// Find nearby stores (within 5000 m), nearest first
db._query(`
  FOR store IN stores
    LET distance = GEO_DISTANCE(store.location, [-122.4194, 37.7749])
    FILTER distance <= 5000
    SORT distance
    RETURN {name: store.name, distance: distance}
`)
```
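
Coordinate arrays in the geo query are in `[longitude, latitude]` order and distances are in meters. The sketch below reproduces the "within 5000 m, nearest first" logic in plain Python using the haversine formula on a spherical Earth. That formula is an approximation chosen for this example and may differ slightly from what the server computes; the store data is made up.

```python
import math
from typing import Sequence

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)


def haversine_m(a: Sequence[float], b: Sequence[float]) -> float:
    """Great-circle distance in meters between two [longitude, latitude] points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


center = [-122.4194, 37.7749]  # San Francisco as [lon, lat]
stores = [
    {"name": "Oakland", "location": [-122.2712, 37.8044]},
    {"name": "Downtown", "location": [-122.4089, 37.7837]},
]

with_distance = [
    {"name": s["name"], "distance": haversine_m(s["location"], center)} for s in stores
]
nearby = sorted(
    (s for s in with_distance if s["distance"] <= 5000),
    key=lambda s: s["distance"],
)
print([s["name"] for s in nearby])
```

Swapping longitude and latitude is a common source of wrong results, so it is worth checking one known distance by hand before trusting a geo query.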

## Performance Optimization

### Query Profiling
```javascript
// arangosh: get query execution plan
db._explain(`
  FOR user IN users
    FILTER user.age > 25
    RETURN user
`)

// Execute with profiling
db._profileQuery(`
  FOR user IN users
    FILTER user.age > 25
    RETURN user
`)
```

### Traversal Optimization

**1. Bounded Depth**
```aql
// SLOW: very wide depth range can visit most of the graph
FOR v IN 1..100 OUTBOUND 'users/alice' GRAPH 'social'
  RETURN v

// FAST: tight depth bound (omitting the range defaults to 1..1)
FOR v IN 1..3 OUTBOUND 'users/alice' GRAPH 'social'
  RETURN v
```

**2. Prune Early**
```aql
FOR v, e, p IN 1..5 OUTBOUND 'users/alice' GRAPH 'social'
  PRUNE v.blocked == true  // Stop traversing if node is blocked
  FILTER v.active == true
  RETURN v
```
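
`PRUNE` and `FILTER` do different jobs: `PRUNE` stops the traversal from expanding past a vertex, while `FILTER` decides which results are returned. The in-memory sketch below illustrates that split. `traverse_with_prune` is a hypothetical helper; applying the prune check to the start vertex and forbidding repeated vertices within a path are simplifying assumptions of this example.

```python
from typing import Callable, Dict, FrozenSet, Iterator, List

Vertex = Dict[str, bool]


def traverse_with_prune(
    vertices: Dict[str, Vertex],
    outgoing: Dict[str, List[str]],
    start: str,
    max_depth: int,
    prune: Callable[[Vertex], bool],
    keep: Callable[[Vertex], bool],
) -> Iterator[str]:
    """Depth-first walk: `prune` stops expansion, `keep` selects what is returned."""

    def walk(key: str, depth: int, seen: FrozenSet[str]) -> Iterator[str]:
        vertex = vertices[key]
        if depth >= 1 and keep(vertex):
            yield key
        if depth == max_depth or prune(vertex):
            return  # a pruned vertex may still be emitted above, but is never expanded
        for nxt in outgoing.get(key, []):
            if nxt not in seen:
                yield from walk(nxt, depth + 1, seen | {nxt})

    yield from walk(start, 0, frozenset({start}))


vertices = {
    "users/alice": {"active": True, "blocked": False},
    "users/bob": {"active": True, "blocked": True},
    "users/carol": {"active": True, "blocked": False},
    "users/dave": {"active": True, "blocked": False},
    "users/erin": {"active": False, "blocked": False},
}
outgoing = {
    "users/alice": ["users/bob", "users/dave"],
    "users/bob": ["users/carol"],
    "users/dave": ["users/erin"],
}
result = list(
    traverse_with_prune(
        vertices, outgoing, "users/alice", 5,
        prune=lambda v: v["blocked"],
        keep=lambda v: v["active"],
    )
)
print(result)
```

Bob is blocked, so Carol behind him is never visited, yet Bob himself is still returned because he passes the `active` filter. To exclude pruned vertices as well, add a matching condition to the `FILTER`.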

**3. Use Edge Collections Directly**
```aql
// When you don't need full graph traversal
FOR edge IN friends
  FILTER edge._from == 'users/alice'
  FOR user IN users
    FILTER user._id == edge._to
    RETURN user
```

## TypeScript Integration

```typescript
import { Database, aql } from 'arangojs'

class ArangoService {
  private db: Database

  constructor() {
    this.db = new Database({
      url: 'http://localhost:8529',
      databaseName: 'mydb',
      auth: { username: 'root', password: 'password' }
    })
  }

  async findFriendsOfFriends(userId: string, maxDepth: number = 2) {
    const query = aql`
      FOR vertex IN ${maxDepth}..${maxDepth} OUTBOUND ${`users/${userId}`} GRAPH 'social'
        // COLLECT de-duplicates; LIMIT must come before RETURN in AQL
        COLLECT id = vertex._key, name = vertex.name, email = vertex.email
        LIMIT 100
        RETURN { id: id, name: name, email: email }
    `

    const cursor = await this.db.query(query)
    return await cursor.all()
  }

  async createFriendship(user1Id: string, user2Id: string) {
    const friendsCollection = this.db.collection('friends')
    await friendsCollection.save({
      _from: `users/${user1Id}`,
      _to: `users/${user2Id}`,
      since: new Date().toISOString()
    })
  }

  async recommendProducts(userId: string, limit: number = 10) {
    const query = aql`
      FOR user IN users
        FILTER user._key == ${userId}
        FOR product IN OUTBOUND user purchases
          FOR similar_user IN INBOUND product purchases
            FILTER similar_user._key != user._key
            FOR rec IN OUTBOUND similar_user purchases
              FILTER rec._key NOT IN (FOR p IN OUTBOUND user purchases RETURN p._key)
              COLLECT recommendation = rec WITH COUNT INTO score
              SORT score DESC
              LIMIT ${limit}
              RETURN {
                product: recommendation.name,
                score: score
              }
    `

    const cursor = await this.db.query(query)
    return await cursor.all()
  }
}
```

## Go Integration

```go
package main

import (
    "context"

    driver "github.com/arangodb/go-driver"
)

type User struct {
    Key   string `json:"_key"`
    Name  string `json:"name"`
    Email string `json:"email"`
}

func findFriendsOfFriends(db driver.Database, userId string, maxDepth int) ([]User, error) {
    ctx := context.Background()

    // Bind parameters keep user input out of the query text.
    // COLLECT de-duplicates; LIMIT must come before RETURN in AQL.
    query := `
        FOR vertex IN @depth..@depth OUTBOUND @start GRAPH 'social'
            COLLECT friend = vertex
            LIMIT 100
            RETURN friend
    `
    bindVars := map[string]interface{}{
        "depth": maxDepth,
        "start": "users/" + userId,
    }

    cursor, err := db.Query(ctx, query, bindVars)
    if err != nil {
        return nil, err
    }
    defer cursor.Close()

    var users []User
    for cursor.HasMore() {
        var user User
        _, err := cursor.ReadDocument(ctx, &user)
        if err != nil {
            return nil, err
        }
        users = append(users, user)
    }

    return users, nil
}
```

## Data Migration

### Import from JSON
```python
import json

with open('users.json', 'r') as f:
    users_data = json.load(f)

# Batch insert (assumes `db` from Connection Setup)
users = db.collection('users')
users.import_bulk(users_data)
```
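
For large files, sending every document in one `import_bulk` call can produce a very large request. A minimal sketch of splitting the import into fixed-size batches, assuming `collection` is a python-arango collection handle; `batched`, `import_in_batches`, and the default batch size of 1000 are hypothetical choices for this example.

```python
from typing import Any, Dict, Iterable, Iterator, List


def batched(docs: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` documents, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Dict[str, Any]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


def import_in_batches(collection: Any, docs: Iterable[Dict[str, Any]], size: int = 1000) -> int:
    """Call collection.import_bulk once per batch; return the number of documents sent."""
    sent = 0
    for batch in batched(docs, size):
        collection.import_bulk(batch)
        sent += len(batch)
    return sent
```

Because `batched` accepts any iterable, the documents can also come from a generator that reads the source file line by line instead of loading it whole.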

### Export to JSON
```python
import json

# Export collection
cursor = db.aql.execute('FOR doc IN users RETURN doc')
users_list = [doc for doc in cursor]

with open('users_export.json', 'w') as f:
    json.dump(users_list, f, indent=2)
```

## Graph Algorithms

ArangoDB provides basic graph algorithms through AQL:

### Shortest Path
```aql
// SHORTEST_PATH emits one vertex (and the edge leading to it) per step
FOR vertex, edge IN OUTBOUND SHORTEST_PATH
  'users/alice' TO 'users/charlie' GRAPH 'social'
  RETURN {
    vertex: vertex.name,
    edge: edge.type
  }
```

### All Shortest Paths
```aql
FOR path IN OUTBOUND ALL_SHORTEST_PATHS
  'users/alice' TO 'users/charlie' GRAPH 'social'
  RETURN path
```

### K Shortest Paths
```aql
FOR path IN OUTBOUND K_SHORTEST_PATHS
  'users/alice' TO 'users/charlie' GRAPH 'social'
  LIMIT 5
  RETURN path
```

## Backup and Restore

### Backup (arangodump)
```bash
arangodump \
  --server.endpoint tcp://localhost:8529 \
  --server.username root \
  --server.password password \
  --server.database mydb \
  --output-directory /backups/mydb \
  --overwrite true
```

### Restore (arangorestore)
```bash
arangorestore \
  --server.endpoint tcp://localhost:8529 \
  --server.username root \
  --server.password password \
  --server.database mydb \
  --input-directory /backups/mydb
```

## ArangoDB Oasis (Managed Cloud)

ArangoDB Oasis, since renamed ArangoGraph Insights Platform, is the fully managed cloud service.

**Features**:
- Automated backups and updates
- Multi-region deployments
- SSL/TLS encryption
- Monitoring dashboards
- Free tier available

**Connection**:
```python
from arango import ArangoClient

client = ArangoClient(hosts='https://xxxxx.arangodb.cloud:8529')
db = client.db('mydb', username='root', password='password')
```

## When to Use ArangoDB

**Choose ArangoDB when**:
- Need both documents AND graphs (multi-model)
- Schema flexibility is important
- Want single query language for all data
- Need distributed graph processing (SmartGraphs)

**Choose Neo4j when**:
- Graph-first workload (relationships > documents)
- Need advanced graph algorithms (GDS library)
- Prefer Cypher's pattern matching syntax

## Further Resources

- ArangoDB Documentation: https://www.arangodb.com/docs/
- AQL Tutorial: https://www.arangodb.com/docs/stable/aql/
- ArangoDB University (Free): https://www.arangodb.com/arangodb-training-center/

```

### references/graph-modeling.md

```markdown
# Graph Data Modeling Best Practices


## Table of Contents

- [Core Principles](#core-principles)
  - [1. Relationships Are First-Class Citizens](#1-relationships-are-first-class-citizens)
  - [2. Model for Query Patterns](#2-model-for-query-patterns)
  - [3. Denormalize for Performance](#3-denormalize-for-performance)
- [Node Design](#node-design)
  - [Labels](#labels)
  - [Properties](#properties)
- [Relationship Design](#relationship-design)
  - [Relationship Types](#relationship-types)
  - [Relationship Direction](#relationship-direction)
  - [Relationship Properties](#relationship-properties)
- [Common Anti-Patterns](#common-anti-patterns)
  - [Anti-Pattern 1: Arrays Instead of Relationships](#anti-pattern-1-arrays-instead-of-relationships)
  - [Anti-Pattern 2: Supernodes](#anti-pattern-2-supernodes)
  - [Anti-Pattern 3: Dense Nodes Without Indexes](#anti-pattern-3-dense-nodes-without-indexes)
  - [Anti-Pattern 4: Unbounded Traversals](#anti-pattern-4-unbounded-traversals)
- [Schema Design Patterns](#schema-design-patterns)
  - [Pattern 1: Star Schema (Hub and Spokes)](#pattern-1-star-schema-hub-and-spokes)
  - [Pattern 2: Hierarchical Schema (Trees)](#pattern-2-hierarchical-schema-trees)
  - [Pattern 3: Linked List (Sequences)](#pattern-3-linked-list-sequences)
  - [Pattern 4: Many-to-Many with Junction Nodes](#pattern-4-many-to-many-with-junction-nodes)
  - [Pattern 5: Temporal Versioning](#pattern-5-temporal-versioning)
- [Specialized Schemas](#specialized-schemas)
  - [Social Network Schema](#social-network-schema)
  - [Knowledge Graph Schema](#knowledge-graph-schema)
  - [Recommendation Engine Schema](#recommendation-engine-schema)
  - [Access Control Schema (ReBAC)](#access-control-schema-rebac)
- [Modeling for Performance](#modeling-for-performance)
  - [Strategy 1: Materialize Computed Values](#strategy-1-materialize-computed-values)
  - [Strategy 2: Shortcut Relationships](#strategy-2-shortcut-relationships)
  - [Strategy 3: Denormalize Frequently Accessed Data](#strategy-3-denormalize-frequently-accessed-data)
- [Indexing Strategy](#indexing-strategy)
  - [When to Index](#when-to-index)
  - [Composite Indexes](#composite-indexes)
  - [Full-Text Indexes](#full-text-indexes)
- [Migration Patterns](#migration-patterns)
  - [From Relational to Graph](#from-relational-to-graph)
  - [Migration Steps](#migration-steps)
- [Schema Validation](#schema-validation)
  - [Constraints](#constraints)
  - [Validation Queries](#validation-queries)
- [Further Reading](#further-reading)

## Core Principles

### 1. Relationships Are First-Class Citizens

Unlike relational databases where relationships are implied through foreign keys, graph databases make relationships explicit and queryable.

**Relational Model** (relationships hidden in foreign keys):
```sql
CREATE TABLE users (id, name, email);
CREATE TABLE friendships (user_id, friend_id, since);
```

**Graph Model** (relationships are explicit):
```cypher
(:User {id, name, email})-[:FRIEND {since}]->(:User)
```

### 2. Model for Query Patterns

Design your graph schema based on how you'll query it, not just the entity relationships.

**Example**: If you frequently ask "Who works with Alice?", create direct `COLLEAGUE` relationships instead of traversing through `WORKS_AT` to `COMPANY` and back.

### 3. Denormalize for Performance

Graph databases can benefit from selective denormalization: duplicate a frequently read value onto the node that needs it so that reads do not require a traversal.

```cypher
// Instead of traversing to get user's company name every time
(u:User)-[:WORKS_AT]->(c:Company {name: 'Acme'})

// Consider denormalizing if frequently accessed
(u:User {company_name: 'Acme'})-[:WORKS_AT]->(c:Company {name: 'Acme'})
```

## Node Design

### Labels

Use labels to categorize nodes. A node can have multiple labels.

```cypher
// Single label
(:User {name: 'Alice'})

// Multiple labels for refinement
(:User:Premium:Verified {name: 'Alice'})
(:Content:Video:Tutorial {title: 'Intro to Graphs'})
```

**Best Practice**: Use labels for types you'll filter by frequently.

```cypher
// Efficient with label
MATCH (u:PremiumUser)
RETURN count(u)

// Less efficient without label
MATCH (u:User)
WHERE u.premium = true
RETURN count(u)
```

### Properties

Store an attribute as a property when it describes the node itself and does not need to be traversed to or shared as a separate entity.

```cypher
(:User {
  id: 'u123',
  name: 'Alice',
  email: 'alice@example.com',
  age: 28,
  created_at: datetime(),
  preferred_theme: 'dark',      // Neo4j properties cannot be nested maps,
  preferred_language: 'en'      // so flatten them or store a JSON string
})
```

**Property Types** (Neo4j):
- String
- Integer, Float
- Boolean
- Date, DateTime, Duration
- Point (geospatial)
- Lists of above types

**Best Practice**: Index properties you'll filter by:
```cypher
CREATE INDEX user_email FOR (u:User) ON (u.email);
```

## Relationship Design

### Relationship Types

Use descriptive, verb-based relationship types.

**Good**:
```cypher
(:User)-[:FRIEND]->(:User)
(:User)-[:PURCHASED]->(:Product)
(:User)-[:MANAGES]->(:Team)
```

**Avoid**:
```cypher
(:User)-[:RELATES_TO]->(:User)  // Too generic
(:User)-[:LINK]->(:Product)     // Not descriptive
```

### Relationship Direction

Choose direction based on query patterns.

```cypher
// If you ask "Who does Alice manage?"
(:User {name: 'Alice'})-[:MANAGES]->(:User)

// If you ask "Who manages Alice?"
(:User)-[:MANAGES]->(:User {name: 'Alice'})

// For symmetric relationships, choose one direction and query both ways
(:User)-[:FRIEND]->(:User)
// Query: MATCH (u)-[:FRIEND]-(friend) // No arrow = both directions
```

### Relationship Properties

Store metadata about relationships as properties.

```cypher
(:User)-[:FRIEND {
  since: date('2020-01-15'),
  strength: 0.85,
  last_interaction: datetime(),
  interaction_count: 142
}]->(:User)

(:User)-[:PURCHASED {
  date: datetime(),
  quantity: 2,
  price: 49.99,
  rating: 5
}]->(:Product)
```

## Common Anti-Patterns

### Anti-Pattern 1: Arrays Instead of Relationships

**Bad**:
```cypher
(:User {name: 'Alice', friend_ids: ['u2', 'u3', 'u4']})
```

**Good**:
```cypher
(:User {name: 'Alice'})-[:FRIEND]->(:User {id: 'u2'})
(:User {name: 'Alice'})-[:FRIEND]->(:User {id: 'u3'})
```

**Why**: Relationships can be traversed directly from either end. An ID array must be read from the node, and each ID then needs a separate lookup before the graph can be followed.

### Anti-Pattern 2: Supernodes

**Problem**: Nodes with thousands of relationships slow down traversals.

```cypher
// BAD: User connected to 1M posts
(:User)-[:POSTED]->(:Post)  // 1,000,000 relationships
```

**Solution 1**: Time-based partitioning
```cypher
(:User)-[:POSTED_IN]->(:Year {year: 2025})
  -[:HAS_MONTH]->(:Month {month: 12})
  -[:HAS_POST]->(:Post)
```

**Solution 2**: Category partitioning
```cypher
(:User)-[:HAS_POSTS_IN]->(:Category {name: 'Tech'})
  -[:CONTAINS]->(:Post)
```
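
The time-based partitioning in Solution 1 can be sketched as a parameterized write. The Cypher below anchors each `MERGE` on the user, so every user gets their own `Year` and `Month` bucket nodes rather than sharing global ones (shared buckets would themselves become supernodes). `ATTACH_POST`, `post_bucket_params`, and the property names are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict

ATTACH_POST = """
MATCH (u:User {id: $user_id})
MERGE (u)-[:POSTED_IN]->(y:Year {year: $year})
MERGE (y)-[:HAS_MONTH]->(m:Month {month: $month})
CREATE (m)-[:HAS_POST]->(:Post {id: $post_id, created_at: $created_at})
"""


def post_bucket_params(user_id: str, post_id: str, created_at: datetime) -> Dict[str, Any]:
    """Derive the year/month bucket for a post from its creation time."""
    return {
        "user_id": user_id,
        "post_id": post_id,
        "year": created_at.year,
        "month": created_at.month,
        "created_at": created_at.isoformat(),
    }


params = post_bucket_params(
    "u123", "p9", datetime(2025, 12, 24, 10, 30, tzinfo=timezone.utc)
)
print(params)
# With the official Neo4j driver this could be run as: session.run(ATTACH_POST, **params)
```

Queries for recent posts then start at the user, hop to the relevant year and month, and touch only that bucket's posts.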

### Anti-Pattern 3: Dense Nodes Without Indexes

**Problem**: Querying node properties without indexes on large graphs.

```cypher
// SLOW: scans every :User node
MATCH (u:User)
WHERE u.email = 'alice@example.com'
RETURN u
```

**Solution**: Create index
```cypher
CREATE INDEX user_email FOR (u:User) ON (u.email);

// Now fast
MATCH (u:User {email: 'alice@example.com'})
RETURN u
```

### Anti-Pattern 4: Unbounded Traversals

**Problem**: Variable-length paths without depth limits.

```cypher
// BAD: Can traverse entire graph
MATCH (a:User {name: 'Alice'})-[:FRIEND*]->(distant)
RETURN distant
```

**Solution**: Bound the depth
```cypher
// GOOD: Limited to 4 hops
MATCH (a:User {name: 'Alice'})-[:FRIEND*1..4]->(distant)
RETURN distant
LIMIT 100
```
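
When the hop limit comes from application code, it has to be validated before it is placed into the pattern, because to my knowledge the bounds in `*1..4` must be literals and cannot be passed as query parameters. A minimal sketch, where `bounded_friend_query` and the ceiling of 6 hops are hypothetical choices for this example:

```python
MAX_ALLOWED_HOPS = 6  # hypothetical application-level ceiling


def bounded_friend_query(max_hops: int) -> str:
    """Return a FRIEND traversal whose depth is capped at a validated max_hops."""
    if isinstance(max_hops, bool) or not isinstance(max_hops, int):
        raise ValueError("max_hops must be an integer")
    if not 1 <= max_hops <= MAX_ALLOWED_HOPS:
        raise ValueError(f"max_hops must be between 1 and {MAX_ALLOWED_HOPS}")
    # Only the validated integer is interpolated; name and limit stay parameters.
    return (
        "MATCH (a:User {name: $name})"
        f"-[:FRIEND*1..{max_hops}]->(distant)\n"
        "RETURN DISTINCT distant\n"
        "LIMIT $limit"
    )


print(bounded_friend_query(4))
```

The query would then be executed with `name` and `limit` supplied as parameters, so no user-provided string ever reaches the query text.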

## Schema Design Patterns

### Pattern 1: Star Schema (Hub and Spokes)

Central entity with many relationships to different types.

```cypher
// E-commerce user
(:User)-[:PURCHASED]->(:Product)
(:User)-[:VIEWED]->(:Product)
(:User)-[:ADDED_TO_CART]->(:Product)
(:User)-[:RATED]->(:Product)
(:User)-[:REVIEWED]->(:Product)
(:User)-[:HAS_ADDRESS]->(:Address)
(:User)-[:HAS_PAYMENT_METHOD]->(:PaymentMethod)
```

**Use when**: Central entity has many different relationships.

### Pattern 2: Hierarchical Schema (Trees)

Parent-child relationships forming a tree.

```cypher
// Organizational hierarchy
(:CEO)-[:MANAGES]->(:VP)
  -[:MANAGES]->(:Director)
  -[:MANAGES]->(:Manager)
  -[:MANAGES]->(:Employee)

// Category taxonomy
(:RootCategory {name: 'Electronics'})
  -[:HAS_SUBCATEGORY]->(:Category {name: 'Computers'})
  -[:HAS_SUBCATEGORY]->(:Category {name: 'Laptops'})
```

**Query pattern**: Variable-length paths
```cypher
MATCH (root:RootCategory {name: 'Electronics'})-[:HAS_SUBCATEGORY*]->(sub)
RETURN sub.name
```

### Pattern 3: Linked List (Sequences)

Events or items in temporal order.

```cypher
// Event timeline
(:Event {timestamp: '2025-01-01'})-[:NEXT]->
(:Event {timestamp: '2025-01-02'})-[:NEXT]->
(:Event {timestamp: '2025-01-03'})

// Version history
(:DocumentV1)-[:REVISED_TO]->(:DocumentV2)-[:REVISED_TO]->(:DocumentV3)
```

**Query pattern**: Follow chain
```cypher
MATCH (start:Event {id: 'event1'})-[:NEXT*]->(subsequent)
RETURN subsequent
ORDER BY subsequent.timestamp
```

### Pattern 4: Many-to-Many with Junction Nodes

Complex many-to-many with additional attributes.

```cypher
// Student enrollment with grades
(:Student)-[:ENROLLED_IN {semester: 'Fall 2025'}]->
(:Enrollment {grade: 'A', credits: 3})<-[:OFFERS]-
(:Course {name: 'Graph Databases'})
```

**Alternative**: Direct relationship with properties
```cypher
(:Student)-[:ENROLLED_IN {
  semester: 'Fall 2025',
  grade: 'A',
  credits: 3
}]->(:Course)
```

### Pattern 5: Temporal Versioning

Track entity changes over time.

```cypher
// Current version
(:User {id: 'u123', name: 'Alice', version: 3})
  -[:PREVIOUS_VERSION]->(:UserVersion {name: 'Alice Smith', version: 2})
  -[:PREVIOUS_VERSION]->(:UserVersion {name: 'Alice Jones', version: 1})

// Query history
MATCH (u:User {id: 'u123'})-[:PREVIOUS_VERSION*]->(history)
RETURN history.name, history.version
ORDER BY history.version
```

## Specialized Schemas

### Social Network Schema

```cypher
// Nodes
(:Person {id, name, email, joined_date})
(:Post {id, content, created_at})
(:Comment {id, text, created_at})
(:Group {id, name, description})

// Relationships
(:Person)-[:FRIEND {since}]->(:Person)
(:Person)-[:FOLLOWS]->(:Person)
(:Person)-[:MEMBER_OF {joined}]->(:Group)
(:Person)-[:POSTED {timestamp}]->(:Post)
(:Person)-[:COMMENTED {timestamp}]->(:Comment)
(:Comment)-[:REPLY_TO]->(:Post)
(:Comment)-[:REPLY_TO]->(:Comment)
(:Person)-[:LIKES {timestamp}]->(:Post)
(:Person)-[:LIKES {timestamp}]->(:Comment)
```

### Knowledge Graph Schema

```cypher
// Nodes
(:Concept {id, name, description})
(:Document {id, title, url})
(:Entity {id, name, type})  // person, org, location
(:Topic {id, name})

// Relationships
(:Concept)-[:RELATED_TO {strength}]->(:Concept)
(:Concept)-[:IS_A]->(:Concept)  // Hierarchical (ontology)
(:Concept)-[:HAS_PROPERTY]->(:Concept)
(:Concept)-[:OPPOSITE_OF]->(:Concept)
(:Document)-[:MENTIONS]->(:Concept)
(:Document)-[:MENTIONS]->(:Entity)
(:Document)-[:ABOUT]->(:Topic)
(:Entity)-[:WORKS_AT]->(:Entity)
(:Entity)-[:LOCATED_IN]->(:Entity)
```

### Recommendation Engine Schema

```cypher
// Nodes
(:User {id, age, location})
(:Product {id, name, category, price})
(:Category {id, name})
(:Brand {id, name})

// Relationships
(:User)-[:PURCHASED {date, rating, quantity}]->(:Product)
(:User)-[:VIEWED {timestamp, duration}]->(:Product)
(:User)-[:ADDED_TO_CART {timestamp}]->(:Product)
(:User)-[:SEARCHED_FOR {query, timestamp}]->(:Product)
(:Product)-[:IN_CATEGORY]->(:Category)
(:Product)-[:MADE_BY]->(:Brand)
(:Product)-[:SIMILAR_TO {score}]->(:Product)
(:Category)-[:PARENT_CATEGORY]->(:Category)
```

### Access Control Schema (ReBAC)

Relationship-Based Access Control.

```cypher
// Nodes
(:User {id, name})
(:Role {id, name})
(:Group {id, name})
(:Resource {id, name, type})
(:Permission {id, action})  // read, write, delete

// Relationships
(:User)-[:HAS_ROLE]->(:Role)
(:User)-[:MEMBER_OF]->(:Group)
(:Group)-[:HAS_ROLE]->(:Role)
(:Role)-[:HAS_PERMISSION]->(:Permission)
(:Permission)-[:ON_RESOURCE]->(:Resource)
(:User)-[:OWNS]->(:Resource)
(:Resource)-[:CHILD_OF]->(:Resource)  // Inheritance

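// Query: Can user access resource?
// Follow relationships in their stored direction only, so access never
// flows backwards through another user or group member.
MATCH (u:User {id: $userId}), (r:Resource {id: $resourceId})
OPTIONAL MATCH path = shortestPath(
  (u)-[:HAS_ROLE|MEMBER_OF|HAS_PERMISSION|ON_RESOURCE|OWNS*..5]->(r)
)
RETURN path IS NOT NULL AS hasAccess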
```

## Modeling for Performance

### Strategy 1: Materialize Computed Values

Cache expensive calculations as properties.

```cypher
// Instead of computing every time:
MATCH (u:User)-[:FRIEND]->(f)
RETURN u.name, count(f) AS friend_count

// Materialize:
MATCH (u:User)-[:FRIEND]->(f)
WITH u, count(f) AS fc
SET u.friend_count = fc

// Query becomes instant:
MATCH (u:User)
RETURN u.name, u.friend_count
ORDER BY u.friend_count DESC
```

**Update strategy**: Batch update periodically or trigger on changes.
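
A periodic batch refresh could look like the sketch below. It assumes `session` is a session from the official Neo4j Python driver; the function name and query constant are hypothetical. Unlike the materialization query above, it uses `OPTIONAL MATCH` so that users without friends get a count of 0 instead of being skipped.

```python
from typing import Any

REFRESH_FRIEND_COUNTS = """
MATCH (u:User)
OPTIONAL MATCH (u)-[:FRIEND]->(f)
WITH u, count(f) AS fc
SET u.friend_count = fc
RETURN count(u) AS updated
"""


def refresh_friend_counts(session: Any) -> int:
    """Recompute friend_count for every user; return how many nodes were updated."""
    record = session.run(REFRESH_FRIEND_COUNTS).single()
    return record["updated"]
```

Such a function can be called from a scheduled job. On very large graphs it would likely need to be split into smaller transactions.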

### Strategy 2: Shortcut Relationships

Create direct paths for frequent traversals.

```cypher
// Instead of: User -> Company -> Department -> Manager
(:User)-[:WORKS_AT]->(:Company)-[:HAS_DEPARTMENT]->(:Department)-[:MANAGED_BY]->(:Manager)

// Add shortcut:
(:User)-[:MANAGER]->(:Manager)

// Query is now single hop
MATCH (u:User {id: 'u123'})-[:MANAGER]->(m)
RETURN m.name
```

### Strategy 3: Denormalize Frequently Accessed Data

```cypher
// Original: Always traverse to get company name
(:User {name: 'Alice'})-[:WORKS_AT]->(:Company {name: 'Acme'})

// Denormalized: Store company name on user
(:User {name: 'Alice', company_name: 'Acme'})-[:WORKS_AT]->(:Company {name: 'Acme'})
```

**Trade-off**: Faster reads, but need to update duplicates when source changes.

## Indexing Strategy

### When to Index

Index properties that are:
1. Frequently filtered (WHERE clauses)
2. Used for lookups (MATCH with property)
3. Sorted (ORDER BY)

```cypher
// High-value indexes
CREATE INDEX user_email FOR (u:User) ON (u.email);
CREATE INDEX product_sku FOR (p:Product) ON (p.sku);
CREATE INDEX order_date FOR (o:Order) ON (o.date);
```

### Composite Indexes

For queries filtering on multiple properties.

```cypher
CREATE INDEX user_location FOR (u:User) ON (u.city, u.state);

// Benefits this query:
MATCH (u:User {city: 'San Francisco', state: 'CA'})
RETURN u
```

### Full-Text Indexes

For text search queries.

```cypher
CREATE FULLTEXT INDEX product_search FOR (p:Product) ON EACH [p.name, p.description];

CALL db.index.fulltext.queryNodes('product_search', 'laptop computer')
YIELD node, score
RETURN node.name, score
ORDER BY score DESC
```

## Migration Patterns

### From Relational to Graph

**Relational**:
```sql
users (id, name, email)
friendships (user_id, friend_id, since)
posts (id, user_id, content, created_at)
comments (id, post_id, user_id, text)
```

**Graph**:
```cypher
// Nodes (rows become nodes)
(:User {id, name, email})
(:Post {id, content, created_at})
(:Comment {id, text})

// Relationships (foreign keys become relationships)
(:User)-[:FRIEND {since}]->(:User)
(:User)-[:POSTED]->(:Post)
(:User)-[:COMMENTED]->(:Comment)
(:Comment)-[:ON_POST]->(:Post)
```

### Migration Steps

1. **Identify entities** → Nodes
2. **Identify relationships** → Edges
3. **Foreign keys** → Relationships
4. **Junction tables** → Either relationships with properties OR intermediate nodes
5. **Attributes** → Properties

```python
# Example migration script
from neo4j import GraphDatabase
import psycopg2

# Connect to PostgreSQL
pg_conn = psycopg2.connect("dbname=mydb user=postgres")
pg_cur = pg_conn.cursor()

# Connect to Neo4j
neo4j_driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def migrate_users():
    pg_cur.execute("SELECT id, name, email FROM users")
    users = pg_cur.fetchall()

    with neo4j_driver.session() as session:
        for user_id, name, email in users:
            session.run(
                "CREATE (u:User {id: $id, name: $name, email: $email})",
                id=user_id, name=name, email=email
            )

def migrate_friendships():
    pg_cur.execute("SELECT user_id, friend_id, since FROM friendships")
    friendships = pg_cur.fetchall()

    with neo4j_driver.session() as session:
        for user_id, friend_id, since in friendships:
            session.run(
                """
                MATCH (u1:User {id: $user_id}), (u2:User {id: $friend_id})
                CREATE (u1)-[:FRIEND {since: $since}]->(u2)
                """,
                user_id=user_id, friend_id=friend_id, since=since
            )
```
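
The script above issues one query per row, which can be slow for large tables. A common alternative is to send rows in batches with `UNWIND`. The sketch below assumes `session` is a Neo4j driver session and `rows` is the list returned by `fetchall()`; `migrate_users_batched`, `UPSERT_USERS`, and the batch size are hypothetical. It uses `MERGE` rather than `CREATE` so that re-running the migration does not duplicate users.

```python
from typing import Any, Sequence, Tuple

UPSERT_USERS = """
UNWIND $rows AS row
MERGE (u:User {id: row.id})
SET u.name = row.name, u.email = row.email
"""


def migrate_users_batched(
    session: Any, rows: Sequence[Tuple[int, str, str]], batch_size: int = 1000
) -> int:
    """Send (id, name, email) rows in batches; return the number of queries issued."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = [
            {"id": user_id, "name": name, "email": email}
            for user_id, name, email in rows[start:start + batch_size]
        ]
        session.run(UPSERT_USERS, rows=chunk)
        batches += 1
    return batches
```

Creating a uniqueness constraint or index on `User.id` before the load keeps each `MERGE` lookup fast.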

## Schema Validation

### Constraints

```cypher
// Uniqueness
CREATE CONSTRAINT user_email_unique FOR (u:User) REQUIRE u.email IS UNIQUE;

// Existence (Enterprise only)
CREATE CONSTRAINT user_name_exists FOR (u:User) REQUIRE u.name IS NOT NULL;

// Node key (composite uniqueness plus existence, Enterprise only)
CREATE CONSTRAINT user_key FOR (u:User) REQUIRE (u.id, u.email) IS NODE KEY;
```

### Validation Queries

Check for schema violations:

```cypher
// Find nodes without required properties
MATCH (u:User)
WHERE u.email IS NULL
RETURN u

// Find orphaned nodes (no relationships)
MATCH (n)
WHERE NOT (n)--()
RETURN labels(n), count(n)

// Find supernodes (too many relationships)
// Neo4j 5 syntax; on 4.x use size((n)--()) instead
MATCH (n)
WITH n, COUNT { (n)--() } AS degree
WHERE degree > 10000
RETURN labels(n), n.id, degree
ORDER BY degree DESC
```

## Further Reading

- Neo4j Graph Data Modeling Guide: https://neo4j.com/developer/guide-data-modeling/
- Graph Databases (O'Reilly Book): https://neo4j.com/graph-databases-book/

```