databases
Provides guidance for working with MongoDB and PostgreSQL databases, including schema design, query writing, performance optimization, and administration tasks. Includes a performance analyzer tool that identifies slow queries and suggests index improvements for both database systems.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install mrgoonie-claudekit-skills-databases
Repository
Skill path: .claude/skills/databases
Provides guidance for working with MongoDB and PostgreSQL databases, including schema design, query writing, performance optimization, and administration tasks. Includes a performance analyzer tool that identifies slow queries and suggests index improvements for both database systems.
Open repositoryBest for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Backend, DevOps, Tech Writer, Designer.
Target audience: Developers and database administrators working with MongoDB or PostgreSQL who need guidance on database operations, performance optimization, and administration tasks..
License: MIT.
Original source
Catalog source: SkillHub Club.
Repository owner: mrgoonie.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install databases into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/mrgoonie/claudekit-skills before adding databases to shared team environments
- Use databases for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
--- name: databases description: Work with MongoDB (document database, BSON documents, aggregation pipelines, Atlas cloud) and PostgreSQL (relational database, SQL queries, psql CLI, pgAdmin). Use when designing database schemas, writing queries and aggregations, optimizing indexes for performance, performing database migrations, configuring replication and sharding, implementing backup and restore strategies, managing database users and permissions, analyzing query performance, or administering production databases. license: MIT --- # Databases Skill Unified guide for working with MongoDB (document-oriented) and PostgreSQL (relational) databases. Choose the right database for your use case and master both systems. ## When to Use This Skill Use when: - Designing database schemas and data models - Writing queries (SQL or MongoDB query language) - Building aggregation pipelines or complex joins - Optimizing indexes and query performance - Implementing database migrations - Setting up replication, sharding, or clustering - Configuring backups and disaster recovery - Managing database users and permissions - Analyzing slow queries and performance issues - Administering production database deployments ## Database Selection Guide ### Choose MongoDB When: - Schema flexibility: frequent structure changes, heterogeneous data - Document-centric: natural JSON/BSON data model - Horizontal scaling: need to shard across multiple servers - High write throughput: IoT, logging, real-time analytics - Nested/hierarchical data: embedded documents preferred - Rapid prototyping: schema evolution without migrations **Best for:** Content management, catalogs, IoT time series, real-time analytics, mobile apps, user profiles ### Choose PostgreSQL When: - Strong consistency: ACID transactions critical - Complex relationships: many-to-many joins, referential integrity - SQL requirement: team expertise, reporting tools, BI systems - Data integrity: strict schema validation, constraints - Mature ecosystem: extensive tooling, extensions - Complex queries: window functions, CTEs, analytical workloads **Best for:** Financial systems, e-commerce transactions, ERP, CRM, data warehousing, analytics ### Both Support: - JSON/JSONB storage and querying - Full-text search capabilities - Geospatial queries and indexing - Replication and high availability - ACID transactions (MongoDB 4.0+) - Strong security features ## Quick Start ### MongoDB Setup ```bash # Atlas (Cloud) - Recommended # 1. Sign up at mongodb.com/atlas # 2. Create M0 free cluster # 3. Get connection string # Connection mongodb+srv://user:[email protected]/db # Shell mongosh "mongodb+srv://cluster.mongodb.net/mydb" # Basic operations db.users.insertOne({ name: "Alice", age: 30 }) db.users.find({ age: { $gte: 18 } }) db.users.updateOne({ name: "Alice" }, { $set: { age: 31 } }) db.users.deleteOne({ name: "Alice" }) ``` ### PostgreSQL Setup ```bash # Ubuntu/Debian sudo apt-get install postgresql postgresql-contrib # Start service sudo systemctl start postgresql # Connect psql -U postgres -d mydb # Basic operations CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT, age INT); INSERT INTO users (name, age) VALUES ('Alice', 30); SELECT * FROM users WHERE age >= 18; UPDATE users SET age = 31 WHERE name = 'Alice'; DELETE FROM users WHERE name = 'Alice'; ``` ## Common Operations ### Create/Insert ```javascript // MongoDB db.users.insertOne({ name: "Bob", email: "[email protected]" }) db.users.insertMany([{ name: "Alice" }, { name: "Charlie" }]) ``` ```sql -- PostgreSQL INSERT INTO users (name, email) VALUES ('Bob', '[email protected]'); INSERT INTO users (name, email) VALUES ('Alice', NULL), ('Charlie', NULL); ``` ### Read/Query ```javascript // MongoDB db.users.find({ age: { $gte: 18 } }) db.users.findOne({ email: "[email protected]" }) ``` ```sql -- PostgreSQL SELECT * FROM users WHERE age >= 18; SELECT * FROM users WHERE email = '[email protected]' LIMIT 1; ``` ### Update ```javascript // MongoDB db.users.updateOne({ name: "Bob" }, { $set: { age: 25 } }) db.users.updateMany({ status: "pending" }, { $set: { status: "active" } }) ``` ```sql -- PostgreSQL UPDATE users SET age = 25 WHERE name = 'Bob'; UPDATE users SET status = 'active' WHERE status = 'pending'; ``` ### Delete ```javascript // MongoDB db.users.deleteOne({ name: "Bob" }) db.users.deleteMany({ status: "deleted" }) ``` ```sql -- PostgreSQL DELETE FROM users WHERE name = 'Bob'; DELETE FROM users WHERE status = 'deleted'; ``` ### Indexing ```javascript // MongoDB db.users.createIndex({ email: 1 }) db.users.createIndex({ status: 1, createdAt: -1 }) ``` ```sql -- PostgreSQL CREATE INDEX idx_users_email ON users(email); CREATE INDEX idx_users_status_created ON users(status, created_at DESC); ``` ## Reference Navigation ### MongoDB References - **[mongodb-crud.md](references/mongodb-crud.md)** - CRUD operations, query operators, atomic updates - **[mongodb-aggregation.md](references/mongodb-aggregation.md)** - Aggregation pipeline, stages, operators, patterns - **[mongodb-indexing.md](references/mongodb-indexing.md)** - Index types, compound indexes, performance optimization - **[mongodb-atlas.md](references/mongodb-atlas.md)** - Atlas cloud setup, clusters, monitoring, search ### PostgreSQL References - **[postgresql-queries.md](references/postgresql-queries.md)** - SELECT, JOINs, subqueries, CTEs, window functions - **[postgresql-psql-cli.md](references/postgresql-psql-cli.md)** - psql commands, meta-commands, scripting - **[postgresql-performance.md](references/postgresql-performance.md)** - EXPLAIN, query optimization, vacuum, indexes - **[postgresql-administration.md](references/postgresql-administration.md)** - User management, backups, replication, maintenance ## Python Utilities Database utility scripts in `scripts/`: - **db_migrate.py** - Generate and apply migrations for both databases - **db_backup.py** - Backup and restore MongoDB and PostgreSQL - **db_performance_check.py** - Analyze slow queries and recommend indexes ```bash # Generate migration python scripts/db_migrate.py --db mongodb --generate "add_user_index" # Run backup python scripts/db_backup.py --db postgres --output /backups/ # Check performance python scripts/db_performance_check.py --db mongodb --threshold 100ms ``` ## Key Differences Summary | Feature | MongoDB | PostgreSQL | |---------|---------|------------| | Data Model | Document (JSON/BSON) | Relational (Tables/Rows) | | Schema | Flexible, dynamic | Strict, predefined | | Query Language | MongoDB Query Language | SQL | | Joins | $lookup (limited) | Native, optimized | | Transactions | Multi-document (4.0+) | Native ACID | | Scaling | Horizontal (sharding) | Vertical (primary), Horizontal (extensions) | | Indexes | Single, compound, text, geo, etc | B-tree, hash, GiST, GIN, etc | ## Best Practices **MongoDB:** - Use embedded documents for 1-to-few relationships - Reference documents for 1-to-many or many-to-many - Index frequently queried fields - Use aggregation pipeline for complex transformations - Enable authentication and TLS in production - Use Atlas for managed hosting **PostgreSQL:** - Normalize schema to 3NF, denormalize for performance - Use foreign keys for referential integrity - Index foreign keys and frequently filtered columns - Use EXPLAIN ANALYZE to optimize queries - Regular VACUUM and ANALYZE maintenance - Connection pooling (pgBouncer) for web apps ## Resources - MongoDB: https://www.mongodb.com/docs/ - PostgreSQL: https://www.postgresql.org/docs/ - MongoDB University: https://learn.mongodb.com/ - PostgreSQL Tutorial: https://www.postgresqltutorial.com/ --- ## Referenced Files > The following files are referenced in this skill and included for context. ### references/mongodb-crud.md ```markdown # MongoDB CRUD Operations CRUD operations (Create, Read, Update, Delete) in MongoDB with query operators and atomic updates. ## Create Operations ### insertOne ```javascript // Insert single document db.users.insertOne({ name: "Alice", email: "[email protected]", age: 30, createdAt: new Date() }) // Returns: { acknowledged: true, insertedId: ObjectId("...") } ``` ### insertMany ```javascript // Insert multiple documents db.users.insertMany([ { name: "Bob", age: 25 }, { name: "Charlie", age: 35 }, { name: "Diana", age: 28 } ]) // With ordered: false (continue on error) db.users.insertMany(docs, { ordered: false }) ``` ## Read Operations ### find ```javascript // Find all documents db.users.find() // Find with filter db.users.find({ age: { $gte: 18 } }) // Projection (select fields) db.users.find({ status: "active" }, { name: 1, email: 1, _id: 0 }) // Cursor operations db.users.find() .sort({ createdAt: -1 }) .limit(10) .skip(20) ``` ### findOne ```javascript // Get single document db.users.findOne({ email: "[email protected]" }) // With projection db.users.findOne({ _id: ObjectId("...") }, { name: 1, email: 1 }) ``` ### count/estimatedDocumentCount ```javascript // Count matching documents db.users.countDocuments({ status: "active" }) // Fast estimate (uses metadata) db.users.estimatedDocumentCount() ``` ### distinct ```javascript // Get unique values db.users.distinct("status") db.users.distinct("city", { country: "USA" }) ``` ## Update Operations ### updateOne ```javascript // Update first matching document db.users.updateOne( { email: "[email protected]" }, { $set: { status: "verified" } } ) // Upsert (insert if not exists) db.users.updateOne( { email: "[email protected]" }, { $set: { name: "New User" } }, { upsert: true } ) ``` ### updateMany ```javascript // Update all matching documents db.users.updateMany( { lastLogin: { $lt: cutoffDate } }, { $set: { status: "inactive" } } ) // Multiple updates db.users.updateMany( { status: "pending" }, { $set: { status: "active" }, $currentDate: { updatedAt: true } } ) ``` ### replaceOne ```javascript // Replace entire document (except _id) db.users.replaceOne( { _id: ObjectId("...") }, { name: "Alice", email: "[email protected]", age: 31 } ) ``` ## Delete Operations ### deleteOne ```javascript // Delete first matching document db.users.deleteOne({ email: "[email protected]" }) ``` ### deleteMany ```javascript // Delete all matching documents db.users.deleteMany({ status: "deleted" }) // Delete all documents in collection db.users.deleteMany({}) ``` ## Query Operators ### Comparison Operators ```javascript // $eq (equals) db.users.find({ age: { $eq: 30 } }) db.users.find({ age: 30 }) // Implicit $eq // $ne (not equals) db.users.find({ status: { $ne: "deleted" } }) // $gt, $gte, $lt, $lte db.users.find({ age: { $gt: 18, $lte: 65 } }) // $in (in array) db.users.find({ status: { $in: ["active", "pending"] } }) // $nin (not in array) db.users.find({ status: { $nin: ["deleted", "banned"] } }) ``` ### Logical Operators ```javascript // $and (implicit for multiple conditions) db.users.find({ age: { $gte: 18 }, status: "active" }) // $and (explicit) db.users.find({ $and: [ { age: { $gte: 18 } }, { status: "active" } ] }) // $or db.users.find({ $or: [ { status: "active" }, { verified: true } ] }) // $not db.users.find({ age: { $not: { $lt: 18 } } }) // $nor (not any condition) db.users.find({ $nor: [ { status: "deleted" }, { status: "banned" } ] }) ``` ### Element Operators ```javascript // $exists db.users.find({ phoneNumber: { $exists: true } }) db.users.find({ deletedAt: { $exists: false } }) // $type db.users.find({ age: { $type: "int" } }) db.users.find({ age: { $type: ["int", "double"] } }) ``` ### Array Operators ```javascript // $all (contains all elements) db.posts.find({ tags: { $all: ["mongodb", "database"] } }) // $elemMatch (array element matches all conditions) db.products.find({ reviews: { $elemMatch: { rating: { $gte: 4 }, verified: true } } }) // $size (array length) db.posts.find({ tags: { $size: 3 } }) ``` ### String Operators ```javascript // $regex (regular expression) db.users.find({ name: { $regex: /^A/i } }) db.users.find({ email: { $regex: "@example\\.com$" } }) // Text search (requires text index) db.articles.find({ $text: { $search: "mongodb database" } }) ``` ## Update Operators ### Field Update Operators ```javascript // $set (set field value) db.users.updateOne( { _id: userId }, { $set: { status: "active", updatedAt: new Date() } } ) // $unset (remove field) db.users.updateOne( { _id: userId }, { $unset: { tempField: "" } } ) // $rename (rename field) db.users.updateMany( {}, { $rename: { "oldName": "newName" } } ) // $currentDate (set to current date) db.users.updateOne( { _id: userId }, { $currentDate: { lastModified: true } } ) ``` ### Numeric Update Operators ```javascript // $inc (increment) db.posts.updateOne( { _id: postId }, { $inc: { views: 1, likes: 5 } } ) // $mul (multiply) db.products.updateOne( { _id: productId }, { $mul: { price: 1.1 } } // 10% increase ) // $min (update if new value is less) db.scores.updateOne( { _id: scoreId }, { $min: { lowestScore: 50 } } ) // $max (update if new value is greater) db.scores.updateOne( { _id: scoreId }, { $max: { highestScore: 100 } } ) ``` ### Array Update Operators ```javascript // $push (add to array) db.posts.updateOne( { _id: postId }, { $push: { comments: { author: "Alice", text: "Great!" } } } ) // $push with $each (multiple elements) db.posts.updateOne( { _id: postId }, { $push: { tags: { $each: ["mongodb", "database"] } } } ) // $addToSet (add if not exists) db.users.updateOne( { _id: userId }, { $addToSet: { interests: "coding" } } ) // $pull (remove matching elements) db.users.updateOne( { _id: userId }, { $pull: { tags: "deprecated" } } ) // $pop (remove first/last element) db.users.updateOne( { _id: userId }, { $pop: { notifications: -1 } } // -1: first, 1: last ) // $ (update first matching array element) db.posts.updateOne( { _id: postId, "comments.author": "Alice" }, { $set: { "comments.$.text": "Updated comment" } } ) // $[] (update all array elements) db.posts.updateOne( { _id: postId }, { $set: { "comments.$[].verified": true } } ) // $[<identifier>] (filtered positional) db.posts.updateOne( { _id: postId }, { $set: { "comments.$[elem].flagged": true } }, { arrayFilters: [{ "elem.rating": { $lt: 2 } }] } ) ``` ## Atomic Operations ### findAndModify / findOneAndUpdate ```javascript // Find and update (returns old document by default) db.users.findOneAndUpdate( { email: "[email protected]" }, { $set: { status: "active" } } ) // Return new document db.users.findOneAndUpdate( { email: "[email protected]" }, { $set: { status: "active" } }, { returnNewDocument: true } ) // Upsert and return new db.counters.findOneAndUpdate( { _id: "sequence" }, { $inc: { value: 1 } }, { upsert: true, returnNewDocument: true } ) ``` ### findOneAndReplace ```javascript // Find and replace entire document db.users.findOneAndReplace( { _id: ObjectId("...") }, { name: "Alice", email: "[email protected]" }, { returnNewDocument: true } ) ``` ### findOneAndDelete ```javascript // Find and delete (returns deleted document) const deletedUser = db.users.findOneAndDelete( { email: "[email protected]" } ) ``` ## Bulk Operations ```javascript // Ordered bulk write (stops on first error) db.users.bulkWrite([ { insertOne: { document: { name: "Alice" } } }, { updateOne: { filter: { name: "Bob" }, update: { $set: { age: 25 } } }}, { deleteOne: { filter: { name: "Charlie" } } } ]) // Unordered (continues on errors) db.users.bulkWrite(operations, { ordered: false }) ``` ## Best Practices 1. **Use projection** to return only needed fields 2. **Create indexes** on frequently queried fields 3. **Use updateMany** carefully (can affect many documents) 4. **Use upsert** for "create or update" patterns 5. **Use atomic operators** ($inc, $push) for concurrent updates 6. **Avoid large arrays** in documents (embed vs reference) 7. **Use findAndModify** for atomic read-modify-write 8. **Batch operations** with insertMany/bulkWrite for efficiency ``` ### references/mongodb-aggregation.md ```markdown # MongoDB Aggregation Pipeline Aggregation pipeline for complex data transformations, analytics, and multi-stage processing. ## Pipeline Concept Aggregation processes documents through multiple stages. Each stage transforms documents and passes results to next stage. ```javascript db.collection.aggregate([ { /* Stage 1 */ }, { /* Stage 2 */ }, { /* Stage 3 */ } ]) ``` ## Core Pipeline Stages ### $match (Filter Documents) ```javascript // Filter early in pipeline for efficiency db.orders.aggregate([ { $match: { status: "completed", total: { $gte: 100 } } }, // Subsequent stages process only matched documents ]) // Multiple conditions db.orders.aggregate([ { $match: { $and: [ { orderDate: { $gte: startDate } }, { status: { $in: ["completed", "shipped"] } } ] }} ]) ``` ### $project (Reshape Documents) ```javascript // Select and reshape fields db.orders.aggregate([ { $project: { orderNumber: 1, total: 1, customerName: "$customer.name", year: { $year: "$orderDate" }, _id: 0 // Exclude _id }} ]) // Computed fields db.orders.aggregate([ { $project: { total: 1, tax: { $multiply: ["$total", 0.1] }, grandTotal: { $add: ["$total", { $multiply: ["$total", 0.1] }] } }} ]) ``` ### $group (Aggregate Data) ```javascript // Group and count db.orders.aggregate([ { $group: { _id: "$status", count: { $sum: 1 } }} ]) // Multiple aggregations db.orders.aggregate([ { $group: { _id: "$customerId", totalSpent: { $sum: "$total" }, orderCount: { $sum: 1 }, avgOrderValue: { $avg: "$total" }, maxOrder: { $max: "$total" }, minOrder: { $min: "$total" } }} ]) // Group by multiple fields db.sales.aggregate([ { $group: { _id: { year: { $year: "$date" }, month: { $month: "$date" }, product: "$productId" }, revenue: { $sum: "$amount" } }} ]) ``` ### $sort (Order Results) ```javascript // Sort by field db.orders.aggregate([ { $sort: { total: -1 } } // -1: descending, 1: ascending ]) // Sort by multiple fields db.orders.aggregate([ { $sort: { status: 1, orderDate: -1 } } ]) ``` ### $limit / $skip (Pagination) ```javascript // Limit results db.orders.aggregate([ { $sort: { orderDate: -1 } }, { $limit: 10 } ]) // Pagination const page = 2; const pageSize = 20; db.orders.aggregate([ { $sort: { orderDate: -1 } }, { $skip: (page - 1) * pageSize }, { $limit: pageSize } ]) ``` ### $lookup (Join Collections) ```javascript // Simple join db.orders.aggregate([ { $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" }}, { $unwind: "$customer" } // Convert array to object ]) // Pipeline join (more powerful) db.orders.aggregate([ { $lookup: { from: "products", let: { items: "$items" }, pipeline: [ { $match: { $expr: { $in: ["$_id", "$$items.productId"] } } }, { $project: { name: 1, price: 1 } } ], as: "productDetails" }} ]) ``` ### $unwind (Deconstruct Arrays) ```javascript // Unwind array field db.orders.aggregate([ { $unwind: "$items" } ]) // Preserve null/empty arrays db.orders.aggregate([ { $unwind: { path: "$items", preserveNullAndEmptyArrays: true }} ]) // Include array index db.orders.aggregate([ { $unwind: { path: "$items", includeArrayIndex: "itemIndex" }} ]) ``` ### $addFields (Add New Fields) ```javascript // Add computed fields db.orders.aggregate([ { $addFields: { totalWithTax: { $multiply: ["$total", 1.1] }, year: { $year: "$orderDate" } }} ]) ``` ### $replaceRoot (Replace Document Root) ```javascript // Promote subdocument to root db.orders.aggregate([ { $replaceRoot: { newRoot: "$customer" } } ]) // Merge fields db.orders.aggregate([ { $replaceRoot: { newRoot: { $mergeObjects: ["$customer", { orderId: "$_id" }] } }} ]) ``` ## Aggregation Operators ### Arithmetic Operators ```javascript // Basic math db.products.aggregate([ { $project: { name: 1, profit: { $subtract: ["$price", "$cost"] }, margin: { $multiply: [ { $divide: [ { $subtract: ["$price", "$cost"] }, "$price" ]}, 100 ]} }} ]) // Other operators: $add, $multiply, $divide, $mod, $abs, $ceil, $floor, $round ``` ### String Operators ```javascript // String manipulation db.users.aggregate([ { $project: { fullName: { $concat: ["$firstName", " ", "$lastName"] }, email: { $toLower: "$email" }, initials: { $concat: [ { $substr: ["$firstName", 0, 1] }, { $substr: ["$lastName", 0, 1] } ]} }} ]) // Other: $toUpper, $trim, $split, $substr, $regexMatch ``` ### Date Operators ```javascript // Date extraction db.events.aggregate([ { $project: { event: 1, year: { $year: "$timestamp" }, month: { $month: "$timestamp" }, day: { $dayOfMonth: "$timestamp" }, hour: { $hour: "$timestamp" }, dayOfWeek: { $dayOfWeek: "$timestamp" } }} ]) // Date math db.events.aggregate([ { $project: { event: 1, expiresAt: { $add: ["$createdAt", 1000 * 60 * 60 * 24 * 30] }, // +30 days ageInDays: { $divide: [ { $subtract: [new Date(), "$createdAt"] }, 1000 * 60 * 60 * 24 ]} }} ]) ``` ### Array Operators ```javascript // Array operations db.posts.aggregate([ { $project: { title: 1, tagCount: { $size: "$tags" }, firstTag: { $arrayElemAt: ["$tags", 0] }, lastTag: { $arrayElemAt: ["$tags", -1] }, hasMongoDBTag: { $in: ["mongodb", "$tags"] } }} ]) // Array filtering db.posts.aggregate([ { $project: { title: 1, activeTags: { $filter: { input: "$tags", as: "tag", cond: { $ne: ["$$tag.status", "deprecated"] } } } }} ]) ``` ### Conditional Operators ```javascript // $cond (ternary) db.products.aggregate([ { $project: { name: 1, status: { $cond: { if: { $gte: ["$stock", 10] }, then: "In Stock", else: "Low Stock" } } }} ]) // $switch (multiple conditions) db.orders.aggregate([ { $project: { status: 1, priority: { $switch: { branches: [ { case: { $gte: ["$total", 1000] }, then: "High" }, { case: { $gte: ["$total", 100] }, then: "Medium" } ], default: "Low" } } }} ]) ``` ## Advanced Patterns ### Time-Based Aggregation ```javascript // Daily sales db.orders.aggregate([ { $match: { orderDate: { $gte: startDate } } }, { $group: { _id: { year: { $year: "$orderDate" }, month: { $month: "$orderDate" }, day: { $dayOfMonth: "$orderDate" } }, revenue: { $sum: "$total" }, orderCount: { $sum: 1 } }}, { $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1 } } ]) ``` ### Faceted Search ```javascript // Multiple aggregations in one query db.products.aggregate([ { $match: { category: "electronics" } }, { $facet: { priceRanges: [ { $bucket: { groupBy: "$price", boundaries: [0, 100, 500, 1000, 5000], default: "5000+", output: { count: { $sum: 1 } } }} ], topBrands: [ { $group: { _id: "$brand", count: { $sum: 1 } } }, { $sort: { count: -1 } }, { $limit: 5 } ], avgPrice: [ { $group: { _id: null, avg: { $avg: "$price" } } } ] }} ]) ``` ### Window Functions ```javascript // Running totals and moving averages db.sales.aggregate([ { $setWindowFields: { partitionBy: "$region", sortBy: { date: 1 }, output: { runningTotal: { $sum: "$amount", window: { documents: ["unbounded", "current"] } }, movingAvg: { $avg: "$amount", window: { documents: [-7, 0] } // Last 7 days } } }} ]) ``` ### Text Search with Aggregation ```javascript // Full-text search (requires text index) db.articles.aggregate([ { $match: { $text: { $search: "mongodb database" } } }, { $addFields: { score: { $meta: "textScore" } } }, { $sort: { score: -1 } }, { $limit: 10 } ]) ``` ### Geospatial Aggregation ```javascript // Find nearby locations db.places.aggregate([ { $geoNear: { near: { type: "Point", coordinates: [lon, lat] }, distanceField: "distance", maxDistance: 5000, spherical: true }}, { $limit: 10 } ]) ``` ## Performance Tips 1. **$match early** - Filter documents before other stages 2. **$project early** - Reduce document size 3. **Index usage** - $match and $sort can use indexes (only at start) 4. **$limit after $sort** - Reduce memory usage 5. **Avoid $lookup** - Prefer embedded documents when possible 6. **Use $facet sparingly** - Can be memory intensive 7. **allowDiskUse** - Enable for large datasets ```javascript db.collection.aggregate(pipeline, { allowDiskUse: true }) ``` ## Best Practices 1. **Order stages efficiently** - $match → $project → $group → $sort → $limit 2. **Use $expr carefully** - Can prevent index usage 3. **Monitor memory** - Default limit: 100MB per stage 4. **Test with explain** - Analyze pipeline performance ```javascript db.collection.explain("executionStats").aggregate(pipeline) ``` 5. **Break complex pipelines** - Use $out/$merge for intermediate results 6. **Use $sample** - For random document selection 7. **Leverage $addFields** - Cleaner than $project for adding fields ``` ### references/mongodb-indexing.md ```markdown # MongoDB Indexing and Performance Index types, strategies, and performance optimization techniques for MongoDB. ## Index Fundamentals Indexes improve query performance by allowing MongoDB to scan fewer documents. Without indexes, MongoDB performs collection scans (reads every document). ```javascript // Check if query uses index db.users.find({ email: "[email protected]" }).explain("executionStats") // Key metrics: // - executionTimeMillis: query duration // - totalDocsExamined: documents scanned // - nReturned: documents returned // - stage: IXSCAN (index) vs COLLSCAN (full scan) ``` ## Index Types ### Single Field Index ```javascript // Create index on single field db.users.createIndex({ email: 1 }) // 1: ascending, -1: descending // Use case: queries filtering by email db.users.find({ email: "[email protected]" }) // Drop index db.users.dropIndex({ email: 1 }) db.users.dropIndex("email_1") // By name ``` ### Compound Index ```javascript // Index on multiple fields (order matters!) db.orders.createIndex({ status: 1, createdAt: -1 }) // Supports queries on: // 1. { status: "..." } // 2. { status: "...", createdAt: ... } // Does NOT efficiently support: { createdAt: ... } alone // Left-to-right prefix rule db.orders.createIndex({ a: 1, b: 1, c: 1 }) // Supports: {a}, {a,b}, {a,b,c} // Not: {b}, {c}, {b,c} ``` ### Text Index (Full-Text Search) ```javascript // Create text index db.articles.createIndex({ title: "text", body: "text" }) // Only one text index per collection db.articles.createIndex({ title: "text", body: "text", tags: "text" }, { weights: { title: 10, // Title matches weighted higher body: 5, tags: 3 } }) // Search db.articles.find({ $text: { $search: "mongodb database" } }) // Search with score db.articles.find( { $text: { $search: "mongodb" } }, { score: { $meta: "textScore" } } ).sort({ score: { $meta: "textScore" } }) ``` ### Geospatial Indexes ```javascript // 2dsphere index (spherical geometry) db.places.createIndex({ location: "2dsphere" }) // Document format db.places.insertOne({ name: "Coffee Shop", location: { type: "Point", coordinates: [-73.97, 40.77] // [longitude, latitude] } }) // Find nearby db.places.find({ location: { $near: { $geometry: { type: "Point", coordinates: [-73.97, 40.77] }, $maxDistance: 5000 // meters } } }) // Within polygon db.places.find({ location: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [[ [lon1, lat1], [lon2, lat2], [lon3, lat3], [lon1, lat1] ]] } } } }) ``` ### Wildcard Index ```javascript // Index all fields in subdocuments db.products.createIndex({ "attributes.$**": 1 }) // Supports queries on any nested field db.products.find({ "attributes.color": "red" }) db.products.find({ "attributes.size": "large" }) // Specific paths only db.products.createIndex( { "$**": 1 }, { wildcardProjection: { "attributes.color": 1, "attributes.size": 1 } } ) ``` ### Hashed Index ```javascript // Hashed index (for even distribution in sharding) db.users.createIndex({ userId: "hashed" }) // Use case: shard key sh.shardCollection("mydb.users", { userId: "hashed" }) ``` ### TTL Index (Auto-Expiration) ```javascript // Delete documents after specified time db.sessions.createIndex( { createdAt: 1 }, { expireAfterSeconds: 3600 } // 1 hour ) // Documents automatically deleted after createdAt + 3600 seconds // Background task runs every 60 seconds ``` ### Partial Index ```javascript // Index only documents matching filter db.orders.createIndex( { customerId: 1 }, { partialFilterExpression: { status: "active" } } ) // Index only used when query includes filter db.orders.find({ customerId: "123", status: "active" }) // Uses index db.orders.find({ customerId: "123" }) // Does not use index ``` ### Unique Index ```javascript // Enforce uniqueness db.users.createIndex({ email: 1 }, { unique: true }) // Compound unique index db.users.createIndex({ firstName: 1, lastName: 1 }, { unique: true }) // Sparse unique index (null values not indexed) db.users.createIndex({ email: 1 }, { unique: true, sparse: true }) ``` ### Sparse Index ```javascript // Index only documents with field present db.users.createIndex({ phoneNumber: 1 }, { sparse: true }) // Useful for optional fields // Documents without phoneNumber not in index ``` ## Index Management ### List Indexes ```javascript // Show all indexes db.collection.getIndexes() // Index statistics db.collection.aggregate([{ $indexStats: {} }]) ``` ### Create Index Options ```javascript // Background index (doesn't block operations) db.collection.createIndex({ field: 1 }, { background: true }) // Index name db.collection.createIndex({ field: 1 }, { name: "custom_index_name" }) // Case-insensitive index (collation) db.collection.createIndex( { name: 1 }, { collation: { locale: "en", strength: 2 } } ) ``` ### Hide/Unhide Index ```javascript // Hide index (test before dropping) db.collection.hideIndex("index_name") // Check performance without index // ... // Unhide or drop db.collection.unhideIndex("index_name") db.collection.dropIndex("index_name") ``` ### Rebuild Indexes ```javascript // Rebuild all indexes (after data changes) db.collection.reIndex() // Useful after bulk deletions to reclaim space ``` ## Query Optimization ### Covered Queries ```javascript // Query covered by index (no document fetch) db.users.createIndex({ email: 1, name: 1 }) // Covered query (all fields in index) db.users.find( { email: "[email protected]" }, { email: 1, name: 1, _id: 0 } // Must exclude _id ) // Check with explain: stage should be "IXSCAN" with no "FETCH" ``` ### Index Intersection ```javascript // MongoDB can use multiple indexes db.collection.createIndex({ a: 1 }) db.collection.createIndex({ b: 1 }) // Query may use both indexes db.collection.find({ a: 1, b: 1 }) // Usually compound index is better db.collection.createIndex({ a: 1, b: 1 }) ``` ### Index Hints ```javascript // Force specific index db.orders.find({ status: "active", city: "NYC" }) .hint({ status: 1, createdAt: -1 }) // Force no index (for testing) db.orders.find({ status: "active" }).hint({ $natural: 1 }) ``` ### ESR Rule (Equality, Sort, Range) ```javascript // Optimal compound index order: Equality → Sort → Range // Query db.orders.find({ status: "completed", // Equality category: "electronics" // Equality }).sort({ orderDate: -1 // Sort }).limit(10) // Optimal index db.orders.createIndex({ status: 1, // Equality first category: 1, // Equality orderDate: -1 // Sort last }) // With range db.orders.find({ status: "completed", // Equality total: { $gte: 100 } // Range }).sort({ orderDate: -1 // Sort }) // Optimal index db.orders.createIndex({ status: 1, // Equality orderDate: -1, // Sort total: 1 // Range last }) ``` ## Performance Analysis ### explain() Modes ```javascript // Query planner (default) db.collection.find({ field: value }).explain() // Execution stats db.collection.find({ field: value }).explain("executionStats") // All execution stats db.collection.find({ field: value }).explain("allPlansExecution") ``` ### Key Metrics ```javascript // Good performance indicators: // - executionTimeMillis < 100ms // - totalDocsExamined ≈ nReturned (examine only what's needed) // - stage: "IXSCAN" (using index) // - totalKeysExamined ≈ nReturned (index selectivity) // Bad indicators: // - stage: "COLLSCAN" (full collection scan) // - totalDocsExamined >> nReturned (scanning too many docs) // - executionTimeMillis > 1000ms ``` ### Index Selectivity ```javascript // High selectivity = good (returns few documents) // Low selectivity = bad (returns many documents) // Check selectivity db.collection.aggregate([ { $group: { _id: "$status", count: { $sum: 1 } } } ]) // Good for indexing: email, userId, orderId // Bad for indexing: gender, status (few unique values) ``` ## Index Strategies ### Multi-Tenant Applications ```javascript // Always filter by tenant first db.data.createIndex({ tenantId: 1, createdAt: -1 }) // All queries include tenantId db.data.find({ tenantId: "tenant1", createdAt: { $gte: date } }) ``` ### Time-Series Data ```javascript // Index on timestamp descending (recent data accessed more) db.events.createIndex({ timestamp: -1 }) // Compound with filter fields db.events.createIndex({ userId: 1, timestamp: -1 }) ``` ### Lookup Optimization ```javascript // Index foreign key fields db.orders.createIndex({ customerId: 1 }) db.customers.createIndex({ _id: 1 }) // Default _id index // Aggregation $lookup uses these indexes ``` ## Best Practices 1. **Create indexes for frequent queries** - Analyze slow query logs 2. **Limit number of indexes** - Each index adds write overhead 3. **Use compound indexes** - More efficient than multiple single indexes 4. **Follow ESR rule** - Equality, Sort, Range order 5. **Use covered queries** - When possible, avoid document fetches 6. **Monitor index usage** - Drop unused indexes ```javascript db.collection.aggregate([{ $indexStats: {} }]) ``` 7. **Partial indexes for filtered queries** - Reduce index size 8. **Consider index size** - Should fit in RAM ```javascript db.collection.stats().indexSizes ``` 9. **Background index creation** - Don't block operations (deprecated in 4.2+) 10. **Test with explain** - Verify query plan before production ## Common Pitfalls 1. **Over-indexing** - Too many indexes slow writes 2. **Unused indexes** - Waste space and write performance 3. **Regex without prefix** - `/pattern/` can't use index, `/^pattern/` can 4. **$ne, $nin queries** - Often scan entire collection 5. **$or with multiple branches** - May not use indexes efficiently 6. **Sort without index** - In-memory sort limited to 32MB 7. **Compound index order** - Wrong order makes index useless 8. **Case-sensitive queries** - Use collation for case-insensitive ## Monitoring ```javascript // Current operations db.currentOp() // Slow queries (enable profiling) db.setProfilingLevel(1, { slowms: 100 }) db.system.profile.find().sort({ ts: -1 }).limit(10) // Index statistics db.collection.aggregate([ { $indexStats: {} }, { $sort: { "accesses.ops": -1 } } ]) // Collection statistics db.collection.stats() ``` ## Index Size Calculation ```javascript // Check index sizes db.collection.stats().indexSizes // Total index size db.collection.totalIndexSize() // Recommend: indexes fit in RAM // Monitor: db.serverStatus().mem ``` ``` ### references/mongodb-atlas.md ```markdown # MongoDB Atlas Cloud Platform MongoDB Atlas is fully-managed cloud database service with automated backups, monitoring, and scaling. ## Quick Start ### Create Free Cluster 1. Sign up at mongodb.com/atlas 2. Create organization and project 3. Build cluster (M0 Free Tier) - Cloud provider: AWS/GCP/Azure - Region: closest to users - Cluster name 4. Create database user (username/password) 5. Whitelist IP address (or 0.0.0.0/0 for development) 6. Get connection string ### Connection String Format ``` mongodb+srv://username:[email protected]/database?retryWrites=true&w=majority ``` ### Connect ```javascript // Node.js const { MongoClient } = require("mongodb"); const uri = "mongodb+srv://..."; const client = new MongoClient(uri); await client.connect(); const db = client.db("myDatabase"); ``` ```python # Python from pymongo import MongoClient uri = "mongodb+srv://..." client = MongoClient(uri) db = client.myDatabase ``` ## Cluster Tiers ### M0 (Free Tier) - 512 MB storage - Shared CPU/RAM - Perfect for development/learning - Limited to 100 connections - No backups ### M10+ (Dedicated Clusters) - Dedicated resources - 2GB - 4TB+ storage - Automated backups - Advanced monitoring - Performance Advisor - Multi-region support - VPC peering ### Serverless - Pay per operation - Auto-scales to zero - Good for sporadic workloads - 1GB+ storage - Limited features (no full-text search) ## Database Configuration ### Create Database ```javascript // Via Atlas UI: Database → Add Database // Via shell use myNewDatabase db.createCollection("myCollection") // Via driver const db = client.db("myNewDatabase"); await db.createCollection("myCollection"); ``` ### Schema Validation ```javascript // Set validation rules in Atlas UI or via shell db.createCollection("users", { validator: { $jsonSchema: { bsonType: "object", required: ["email", "name"], properties: { email: { bsonType: "string", pattern: "^.+@.+$" }, age: { bsonType: "int", minimum: 0 } } } } }) ``` ## Security ### Network Access ```javascript // IP Whitelist (Atlas UI → Network Access) // - Add IP Address: specific IPs // - 0.0.0.0/0: allow from anywhere (dev only) // - VPC Peering: private connection // Connection string includes options mongodb+srv://cluster.mongodb.net/?retryWrites=true&w=majority&ssl=true ``` ### Database Users ```javascript // Create via Atlas UI → Database Access // - Username/password authentication // - AWS IAM authentication // - X.509 certificates // Roles: // - atlasAdmin: full access // - readWriteAnyDatabase: read/write all databases // - readAnyDatabase: read-only all databases // - read/readWrite: database-specific ``` ### Encryption ```javascript // Encryption at rest (automatic on M10+) // Encryption in transit (TLS/SSL, always enabled) // Client-Side Field Level Encryption (CSFLE) const autoEncryptionOpts = { keyVaultNamespace: "encryption.__keyVault", kmsProviders: { aws: { accessKeyId: process.env.AWS_ACCESS_KEY_ID, secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY } } }; const client = new MongoClient(uri, { autoEncryption: autoEncryptionOpts }); ``` ## Backups and Snapshots ### Cloud Backups (M10+) ```javascript // Automatic continuous backups // - Snapshots every 6-24 hours // - Oplog for point-in-time recovery // - Retention: 2+ days configurable // Restore via Atlas UI: // 1. Clusters → cluster name → Backup tab // 2. Select snapshot or point in time // 3. Download or restore to cluster ``` ### Manual Backups ```bash # Export using mongodump mongodump --uri="mongodb+srv://user:[email protected]/mydb" --out=/backup # Restore using mongorestore mongorestore --uri="mongodb+srv://..." /backup/mydb ``` ## Monitoring and Alerts ### Metrics Dashboard ```javascript // Atlas UI → Metrics // Key metrics: // - Operations per second // - Query execution times // - Connections // - Network I/O // - Disk usage // - CPU utilization // Real-time Performance panel // - Current operations // - Slow queries // - Index suggestions ``` ### Alerts ```javascript // Configure via Atlas UI → Alerts // Alert types: // - High connections (> threshold) // - High CPU usage (> 80%) // - Disk usage (> 90%) // - Replication lag // - Backup failures // Notification channels: // - Email // - SMS // - Slack // - PagerDuty // - Webhook ``` ### Performance Advisor ```javascript // Automatic index recommendations // Atlas UI → Performance Advisor // Analyzes: // - Slow queries // - Missing indexes // - Redundant indexes // - Index usage statistics // Provides: // - Index creation commands // - Expected performance improvement // - Schema design suggestions ``` ## Atlas Search (Full-Text Search) ### Create Search Index ```javascript // Atlas UI → Search → Create Index // JSON definition { "mappings": { "dynamic": false, "fields": { "title": { "type": "string", "analyzer": "lucene.standard" }, "description": { "type": "string", "analyzer": "lucene.english" }, "tags": { "type": "string" } } } } ``` ### Search Queries ```javascript // Aggregation pipeline with $search db.articles.aggregate([ { $search: { text: { query: "mongodb database tutorial", path: ["title", "description"], fuzzy: { maxEdits: 1 } } } }, { $limit: 10 }, { $project: { title: 1, description: 1, score: { $meta: "searchScore" } } } ]) // Autocomplete db.articles.aggregate([ { $search: { autocomplete: { query: "mong", path: "title", tokenOrder: "sequential" } } } ]) ``` ## Atlas Vector Search (AI/ML) ### Create Vector Search Index ```javascript // For AI similarity search (embeddings) { "fields": [ { "type": "vector", "path": "embedding", "numDimensions": 1536, // OpenAI embeddings "similarity": "cosine" } ] } ``` ### Vector Search Query ```javascript // Search by similarity db.products.aggregate([ { $vectorSearch: { index: "vector_index", path: "embedding", queryVector: [0.123, 0.456, ...], // 1536 dimensions numCandidates: 100, limit: 10 } }, { $project: { name: 1, description: 1, score: { $meta: "vectorSearchScore" } } } ]) ``` ## Data Federation ### Query Across Sources ```javascript // Federated database instance // Query data from: // - Atlas clusters // - AWS S3 // - HTTP endpoints // Create virtual collection { "databases": [{ "name": "federated", "collections": [{ "name": "sales", "dataSources": [{ "storeName": "s3Store", "path": "/sales/*.json" }] }] }] } // Query like normal collection use federated db.sales.find({ region: "US" }) ``` ## Atlas Charts (Embedded Analytics) ### Create Dashboard ```javascript // Atlas UI → Charts → New Dashboard // Data source: Atlas cluster // Chart types: bar, line, pie, scatter, etc. // Embed in application <iframe src="https://charts.mongodb.com/charts-project/embed/charts?id=..." width="800" height="600" /> ``` ## Atlas CLI ```bash # Install npm install -g mongodb-atlas-cli # Login atlas auth login # List clusters atlas clusters list # Create cluster atlas clusters create myCluster --provider AWS --region US_EAST_1 --tier M10 # Manage users atlas dbusers create --username myuser --password mypass # Backups atlas backups snapshots list --clusterName myCluster ``` ## Best Practices 1. **Use connection pooling** - Reuse connections ```javascript const client = new MongoClient(uri, { maxPoolSize: 50, minPoolSize: 10 }); ``` 2. **Enable authentication** - Always use database users, not Atlas users 3. **Restrict network access** - IP whitelist or VPC peering 4. **Monitor regularly** - Set up alerts for key metrics 5. **Index optimization** - Use Performance Advisor recommendations 6. **Backup verification** - Regularly test restores 7. **Right-size clusters** - Start small, scale as needed 8. **Multi-region** - For global applications (M10+) 9. **Read preferences** - Use secondaries for read-heavy workloads ```javascript const client = new MongoClient(uri, { readPreference: "secondaryPreferred" }); ``` 10. **Connection string security** - Use environment variables ```javascript const uri = process.env.MONGODB_URI; ``` ## Troubleshooting ### Connection Issues ```javascript // Check IP whitelist // Verify credentials // Test connection string // Verbose logging const client = new MongoClient(uri, { serverSelectionTimeoutMS: 5000, loggerLevel: "debug" }); ``` ### Performance Issues ```javascript // Check Performance Advisor // Review slow query logs // Analyze index usage db.collection.aggregate([{ $indexStats: {} }]) // Check connection count db.serverStatus().connections ``` ### Common Errors ```javascript // MongoNetworkError: IP not whitelisted // → Add IP to Network Access // Authentication failed: wrong credentials // → Verify username/password in Database Access // Timeout: connection string or network issue // → Check connection string format, DNS resolution ``` ``` ### references/postgresql-queries.md ```markdown # PostgreSQL SQL Queries SQL queries in PostgreSQL: SELECT, JOINs, subqueries, CTEs, window functions, and advanced patterns. ## Basic SELECT ### Simple Queries ```sql -- Select all columns SELECT * FROM users; -- Select specific columns SELECT id, name, email FROM users; -- With alias SELECT name AS full_name, email AS contact_email FROM users; -- Distinct values SELECT DISTINCT status FROM orders; -- Count rows SELECT COUNT(*) FROM users; SELECT COUNT(DISTINCT status) FROM orders; ``` ### WHERE Clause ```sql -- Equality SELECT * FROM users WHERE status = 'active'; -- Comparison SELECT * FROM products WHERE price > 100; SELECT * FROM orders WHERE total BETWEEN 100 AND 500; -- Pattern matching SELECT * FROM users WHERE email LIKE '%@example.com'; SELECT * FROM users WHERE name ILIKE 'john%'; -- case-insensitive -- IN operator SELECT * FROM orders WHERE status IN ('pending', 'processing'); -- NULL checks SELECT * FROM users WHERE deleted_at IS NULL; SELECT * FROM users WHERE phone_number IS NOT NULL; -- Logical operators SELECT * FROM products WHERE price > 100 AND stock > 0; SELECT * FROM users WHERE status = 'active' OR verified = true; SELECT * FROM products WHERE NOT (price > 1000); ``` ### ORDER BY ```sql -- Ascending (default) SELECT * FROM users ORDER BY created_at; -- Descending SELECT * FROM users ORDER BY created_at DESC; -- Multiple columns SELECT * FROM orders ORDER BY status ASC, created_at DESC; -- NULL handling SELECT * FROM users ORDER BY last_login NULLS FIRST; SELECT * FROM users ORDER BY last_login NULLS LAST; ``` ### LIMIT and OFFSET ```sql -- Limit results SELECT * FROM users LIMIT 10; -- Pagination SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 20; -- Alternative: FETCH SELECT * FROM users OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY; ``` ## JOINs ### INNER JOIN ```sql -- Match rows from both tables SELECT orders.id, orders.total, customers.name FROM orders INNER JOIN customers ON orders.customer_id = customers.id; -- Short syntax SELECT o.id, o.total, c.name FROM orders o JOIN customers c ON o.customer_id = c.id; -- Multiple joins SELECT o.id, c.name, p.name AS product FROM orders o JOIN customers c ON o.customer_id = c.id JOIN order_items oi ON oi.order_id = o.id JOIN products p ON oi.product_id = p.id; ``` ### LEFT JOIN (LEFT OUTER JOIN) ```sql -- All rows from left table, matching rows from right SELECT c.name, o.id AS order_id FROM customers c LEFT JOIN orders o ON c.id = o.customer_id; -- Find customers without orders SELECT c.name FROM customers c LEFT JOIN orders o ON c.id = o.customer_id WHERE o.id IS NULL; ``` ### RIGHT JOIN (RIGHT OUTER JOIN) ```sql -- All rows from right table, matching rows from left SELECT c.name, o.id AS order_id FROM orders o RIGHT JOIN customers c ON o.customer_id = c.id; ``` ### FULL OUTER JOIN ```sql -- All rows from both tables SELECT c.name, o.id AS order_id FROM customers c FULL OUTER JOIN orders o ON c.id = o.customer_id; ``` ### CROSS JOIN ```sql -- Cartesian product (all combinations) SELECT c.name, p.name FROM colors c CROSS JOIN products p; ``` ### Self Join ```sql -- Join table to itself SELECT e1.name AS employee, e2.name AS manager FROM employees e1 LEFT JOIN employees e2 ON e1.manager_id = e2.id; ``` ## Subqueries ### Scalar Subquery ```sql -- Return single value SELECT name, salary, (SELECT AVG(salary) FROM employees) AS avg_salary FROM employees; ``` ### IN Subquery ```sql -- Match against set of values SELECT name FROM customers WHERE id IN ( SELECT customer_id FROM orders WHERE total > 1000 ); ``` ### EXISTS Subquery ```sql -- Check if subquery returns any rows SELECT name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.id ); -- NOT EXISTS SELECT name FROM customers c WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.id ); ``` ### Correlated Subquery ```sql -- Subquery references outer query SELECT name, salary FROM employees e1 WHERE salary > ( SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e1.department_id ); ``` ## Common Table Expressions (CTEs) ### Simple CTE ```sql -- Named temporary result set WITH active_users AS ( SELECT id, name, email FROM users WHERE status = 'active' ) SELECT * FROM active_users WHERE created_at > '2024-01-01'; ``` ### Multiple CTEs ```sql WITH active_customers AS ( SELECT id, name FROM customers WHERE active = true ), recent_orders AS ( SELECT customer_id, SUM(total) AS total_spent FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '30 days' GROUP BY customer_id ) SELECT c.name, COALESCE(o.total_spent, 0) AS spent FROM active_customers c LEFT JOIN recent_orders o ON c.id = o.customer_id; ``` ### Recursive CTE ```sql -- Tree traversal, hierarchical data WITH RECURSIVE category_tree AS ( -- Base case: root categories SELECT id, name, parent_id, 0 AS level FROM categories WHERE parent_id IS NULL UNION ALL -- Recursive case: child categories SELECT c.id, c.name, c.parent_id, ct.level + 1 FROM categories c JOIN category_tree ct ON c.parent_id = ct.id ) SELECT * FROM category_tree ORDER BY level, name; -- Employee hierarchy WITH RECURSIVE org_chart AS ( SELECT id, name, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.name, e.manager_id, oc.level + 1 FROM employees e JOIN org_chart oc ON e.manager_id = oc.id ) SELECT * FROM org_chart; ``` ## Aggregate Functions ### Basic Aggregates ```sql -- COUNT, SUM, AVG, MIN, MAX SELECT COUNT(*) AS total_orders, SUM(total) AS total_revenue, AVG(total) AS avg_order_value, MIN(total) AS min_order, MAX(total) AS max_order FROM orders; -- COUNT variations SELECT COUNT(*) FROM users; -- All rows SELECT COUNT(phone_number) FROM users; -- Non-NULL values SELECT COUNT(DISTINCT status) FROM orders; -- Unique values ``` ### GROUP BY ```sql -- Aggregate by groups SELECT status, COUNT(*) AS count FROM orders GROUP BY status; -- Multiple grouping columns SELECT customer_id, status, COUNT(*) AS count FROM orders GROUP BY customer_id, status; -- With aggregate functions SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent, AVG(total) AS avg_order FROM orders GROUP BY customer_id; ``` ### HAVING ```sql -- Filter after aggregation SELECT customer_id, SUM(total) AS total_spent FROM orders GROUP BY customer_id HAVING SUM(total) > 1000; -- Multiple conditions SELECT status, COUNT(*) AS count FROM orders GROUP BY status HAVING COUNT(*) > 10; ``` ## Window Functions ### ROW_NUMBER ```sql -- Assign unique number to each row SELECT id, name, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank FROM employees; -- Partition by group SELECT id, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank FROM employees; ``` ### RANK / DENSE_RANK ```sql -- RANK: gaps in ranking for ties -- DENSE_RANK: no gaps SELECT id, name, salary, RANK() OVER (ORDER BY salary DESC) AS rank, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank FROM employees; ``` ### LAG / LEAD ```sql -- Access previous/next row SELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev_revenue, LEAD(revenue) OVER (ORDER BY date) AS next_revenue, revenue - LAG(revenue) OVER (ORDER BY date) AS change FROM daily_sales; ``` ### Running Totals ```sql -- Cumulative sum SELECT date, amount, SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total FROM transactions; -- Simpler syntax SELECT date, amount, SUM(amount) OVER (ORDER BY date) AS running_total FROM transactions; ``` ### Moving Averages ```sql -- 7-day moving average SELECT date, value, AVG(value) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS moving_avg_7d FROM metrics; ``` ## Advanced Patterns ### CASE Expressions ```sql -- Simple CASE SELECT name, CASE status WHEN 'active' THEN 'Active User' WHEN 'pending' THEN 'Pending Verification' ELSE 'Inactive' END AS status_label FROM users; -- Searched CASE SELECT name, age, CASE WHEN age < 18 THEN 'Minor' WHEN age BETWEEN 18 AND 65 THEN 'Adult' ELSE 'Senior' END AS age_group FROM users; ``` ### COALESCE ```sql -- Return first non-NULL value SELECT name, COALESCE(phone_number, email, 'No contact') AS contact FROM users; ``` ### NULLIF ```sql -- Return NULL if values equal SELECT name, NULLIF(status, 'deleted') AS active_status FROM users; ``` ### Array Operations ```sql -- Array aggregate SELECT customer_id, ARRAY_AGG(product_id) AS products FROM order_items GROUP BY customer_id; -- Unnest array SELECT unnest(ARRAY[1, 2, 3, 4, 5]); -- Array contains SELECT * FROM products WHERE tags @> ARRAY['featured']; ``` ### JSON Operations ```sql -- Query JSON/JSONB SELECT data->>'name' AS name FROM documents; SELECT data->'address'->>'city' AS city FROM documents; -- Check key exists SELECT * FROM documents WHERE data ? 'email'; -- JSONB operators SELECT * FROM documents WHERE data @> '{"status": "active"}'; -- JSON aggregation SELECT json_agg(name) FROM users; SELECT json_object_agg(id, name) FROM users; ``` ## Set Operations ### UNION ```sql -- Combine results (removes duplicates) SELECT name FROM customers UNION SELECT name FROM suppliers; -- Keep duplicates SELECT name FROM customers UNION ALL SELECT name FROM suppliers; ``` ### INTERSECT ```sql -- Common rows SELECT email FROM users INTERSECT SELECT email FROM subscribers; ``` ### EXCEPT ```sql -- Rows in first query but not second SELECT email FROM users EXCEPT SELECT email FROM unsubscribed; ``` ## Best Practices 1. **Use indexes** on WHERE, JOIN, ORDER BY columns 2. **Avoid SELECT *** - specify needed columns 3. **Use EXISTS** instead of IN for large subqueries 4. **Filter early** - WHERE before JOIN when possible 5. **Use CTEs** for readability over nested subqueries 6. **Parameterize queries** - prevent SQL injection 7. **Use window functions** instead of self-joins 8. **Test with EXPLAIN** - analyze query plans ``` ### references/postgresql-psql-cli.md ```markdown # PostgreSQL psql CLI Command-line interface for PostgreSQL: connection, meta-commands, scripting, and interactive usage. ## Connection ### Basic Connection ```bash # Connect to database psql -U username -d database -h hostname -p 5432 # Connect using URI psql postgresql://username:password@hostname:5432/database # Environment variables export PGUSER=postgres export PGPASSWORD=mypassword export PGHOST=localhost export PGPORT=5432 export PGDATABASE=mydb psql ``` ### Password File (~/.pgpass) ```bash # Format: hostname:port:database:username:password # chmod 600 ~/.pgpass localhost:5432:mydb:postgres:mypassword *.example.com:5432:*:appuser:apppass ``` ### SSL Connection ```bash # Require SSL psql "host=hostname sslmode=require user=username dbname=database" # Verify certificate psql "host=hostname sslmode=verify-full \ sslcert=/path/to/client.crt \ sslkey=/path/to/client.key \ sslrootcert=/path/to/ca.crt" ``` ## Essential Meta-Commands ### Database Navigation ```bash \l or \list # List databases \l+ # List with sizes \c database # Connect to database \c database username # Connect as user \conninfo # Connection info ``` ### Schema Inspection ```bash \dn # List schemas \dt # List tables \dt+ # Tables with sizes \dt *.* # All tables, all schemas \di # List indexes \dv # List views \dm # List materialized views \ds # List sequences \df # List functions ``` ### Object Description ```bash \d tablename # Describe table \d+ tablename # Detailed description \d indexname # Describe index \df functionname # Describe function \du # List users/roles \dp tablename # Show permissions ``` ### Output Formatting ```bash \x # Toggle expanded output \x on # Enable expanded \x off # Disable expanded \a # Toggle aligned output \t # Toggle tuples only \H # HTML output \pset format csv # CSV format \pset null '[NULL]' # Show NULL values ``` ### Execution Commands ```bash \i filename.sql # Execute SQL file \o output.txt # Redirect output to file \o # Stop redirecting \! command # Execute shell command \timing # Toggle timing \q # Quit ``` ## psql Command-Line Options ```bash # Connection -h hostname # Host -p port # Port (default 5432) -U username # Username -d database # Database -W # Prompt for password # Execution -c "SQL" # Execute command and exit -f file.sql # Execute file --command="SQL" # Execute command # Output -t # Tuples only (no headers) -A # Unaligned output -F "," # Field separator -o output.txt # Output to file -q # Quiet mode -x # Expanded output # Script options -1 # Execute as transaction --on-error-stop # Stop on error -v variable=value # Set variable -L logfile.log # Log session ``` ## Running SQL ### Interactive Queries ```sql -- Simple query SELECT * FROM users; -- Multi-line (ends with semicolon) SELECT id, name, email FROM users WHERE active = true; -- Edit in editor \e -- Repeat last query \g -- Send to file \g output.txt ``` ### Variables ```bash # Set variable \set myvar 'value' \set limit 10 # Use variable SELECT * FROM users LIMIT :limit; # String variable (quoted) \set username 'alice' SELECT * FROM users WHERE name = :'username'; # Show all variables \set # Unset variable \unset myvar ``` ### Scripts ```sql -- script.sql \set ON_ERROR_STOP on BEGIN; CREATE TABLE IF NOT EXISTS users ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, email TEXT UNIQUE ); INSERT INTO users (name, email) VALUES ('Alice', '[email protected]'), ('Bob', '[email protected]'); COMMIT; \echo 'Script completed!' ``` ```bash # Execute script psql -d mydb -f script.sql # With error stopping psql -d mydb -f script.sql --on-error-stop # In single transaction psql -d mydb -1 -f script.sql ``` ## Data Import/Export ### COPY (Server-side) ```sql -- Export to CSV COPY users TO '/tmp/users.csv' WITH (FORMAT CSV, HEADER); -- Import from CSV COPY users FROM '/tmp/users.csv' WITH (FORMAT CSV, HEADER); -- Query to file COPY (SELECT * FROM users WHERE active = true) TO '/tmp/active_users.csv' WITH (FORMAT CSV, HEADER); ``` ### \copy (Client-side) ```bash # Export (from psql) \copy users TO 'users.csv' WITH (FORMAT CSV, HEADER) # Export query results \copy (SELECT * FROM users WHERE active = true) TO 'active.csv' CSV HEADER # Import \copy users FROM 'users.csv' WITH (FORMAT CSV, HEADER) # To stdout \copy users TO STDOUT CSV HEADER > users.csv ``` ### pg_dump / pg_restore ```bash # Dump database pg_dump mydb > mydb.sql pg_dump -d mydb -Fc > mydb.dump # Custom format # Dump specific table pg_dump -t users mydb > users.sql # Schema only pg_dump -s mydb > schema.sql # Data only pg_dump -a mydb > data.sql # Restore psql mydb < mydb.sql pg_restore -d mydb mydb.dump ``` ## Configuration ### ~/.psqlrc ```bash # Auto-loaded on psql startup \set QUIET ON -- Prompt customization \set PROMPT1 '%n@%m:%>/%/%R%# ' -- Output settings \pset null '[NULL]' \pset border 2 \pset linestyle unicode \pset expanded auto -- Timing \timing ON -- Pager \pset pager always -- History \set HISTSIZE 10000 -- Custom shortcuts \set active_users 'SELECT * FROM users WHERE status = ''active'';' \set dbsize 'SELECT pg_size_pretty(pg_database_size(current_database()));' \set QUIET OFF ``` ### Useful Aliases ```bash # Add to ~/.psqlrc \set locks 'SELECT pid, usename, pg_blocking_pids(pid) as blocked_by, query FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;' \set activity 'SELECT pid, usename, state, query FROM pg_stat_activity WHERE state != ''idle'';' \set table_sizes 'SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||''.''||tablename)) FROM pg_tables ORDER BY pg_total_relation_size(schemaname||''.''||tablename) DESC;' \set index_usage 'SELECT schemaname, tablename, indexname, idx_scan FROM pg_stat_user_indexes ORDER BY idx_scan;' # Usage: :locks, :activity, :table_sizes ``` ## Transactions ```sql -- Begin transaction BEGIN; -- Or START TRANSACTION; -- Savepoint SAVEPOINT sp1; -- Rollback to savepoint ROLLBACK TO sp1; -- Commit COMMIT; -- Rollback ROLLBACK; ``` ## Performance Analysis ### EXPLAIN ```sql -- Show query plan EXPLAIN SELECT * FROM users WHERE id = 1; -- With execution EXPLAIN ANALYZE SELECT * FROM users WHERE age > 18; -- Verbose EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM users WHERE active = true; ``` ### Current Activity ```sql -- Active queries SELECT pid, usename, state, query FROM pg_stat_activity; -- Long-running queries SELECT pid, now() - query_start AS duration, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC; -- Blocking queries SELECT blocked.pid, blocking.pid AS blocking_pid, blocked.query AS blocked_query, blocking.query AS blocking_query FROM pg_stat_activity blocked JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid)); ``` ### Statistics ```sql -- Database size SELECT pg_size_pretty(pg_database_size(current_database())); -- Table sizes SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC; -- Index usage SELECT schemaname, tablename, indexname, idx_scan FROM pg_stat_user_indexes ORDER BY idx_scan; ``` ## User Management ```sql -- Create user CREATE USER appuser WITH PASSWORD 'secure_password'; -- Create superuser CREATE USER admin WITH PASSWORD 'password' SUPERUSER; -- Alter user ALTER USER appuser WITH PASSWORD 'new_password'; -- Grant permissions GRANT CONNECT ON DATABASE mydb TO appuser; GRANT USAGE ON SCHEMA public TO appuser; GRANT SELECT, INSERT, UPDATE, DELETE ON users TO appuser; GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO appuser; -- Default privileges ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO appuser; -- View permissions \dp users -- Drop user DROP USER appuser; ``` ## Backup Patterns ```bash # Daily backup script #!/bin/bash DATE=$(date +%Y%m%d) pg_dump -Fc mydb > /backups/mydb_$DATE.dump # Restore latest pg_restore -d mydb /backups/mydb_latest.dump # Backup all databases pg_dumpall > /backups/all_databases.sql # Backup specific schema pg_dump -n public mydb > public_schema.sql ``` ## Troubleshooting ### Connection Issues ```bash # Test connection psql -h hostname -U username -d postgres -c "SELECT 1;" # Check pg_hba.conf # /var/lib/postgresql/data/pg_hba.conf # Verbose connection psql -h hostname -d mydb --echo-all ``` ### Performance Issues ```sql -- Enable slow query logging ALTER DATABASE mydb SET log_min_duration_statement = 100; -- Check cache hit ratio SELECT sum(heap_blks_read) as heap_read, sum(heap_blks_hit) as heap_hit, sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio FROM pg_statio_user_tables; -- Find slow queries SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10; ``` ## Best Practices 1. **Use .pgpass** for credential management 2. **Set ON_ERROR_STOP** in scripts 3. **Use transactions** for multi-statement changes 4. **Test with EXPLAIN** before running expensive queries 5. **Use \timing** to measure query performance 6. **Configure ~/.psqlrc** for productivity 7. **Use variables** for dynamic queries 8. **Log sessions** with -L for auditing 9. **Use \copy** instead of COPY for client operations 10. **Regular backups** with pg_dump ``` ### references/postgresql-performance.md ```markdown # PostgreSQL Performance Optimization Query optimization, indexing strategies, EXPLAIN analysis, and performance tuning for PostgreSQL. ## EXPLAIN Command ### Basic EXPLAIN ```sql -- Show query plan EXPLAIN SELECT * FROM users WHERE id = 1; -- Output shows: -- - Execution plan nodes -- - Estimated costs -- - Estimated rows ``` ### EXPLAIN ANALYZE ```sql -- Execute query and show actual performance EXPLAIN ANALYZE SELECT * FROM users WHERE age > 18; -- Shows: -- - Actual execution time -- - Actual rows returned -- - Planning time -- - Execution time ``` ### EXPLAIN Options ```sql -- Verbose output EXPLAIN (VERBOSE) SELECT * FROM users; -- Show buffer usage EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE active = true; -- JSON format EXPLAIN (FORMAT JSON, ANALYZE) SELECT * FROM users; -- All options EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, COSTS) SELECT * FROM users WHERE id = 1; ``` ## Understanding Query Plans ### Scan Methods #### Sequential Scan ```sql -- Full table scan (reads all rows) EXPLAIN SELECT * FROM users WHERE name = 'Alice'; -- Output: Seq Scan on users -- Indicates: no suitable index or small table ``` #### Index Scan ```sql -- Uses index to find rows EXPLAIN SELECT * FROM users WHERE id = 1; -- Output: Index Scan using users_pkey on users -- Best for: selective queries, small result sets ``` #### Index Only Scan ```sql -- Query covered by index (no table access) CREATE INDEX idx_users_email_name ON users(email, name); EXPLAIN SELECT email, name FROM users WHERE email = '[email protected]'; -- Output: Index Only Scan using idx_users_email_name -- Best performance: no heap fetch needed ``` #### Bitmap Scan ```sql -- Combines multiple indexes or handles large result sets EXPLAIN SELECT * FROM users WHERE age > 18 AND status = 'active'; -- Output: -- Bitmap Heap Scan on users -- Recheck Cond: ... -- -> Bitmap Index Scan on idx_age -- Good for: moderate selectivity ``` ### Join Methods #### Nested Loop ```sql -- For each row in outer table, scan inner table EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE c.id = 1; -- Output: Nested Loop -- Best for: small outer table, indexed inner table ``` #### Hash Join ```sql -- Build hash table from smaller table EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id; -- Output: Hash Join -- Best for: large tables, equality conditions ``` #### Merge Join ```sql -- Both inputs sorted on join key EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id ORDER BY o.customer_id; -- Output: Merge Join -- Best for: pre-sorted data, large sorted inputs ``` ## Indexing Strategies ### B-tree Index (Default) ```sql -- General purpose index CREATE INDEX idx_users_email ON users(email); CREATE INDEX idx_orders_date ON orders(order_date); -- Supports: =, <, <=, >, >=, BETWEEN, IN, IS NULL -- Supports: ORDER BY, MIN/MAX ``` ### Composite Index ```sql -- Multiple columns (order matters!) CREATE INDEX idx_users_status_created ON users(status, created_at); -- Supports queries on: -- - status -- - status, created_at -- Does NOT support: created_at alone -- Column order: most selective first -- Exception: match query WHERE/ORDER BY patterns ``` ### Partial Index ```sql -- Index subset of rows CREATE INDEX idx_active_users ON users(email) WHERE status = 'active'; -- Smaller index, faster queries with matching WHERE clause -- Query must include WHERE status = 'active' to use index ``` ### Expression Index ```sql -- Index on computed value CREATE INDEX idx_users_lower_email ON users(LOWER(email)); -- Query must use same expression SELECT * FROM users WHERE LOWER(email) = '[email protected]'; ``` ### GIN Index (Generalized Inverted Index) ```sql -- For array, JSONB, full-text search CREATE INDEX idx_products_tags ON products USING GIN(tags); CREATE INDEX idx_documents_data ON documents USING GIN(data); -- Array queries SELECT * FROM products WHERE tags @> ARRAY['featured']; -- JSONB queries SELECT * FROM documents WHERE data @> '{"status": "active"}'; ``` ### GiST Index (Generalized Search Tree) ```sql -- For geometric data, range types, full-text CREATE INDEX idx_locations_geom ON locations USING GiST(geom); -- Geometric queries SELECT * FROM locations WHERE geom && ST_MakeEnvelope(...); ``` ### Hash Index ```sql -- Equality comparisons only CREATE INDEX idx_users_hash_email ON users USING HASH(email); -- Only supports: = -- Rarely used (B-tree usually better) ``` ### BRIN Index (Block Range Index) ```sql -- For very large tables with natural clustering CREATE INDEX idx_logs_brin_created ON logs USING BRIN(created_at); -- Tiny index size, good for append-only data -- Best for: time-series, logging, large tables ``` ## Query Optimization Techniques ### Avoid SELECT * ```sql -- Bad SELECT * FROM users WHERE id = 1; -- Good (only needed columns) SELECT id, name, email FROM users WHERE id = 1; ``` ### Use LIMIT ```sql -- Limit result set SELECT * FROM users ORDER BY created_at DESC LIMIT 10; -- PostgreSQL can stop early with LIMIT ``` ### Index for ORDER BY ```sql -- Create index matching sort order CREATE INDEX idx_users_created_desc ON users(created_at DESC); -- Query uses index for sorting SELECT * FROM users ORDER BY created_at DESC LIMIT 10; ``` ### Covering Index ```sql -- Include all queried columns in index CREATE INDEX idx_users_email_name_status ON users(email, name, status); -- Query covered by index (no table access) SELECT name, status FROM users WHERE email = '[email protected]'; ``` ### EXISTS vs IN ```sql -- Prefer EXISTS for large subqueries -- Bad SELECT * FROM customers WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000); -- Good SELECT * FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000); ``` ### JOIN Order ```sql -- Filter before joining -- Bad SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.status = 'completed' AND c.country = 'USA'; -- Good (filter in subquery) SELECT * FROM ( SELECT * FROM orders WHERE status = 'completed' ) o JOIN ( SELECT * FROM customers WHERE country = 'USA' ) c ON o.customer_id = c.id; -- Or use CTE WITH filtered_orders AS ( SELECT * FROM orders WHERE status = 'completed' ), filtered_customers AS ( SELECT * FROM customers WHERE country = 'USA' ) SELECT * FROM filtered_orders o JOIN filtered_customers c ON o.customer_id = c.id; ``` ### Avoid Functions in WHERE ```sql -- Bad (index not used) SELECT * FROM users WHERE LOWER(email) = '[email protected]'; -- Good (create expression index) CREATE INDEX idx_users_lower_email ON users(LOWER(email)); -- Then query uses index -- Or store lowercase separately ALTER TABLE users ADD COLUMN email_lower TEXT; UPDATE users SET email_lower = LOWER(email); CREATE INDEX idx_users_email_lower ON users(email_lower); ``` ## Statistics and ANALYZE ### Update Statistics ```sql -- Analyze table (update statistics) ANALYZE users; -- Analyze specific columns ANALYZE users(email, status); -- Analyze all tables ANALYZE; -- Auto-analyze (configured in postgresql.conf) autovacuum_analyze_threshold = 50 autovacuum_analyze_scale_factor = 0.1 ``` ### Check Statistics ```sql -- Last analyze time SELECT schemaname, tablename, last_analyze, last_autoanalyze FROM pg_stat_user_tables; -- Statistics targets (adjust for important columns) ALTER TABLE users ALTER COLUMN email SET STATISTICS 1000; ``` ## VACUUM and Maintenance ### VACUUM ```sql -- Reclaim storage, update statistics VACUUM users; -- Verbose output VACUUM VERBOSE users; -- Full vacuum (rewrites table, locks table) VACUUM FULL users; -- Analyze after vacuum VACUUM ANALYZE users; ``` ### Auto-Vacuum ```sql -- Check autovacuum status SELECT schemaname, tablename, last_vacuum, last_autovacuum FROM pg_stat_user_tables; -- Configure in postgresql.conf autovacuum = on autovacuum_vacuum_threshold = 50 autovacuum_vacuum_scale_factor = 0.2 ``` ### REINDEX ```sql -- Rebuild index REINDEX INDEX idx_users_email; -- Rebuild all indexes on table REINDEX TABLE users; -- Rebuild all indexes in schema REINDEX SCHEMA public; ``` ## Monitoring Queries ### Active Queries ```sql -- Current queries SELECT pid, usename, state, query, query_start FROM pg_stat_activity WHERE state != 'idle'; -- Long-running queries SELECT pid, now() - query_start AS duration, query FROM pg_stat_activity WHERE state != 'idle' AND now() - query_start > interval '5 minutes' ORDER BY duration DESC; ``` ### Slow Query Log ```sql -- Enable slow query logging (postgresql.conf) log_min_duration_statement = 100 -- milliseconds -- Or per session SET log_min_duration_statement = 100; -- Logs appear in PostgreSQL log files ``` ### pg_stat_statements Extension ```sql -- Enable extension CREATE EXTENSION pg_stat_statements; -- View query statistics SELECT query, calls, total_exec_time, mean_exec_time, rows FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10; -- Reset statistics SELECT pg_stat_statements_reset(); ``` ## Index Usage Analysis ### Check Index Usage ```sql -- Index usage statistics SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_user_indexes ORDER BY idx_scan; -- Unused indexes (idx_scan = 0) SELECT schemaname, tablename, indexname FROM pg_stat_user_indexes WHERE idx_scan = 0 AND indexname NOT LIKE '%_pkey'; ``` ### Index Size ```sql -- Index sizes SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_user_indexes ORDER BY pg_relation_size(indexrelid) DESC; ``` ### Missing Indexes ```sql -- Tables with sequential scans SELECT schemaname, tablename, seq_scan, seq_tup_read FROM pg_stat_user_tables WHERE seq_scan > 0 ORDER BY seq_tup_read DESC; -- Consider adding indexes to high seq_scan tables ``` ## Configuration Tuning ### Memory Settings (postgresql.conf) ```conf # Shared buffers (25% of RAM) shared_buffers = 4GB # Work memory (per operation) work_mem = 64MB # Maintenance work memory (VACUUM, CREATE INDEX) maintenance_work_mem = 512MB # Effective cache size (estimate of OS cache) effective_cache_size = 12GB ``` ### Query Planner Settings ```conf # Random page cost (lower for SSD) random_page_cost = 1.1 # Effective IO concurrency (number of concurrent disk operations) effective_io_concurrency = 200 # Cost of parallel query startup parallel_setup_cost = 1000 parallel_tuple_cost = 0.1 ``` ### Connection Settings ```conf # Max connections max_connections = 100 # Connection pooling recommended (pgBouncer) ``` ## Best Practices 1. **Index strategy** - Index foreign keys - Index WHERE clause columns - Index ORDER BY columns - Use composite indexes for multi-column queries - Keep index count reasonable (5-10 per table) 2. **Query optimization** - Use EXPLAIN ANALYZE - Avoid SELECT * - Use LIMIT when possible - Filter before joining - Use appropriate join type 3. **Statistics** - Regular ANALYZE - Increase statistics target for skewed distributions - Monitor autovacuum 4. **Monitoring** - Enable pg_stat_statements - Log slow queries - Monitor index usage - Check table bloat 5. **Maintenance** - Regular VACUUM - REINDEX periodically - Update PostgreSQL version - Monitor disk space 6. **Configuration** - Tune memory settings - Adjust for workload (OLTP vs OLAP) - Use connection pooling - Enable query logging 7. **Testing** - Test queries with production-like data volume - Benchmark before/after changes - Monitor production metrics ``` ### references/postgresql-administration.md ```markdown # PostgreSQL Administration User management, backups, replication, maintenance, and production database administration. ## User and Role Management ### Create Users ```sql -- Create user with password CREATE USER appuser WITH PASSWORD 'secure_password'; -- Create superuser CREATE USER admin WITH SUPERUSER PASSWORD 'admin_password'; -- Create role without login CREATE ROLE readonly; -- Create user with attributes CREATE USER developer WITH PASSWORD 'dev_pass' CREATEDB VALID UNTIL '2025-12-31'; ``` ### Alter Users ```sql -- Change password ALTER USER appuser WITH PASSWORD 'new_password'; -- Add attributes ALTER USER appuser WITH CREATEDB CREATEROLE; -- Remove attributes ALTER USER appuser WITH NOSUPERUSER; -- Rename user ALTER USER oldname RENAME TO newname; -- Set connection limit ALTER USER appuser CONNECTION LIMIT 10; ``` ### Roles and Inheritance ```sql -- Create role hierarchy CREATE ROLE readonly; CREATE ROLE readwrite; -- Grant role to user GRANT readonly TO appuser; GRANT readwrite TO developer; -- Revoke role REVOKE readonly FROM appuser; -- Role membership \du ``` ### Permissions #### Database Level ```sql -- Grant database access GRANT CONNECT ON DATABASE mydb TO appuser; -- Grant schema usage GRANT USAGE ON SCHEMA public TO appuser; -- Revoke access REVOKE CONNECT ON DATABASE mydb FROM appuser; ``` #### Table Level ```sql -- Grant table permissions GRANT SELECT ON users TO appuser; GRANT SELECT, INSERT, UPDATE ON orders TO appuser; GRANT ALL PRIVILEGES ON products TO appuser; -- Grant on all tables GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly; -- Revoke permissions REVOKE INSERT ON users FROM appuser; ``` #### Column Level ```sql -- Grant specific columns GRANT SELECT (id, name, email) ON users TO appuser; GRANT UPDATE (status) ON orders TO appuser; ``` #### Sequence Permissions ```sql -- Grant sequence usage (for SERIAL/auto-increment) GRANT USAGE, SELECT ON SEQUENCE users_id_seq TO appuser; GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO appuser; ``` #### Function Permissions ```sql -- Grant execute on function GRANT EXECUTE ON FUNCTION get_user(integer) TO appuser; ``` ### Default Privileges ```sql -- Set default privileges for future objects ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readonly; ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite; ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT USAGE ON SEQUENCES TO readwrite; ``` ### View Permissions ```sql -- Show table permissions \dp users -- Show role memberships \du -- Query permissions SELECT grantee, privilege_type FROM information_schema.role_table_grants WHERE table_name = 'users'; ``` ## Backup and Restore ### pg_dump (Logical Backup) ```bash # Dump database to SQL file pg_dump mydb > mydb.sql # Custom format (compressed, allows selective restore) pg_dump -Fc mydb > mydb.dump # Directory format (parallel dump) pg_dump -Fd mydb -j 4 -f mydb_dir # Specific table pg_dump -t users mydb > users.sql # Multiple tables pg_dump -t users -t orders mydb > tables.sql # Schema only pg_dump -s mydb > schema.sql # Data only pg_dump -a mydb > data.sql # Exclude table pg_dump --exclude-table=logs mydb > mydb.sql # With compression pg_dump -Fc -Z 9 mydb > mydb.dump ``` ### pg_dumpall (All Databases) ```bash # Dump all databases pg_dumpall > all_databases.sql # Only globals (roles, tablespaces) pg_dumpall --globals-only > globals.sql ``` ### pg_restore ```bash # Restore from custom format pg_restore -d mydb mydb.dump # Restore specific table pg_restore -d mydb -t users mydb.dump # List contents pg_restore -l mydb.dump # Parallel restore pg_restore -d mydb -j 4 mydb.dump # Clean database first pg_restore -d mydb --clean mydb.dump # Create database if not exists pg_restore -C -d postgres mydb.dump ``` ### Restore from SQL ```bash # Restore SQL dump psql mydb < mydb.sql # Create database and restore createdb mydb psql mydb < mydb.sql # Single transaction psql -1 mydb < mydb.sql # Stop on error psql --set ON_ERROR_STOP=on mydb < mydb.sql ``` ### Automated Backup Script ```bash #!/bin/bash # backup.sh # Configuration DB_NAME="mydb" BACKUP_DIR="/backups" DATE=$(date +%Y%m%d_%H%M%S) RETENTION_DAYS=7 # Create backup pg_dump -Fc "$DB_NAME" > "$BACKUP_DIR/${DB_NAME}_${DATE}.dump" # Remove old backups find "$BACKUP_DIR" -name "${DB_NAME}_*.dump" -mtime +$RETENTION_DAYS -delete # Log echo "Backup completed: ${DB_NAME}_${DATE}.dump" ``` ### Point-in-Time Recovery (PITR) ```bash # Enable WAL archiving (postgresql.conf) wal_level = replica archive_mode = on archive_command = 'cp %p /archive/%f' max_wal_senders = 3 # Base backup pg_basebackup -D /backup/base -Ft -z -P # Restore to point in time # 1. Stop PostgreSQL # 2. Restore base backup # 3. Create recovery.conf with recovery_target_time # 4. Start PostgreSQL ``` ## Replication ### Streaming Replication (Primary-Replica) #### Primary Setup ```sql -- Create replication user CREATE USER replicator WITH REPLICATION PASSWORD 'replica_pass'; -- Configure postgresql.conf wal_level = replica max_wal_senders = 3 wal_keep_size = 64MB -- Configure pg_hba.conf host replication replicator replica_ip/32 md5 ``` #### Replica Setup ```bash # Stop replica PostgreSQL systemctl stop postgresql # Remove data directory rm -rf /var/lib/postgresql/data/* # Clone from primary pg_basebackup -h primary_host -D /var/lib/postgresql/data -U replicator -P -R # Start replica systemctl start postgresql # Check replication status SELECT * FROM pg_stat_replication; -- On primary ``` ### Logical Replication #### Publisher (Primary) ```sql -- Create publication CREATE PUBLICATION my_publication FOR ALL TABLES; -- Or specific tables CREATE PUBLICATION my_publication FOR TABLE users, orders; -- Check publications \dRp SELECT * FROM pg_publication; ``` #### Subscriber (Replica) ```sql -- Create subscription CREATE SUBSCRIPTION my_subscription CONNECTION 'host=primary_host dbname=mydb user=replicator password=replica_pass' PUBLICATION my_publication; -- Check subscriptions \dRs SELECT * FROM pg_subscription; -- Monitor replication SELECT * FROM pg_stat_subscription; ``` ## Monitoring ### Database Size ```sql -- Database size SELECT pg_size_pretty(pg_database_size('mydb')); -- Table sizes SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC; -- Index sizes SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS size FROM pg_stat_user_indexes ORDER BY pg_relation_size(indexrelid) DESC; ``` ### Connections ```sql -- Current connections SELECT count(*) FROM pg_stat_activity; -- Connections by database SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname; -- Connection limit SHOW max_connections; -- Kill connection SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid = 12345; ``` ### Activity ```sql -- Active queries SELECT pid, usename, state, query, query_start FROM pg_stat_activity WHERE state != 'idle'; -- Long-running queries SELECT pid, now() - query_start AS duration, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC; -- Blocking queries SELECT blocked.pid AS blocked_pid, blocked.query AS blocked_query, blocking.pid AS blocking_pid, blocking.query AS blocking_query FROM pg_stat_activity blocked JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid)); ``` ### Cache Hit Ratio ```sql -- Should be > 0.99 for good performance SELECT sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_ratio FROM pg_statio_user_tables; ``` ### Table Bloat ```sql -- Check for table bloat (requires pgstattuple extension) CREATE EXTENSION pgstattuple; SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size, pgstattuple(schemaname||'.'||tablename) AS stats FROM pg_tables WHERE schemaname = 'public'; ``` ## Maintenance ### VACUUM ```sql -- Reclaim storage VACUUM users; -- Verbose VACUUM VERBOSE users; -- Full (locks table, rewrites) VACUUM FULL users; -- With analyze VACUUM ANALYZE users; -- All tables VACUUM; ``` ### Auto-Vacuum ```sql -- Check last vacuum SELECT schemaname, tablename, last_vacuum, last_autovacuum FROM pg_stat_user_tables; -- Configure postgresql.conf autovacuum = on autovacuum_vacuum_threshold = 50 autovacuum_vacuum_scale_factor = 0.2 autovacuum_analyze_threshold = 50 autovacuum_analyze_scale_factor = 0.1 ``` ### REINDEX ```sql -- Rebuild index REINDEX INDEX idx_users_email; -- Rebuild all indexes on table REINDEX TABLE users; -- Rebuild database indexes REINDEX DATABASE mydb; -- Concurrently (doesn't lock) REINDEX INDEX CONCURRENTLY idx_users_email; ``` ### ANALYZE ```sql -- Update statistics ANALYZE users; -- Specific columns ANALYZE users(email, status); -- All tables ANALYZE; -- Verbose ANALYZE VERBOSE users; ``` ## Configuration ### postgresql.conf Location ```sql SHOW config_file; ``` ### Key Settings ```conf # Memory shared_buffers = 4GB # 25% of RAM work_mem = 64MB # Per operation maintenance_work_mem = 512MB # VACUUM, CREATE INDEX effective_cache_size = 12GB # OS cache estimate # Query Planner random_page_cost = 1.1 # Lower for SSD effective_io_concurrency = 200 # Concurrent disk ops # Connections max_connections = 100 superuser_reserved_connections = 3 # Logging log_destination = 'stderr' logging_collector = on log_directory = 'log' log_filename = 'postgresql-%Y-%m-%d.log' log_rotation_age = 1d log_min_duration_statement = 100 # Log slow queries # Replication wal_level = replica max_wal_senders = 3 wal_keep_size = 64MB # Autovacuum autovacuum = on autovacuum_vacuum_scale_factor = 0.2 autovacuum_analyze_scale_factor = 0.1 ``` ### Reload Configuration ```sql -- Reload config without restart SELECT pg_reload_conf(); -- Or from shell pg_ctl reload ``` ## Security ### SSL/TLS ```conf # postgresql.conf ssl = on ssl_cert_file = '/path/to/server.crt' ssl_key_file = '/path/to/server.key' ssl_ca_file = '/path/to/ca.crt' ``` ### pg_hba.conf (Host-Based Authentication) ```conf # TYPE DATABASE USER ADDRESS METHOD # Local connections local all postgres peer local all all md5 # Remote connections host all all 0.0.0.0/0 md5 host all all ::0/0 md5 # Replication host replication replicator replica_ip/32 md5 # SSL required hostssl all all 0.0.0.0/0 md5 ``` ### Row Level Security ```sql -- Enable RLS ALTER TABLE users ENABLE ROW LEVEL SECURITY; -- Create policy CREATE POLICY user_policy ON users USING (user_id = current_user_id()); -- Drop policy DROP POLICY user_policy ON users; -- View policies \d+ users ``` ## Best Practices 1. **Backups** - Daily automated backups - Test restores regularly - Store backups off-site - Use pg_dump custom format for flexibility 2. **Monitoring** - Monitor connections, queries, cache hit ratio - Set up alerts for critical metrics - Log slow queries - Use pg_stat_statements 3. **Security** - Use strong passwords - Restrict network access (pg_hba.conf) - Enable SSL/TLS - Regular security updates - Principle of least privilege 4. **Maintenance** - Regular VACUUM and ANALYZE - Monitor autovacuum - REINDEX periodically - Check for table bloat 5. **Configuration** - Tune for workload - Use connection pooling (pgBouncer) - Monitor and adjust memory settings - Keep PostgreSQL updated 6. **Replication** - At least one replica for HA - Monitor replication lag - Test failover procedures - Use logical replication for selective replication ``` ### scripts/db_migrate.py ```python #!/usr/bin/env python3 """ Database migration tool for MongoDB and PostgreSQL. Generates and applies schema migrations with rollback support. """ import argparse import json import os import sys from dataclasses import dataclass from datetime import datetime from pathlib import Path from typing import Any, Dict, List, Optional try: from pymongo import MongoClient MONGO_AVAILABLE = True except ImportError: MONGO_AVAILABLE = False try: import psycopg2 from psycopg2 import sql POSTGRES_AVAILABLE = True except ImportError: POSTGRES_AVAILABLE = False @dataclass class Migration: """Represents a database migration.""" id: str name: str timestamp: datetime database_type: str up_sql: Optional[str] = None down_sql: Optional[str] = None mongodb_operations: Optional[List[Dict[str, Any]]] = None applied: bool = False class MigrationManager: """Manages database migrations for MongoDB and PostgreSQL.""" def __init__(self, db_type: str, connection_string: str, migrations_dir: str = "./migrations"): """ Initialize migration manager. Args: db_type: Database type ('mongodb' or 'postgres') connection_string: Database connection string migrations_dir: Directory to store migration files """ self.db_type = db_type.lower() self.connection_string = connection_string self.migrations_dir = Path(migrations_dir) self.migrations_dir.mkdir(exist_ok=True) self.client = None self.db = None self.conn = None def connect(self) -> bool: """ Connect to database. Returns: True if connection successful, False otherwise """ try: if self.db_type == "mongodb": if not MONGO_AVAILABLE: print("Error: pymongo not installed") return False self.client = MongoClient(self.connection_string) self.db = self.client.get_default_database() # Test connection self.client.server_info() return True elif self.db_type == "postgres": if not POSTGRES_AVAILABLE: print("Error: psycopg2 not installed") return False self.conn = psycopg2.connect(self.connection_string) return True else: print(f"Error: Unsupported database type: {self.db_type}") return False except Exception as e: print(f"Connection error: {e}") return False def disconnect(self): """Disconnect from database.""" try: if self.client: self.client.close() if self.conn: self.conn.close() except Exception as e: print(f"Disconnect error: {e}") def _ensure_migrations_table(self): """Create migrations tracking table/collection if not exists.""" if self.db_type == "mongodb": # MongoDB creates collection automatically pass elif self.db_type == "postgres": with self.conn.cursor() as cur: cur.execute(""" CREATE TABLE IF NOT EXISTS migrations ( id VARCHAR(255) PRIMARY KEY, name VARCHAR(255) NOT NULL, applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ) """) self.conn.commit() def generate_migration(self, name: str, dry_run: bool = False) -> Optional[Migration]: """ Generate new migration file. Args: name: Migration name dry_run: If True, only show what would be generated Returns: Migration object if successful, None otherwise """ timestamp = datetime.now() migration_id = timestamp.strftime("%Y%m%d%H%M%S") filename = f"{migration_id}_{name}.json" filepath = self.migrations_dir / filename migration = Migration( id=migration_id, name=name, timestamp=timestamp, database_type=self.db_type ) if self.db_type == "mongodb": migration.mongodb_operations = [ { "operation": "createIndex", "collection": "example_collection", "index": {"field": 1}, "options": {} } ] elif self.db_type == "postgres": migration.up_sql = "-- Add your SQL here\n" migration.down_sql = "-- Add rollback SQL here\n" migration_data = { "id": migration.id, "name": migration.name, "timestamp": migration.timestamp.isoformat(), "database_type": migration.database_type, "up_sql": migration.up_sql, "down_sql": migration.down_sql, "mongodb_operations": migration.mongodb_operations } if dry_run: print(f"Would create: {filepath}") print(json.dumps(migration_data, indent=2)) return migration try: with open(filepath, "w") as f: json.dump(migration_data, f, indent=2) print(f"Created migration: {filepath}") return migration except Exception as e: print(f"Error creating migration: {e}") return None def get_pending_migrations(self) -> List[Migration]: """ Get list of pending migrations. Returns: List of pending Migration objects """ # Get applied migrations applied_ids = set() try: if self.db_type == "mongodb": applied_ids = { doc["id"] for doc in self.db.migrations.find({}, {"id": 1}) } elif self.db_type == "postgres": with self.conn.cursor() as cur: cur.execute("SELECT id FROM migrations") applied_ids = {row[0] for row in cur.fetchall()} except Exception as e: print(f"Error reading applied migrations: {e}") # Get all migration files pending = [] for filepath in sorted(self.migrations_dir.glob("*.json")): try: with open(filepath) as f: data = json.load(f) if data["id"] not in applied_ids: migration = Migration( id=data["id"], name=data["name"], timestamp=datetime.fromisoformat(data["timestamp"]), database_type=data["database_type"], up_sql=data.get("up_sql"), down_sql=data.get("down_sql"), mongodb_operations=data.get("mongodb_operations") ) pending.append(migration) except Exception as e: print(f"Error reading {filepath}: {e}") return pending def apply_migration(self, migration: Migration, dry_run: bool = False) -> bool: """ Apply migration. Args: migration: Migration to apply dry_run: If True, only show what would be executed Returns: True if successful, False otherwise """ print(f"Applying migration: {migration.id} - {migration.name}") if dry_run: if self.db_type == "mongodb": print("MongoDB operations:") print(json.dumps(migration.mongodb_operations, indent=2)) elif self.db_type == "postgres": print("SQL to execute:") print(migration.up_sql) return True try: if self.db_type == "mongodb": for op in migration.mongodb_operations or []: if op["operation"] == "createIndex": self.db[op["collection"]].create_index( list(op["index"].items()), **op.get("options", {}) ) # Record migration self.db.migrations.insert_one({ "id": migration.id, "name": migration.name, "applied_at": datetime.now() }) elif self.db_type == "postgres": with self.conn.cursor() as cur: cur.execute(migration.up_sql) # Record migration cur.execute( "INSERT INTO migrations (id, name) VALUES (%s, %s)", (migration.id, migration.name) ) self.conn.commit() print(f"✓ Applied: {migration.id}") return True except Exception as e: print(f"✗ Error applying migration: {e}") if self.conn: self.conn.rollback() return False def rollback_migration(self, migration_id: str, dry_run: bool = False) -> bool: """ Rollback migration. Args: migration_id: Migration ID to rollback dry_run: If True, only show what would be executed Returns: True if successful, False otherwise """ # Find migration file migration_file = None for filepath in self.migrations_dir.glob(f"{migration_id}_*.json"): migration_file = filepath break if not migration_file: print(f"Migration not found: {migration_id}") return False try: with open(migration_file) as f: data = json.load(f) print(f"Rolling back: {migration_id} - {data['name']}") if dry_run: if self.db_type == "postgres": print("SQL to execute:") print(data.get("down_sql", "-- No rollback defined")) return True if self.db_type == "postgres" and data.get("down_sql"): with self.conn.cursor() as cur: cur.execute(data["down_sql"]) cur.execute("DELETE FROM migrations WHERE id = %s", (migration_id,)) self.conn.commit() elif self.db_type == "mongodb": self.db.migrations.delete_one({"id": migration_id}) print(f"✓ Rolled back: {migration_id}") return True except Exception as e: print(f"✗ Error rolling back: {e}") if self.conn: self.conn.rollback() return False def main(): """Main entry point.""" parser = argparse.ArgumentParser(description="Database migration tool") parser.add_argument("--db", required=True, choices=["mongodb", "postgres"], help="Database type") parser.add_argument("--uri", help="Database connection string") parser.add_argument("--migrations-dir", default="./migrations", help="Migrations directory") subparsers = parser.add_subparsers(dest="command", required=True) # Generate command gen_parser = subparsers.add_parser("generate", help="Generate new migration") gen_parser.add_argument("name", help="Migration name") gen_parser.add_argument("--dry-run", action="store_true", help="Show what would be generated") # Apply command apply_parser = subparsers.add_parser("apply", help="Apply pending migrations") apply_parser.add_argument("--dry-run", action="store_true", help="Show what would be executed") # Rollback command rollback_parser = subparsers.add_parser("rollback", help="Rollback migration") rollback_parser.add_argument("id", help="Migration ID to rollback") rollback_parser.add_argument("--dry-run", action="store_true", help="Show what would be executed") # Status command subparsers.add_parser("status", help="Show migration status") args = parser.parse_args() # For generate, we don't need connection if args.command == "generate": manager = MigrationManager(args.db, "", args.migrations_dir) migration = manager.generate_migration(args.name, args.dry_run) sys.exit(0 if migration else 1) # Other commands need connection if not args.uri: print("Error: --uri required for this command") sys.exit(1) manager = MigrationManager(args.db, args.uri, args.migrations_dir) if not manager.connect(): sys.exit(1) try: manager._ensure_migrations_table() if args.command == "status": pending = manager.get_pending_migrations() print(f"Pending migrations: {len(pending)}") for migration in pending: print(f" {migration.id} - {migration.name}") elif args.command == "apply": pending = manager.get_pending_migrations() if not pending: print("No pending migrations") else: for migration in pending: if not manager.apply_migration(migration, args.dry_run): sys.exit(1) elif args.command == "rollback": if not manager.rollback_migration(args.id, args.dry_run): sys.exit(1) finally: manager.disconnect() if __name__ == "__main__": main() ``` ### scripts/db_backup.py ```python #!/usr/bin/env python3 """ Database backup and restore tool for MongoDB and PostgreSQL. Supports compression, scheduling, and verification. """ import argparse import gzip import json import os import shutil import subprocess import sys from dataclasses import dataclass from datetime import datetime from pathlib import Path from typing import Dict, List, Optional @dataclass class BackupInfo: """Backup metadata.""" filename: str database_type: str database_name: str timestamp: datetime size_bytes: int compressed: bool verified: bool = False class BackupManager: """Manages database backups for MongoDB and PostgreSQL.""" def __init__(self, db_type: str, backup_dir: str = "./backups"): """ Initialize backup manager. Args: db_type: Database type ('mongodb' or 'postgres') backup_dir: Directory to store backups """ self.db_type = db_type.lower() self.backup_dir = Path(backup_dir) self.backup_dir.mkdir(exist_ok=True) def create_backup( self, uri: str, database: Optional[str] = None, compress: bool = True, verify: bool = True ) -> Optional[BackupInfo]: """ Create database backup. Args: uri: Database connection string database: Database name (optional for MongoDB) compress: Compress backup file verify: Verify backup after creation Returns: BackupInfo if successful, None otherwise """ timestamp = datetime.now() date_str = timestamp.strftime("%Y%m%d_%H%M%S") if self.db_type == "mongodb": return self._backup_mongodb(uri, database, date_str, compress, verify) elif self.db_type == "postgres": return self._backup_postgres(uri, database, date_str, compress, verify) else: print(f"Error: Unsupported database type: {self.db_type}") return None def _backup_mongodb( self, uri: str, database: Optional[str], date_str: str, compress: bool, verify: bool ) -> Optional[BackupInfo]: """Create MongoDB backup using mongodump.""" db_name = database or "all" filename = f"mongodb_{db_name}_{date_str}" backup_path = self.backup_dir / filename try: cmd = ["mongodump", "--uri", uri, "--out", str(backup_path)] if database: cmd.extend(["--db", database]) print(f"Creating MongoDB backup: {filename}") result = subprocess.run(cmd, capture_output=True, text=True) if result.returncode != 0: print(f"Error: {result.stderr}") return None # Compress if requested if compress: archive_path = backup_path.with_suffix(".tar.gz") print(f"Compressing backup...") shutil.make_archive(str(backup_path), "gztar", backup_path) shutil.rmtree(backup_path) backup_path = archive_path filename = archive_path.name size_bytes = self._get_size(backup_path) backup_info = BackupInfo( filename=filename, database_type="mongodb", database_name=db_name, timestamp=datetime.now(), size_bytes=size_bytes, compressed=compress ) if verify: backup_info.verified = self._verify_backup(backup_info) self._save_metadata(backup_info) print(f"✓ Backup created: {filename} ({self._format_size(size_bytes)})") return backup_info except Exception as e: print(f"Error creating MongoDB backup: {e}") return None def _backup_postgres( self, uri: str, database: str, date_str: str, compress: bool, verify: bool ) -> Optional[BackupInfo]: """Create PostgreSQL backup using pg_dump.""" if not database: print("Error: Database name required for PostgreSQL backup") return None ext = ".sql.gz" if compress else ".sql" filename = f"postgres_{database}_{date_str}{ext}" backup_path = self.backup_dir / filename try: cmd = ["pg_dump", uri] if compress: # Use pg_dump with gzip with open(backup_path, "wb") as f: dump_proc = subprocess.Popen(cmd, stdout=subprocess.PIPE) gzip_proc = subprocess.Popen( ["gzip"], stdin=dump_proc.stdout, stdout=f ) dump_proc.stdout.close() gzip_proc.communicate() if dump_proc.returncode != 0: print("Error: pg_dump failed") return None else: with open(backup_path, "w") as f: result = subprocess.run(cmd, stdout=f, stderr=subprocess.PIPE, text=True) if result.returncode != 0: print(f"Error: {result.stderr}") return None size_bytes = backup_path.stat().st_size backup_info = BackupInfo( filename=filename, database_type="postgres", database_name=database, timestamp=datetime.now(), size_bytes=size_bytes, compressed=compress ) if verify: backup_info.verified = self._verify_backup(backup_info) self._save_metadata(backup_info) print(f"✓ Backup created: {filename} ({self._format_size(size_bytes)})") return backup_info except Exception as e: print(f"Error creating PostgreSQL backup: {e}") return None def restore_backup(self, filename: str, uri: str, dry_run: bool = False) -> bool: """ Restore database from backup. Args: filename: Backup filename uri: Database connection string dry_run: If True, only show what would be done Returns: True if successful, False otherwise """ backup_path = self.backup_dir / filename if not backup_path.exists(): print(f"Error: Backup not found: {filename}") return False # Load metadata metadata_path = backup_path.with_suffix(".json") if metadata_path.exists(): with open(metadata_path) as f: metadata = json.load(f) print(f"Restoring backup from {metadata['timestamp']}") print(f"Database: {metadata['database_name']}") if dry_run: print(f"Would restore from: {backup_path}") return True print(f"Restoring backup: {filename}") try: if self.db_type == "mongodb": return self._restore_mongodb(backup_path, uri) elif self.db_type == "postgres": return self._restore_postgres(backup_path, uri) else: print(f"Error: Unsupported database type: {self.db_type}") return False except Exception as e: print(f"Error restoring backup: {e}") return False def _restore_mongodb(self, backup_path: Path, uri: str) -> bool: """Restore MongoDB backup using mongorestore.""" try: # Extract if compressed restore_path = backup_path if backup_path.suffix == ".gz": print("Extracting backup...") extract_path = backup_path.with_suffix("") shutil.unpack_archive(backup_path, extract_path) restore_path = extract_path cmd = ["mongorestore", "--uri", uri, str(restore_path)] result = subprocess.run(cmd, capture_output=True, text=True) # Cleanup extracted files if restore_path != backup_path and restore_path.is_dir(): shutil.rmtree(restore_path) if result.returncode != 0: print(f"Error: {result.stderr}") return False print("✓ Restore completed") return True except Exception as e: print(f"Error restoring MongoDB: {e}") return False def _restore_postgres(self, backup_path: Path, uri: str) -> bool: """Restore PostgreSQL backup using psql.""" try: if backup_path.suffix == ".gz": # Decompress and restore with gzip.open(backup_path, "rb") as f: cmd = ["psql", uri] result = subprocess.run( cmd, stdin=f, capture_output=True, text=False ) else: with open(backup_path) as f: cmd = ["psql", uri] result = subprocess.run( cmd, stdin=f, capture_output=True, text=True ) if result.returncode != 0: print(f"Error: {result.stderr}") return False print("✓ Restore completed") return True except Exception as e: print(f"Error restoring PostgreSQL: {e}") return False def list_backups(self) -> List[BackupInfo]: """ List all backups. Returns: List of BackupInfo objects """ backups = [] for metadata_file in sorted(self.backup_dir.glob("*.json")): try: with open(metadata_file) as f: data = json.load(f) backup_info = BackupInfo( filename=data["filename"], database_type=data["database_type"], database_name=data["database_name"], timestamp=datetime.fromisoformat(data["timestamp"]), size_bytes=data["size_bytes"], compressed=data["compressed"], verified=data.get("verified", False) ) backups.append(backup_info) except Exception as e: print(f"Error reading metadata {metadata_file}: {e}") return backups def cleanup_old_backups(self, retention_days: int, dry_run: bool = False) -> int: """ Remove backups older than retention period. Args: retention_days: Number of days to retain backups dry_run: If True, only show what would be deleted Returns: Number of backups removed """ cutoff = datetime.now().timestamp() - (retention_days * 24 * 3600) removed = 0 for backup_file in self.backup_dir.glob("*"): if backup_file.suffix == ".json": continue if backup_file.stat().st_mtime < cutoff: if dry_run: print(f"Would remove: {backup_file.name}") else: print(f"Removing: {backup_file.name}") backup_file.unlink() # Remove metadata metadata_file = backup_file.with_suffix(".json") if metadata_file.exists(): metadata_file.unlink() removed += 1 return removed def _verify_backup(self, backup_info: BackupInfo) -> bool: """ Verify backup integrity. Args: backup_info: Backup information Returns: True if backup is valid, False otherwise """ backup_path = self.backup_dir / backup_info.filename if not backup_path.exists(): return False # Basic verification: file exists and has size > 0 if backup_path.stat().st_size == 0: return False # Could add more verification here (checksums, test restore, etc.) return True def _get_size(self, path: Path) -> int: """Get total size of file or directory.""" if path.is_file(): return path.stat().st_size elif path.is_dir(): total = 0 for item in path.rglob("*"): if item.is_file(): total += item.stat().st_size return total return 0 def _format_size(self, size_bytes: int) -> str: """Format size in human-readable format.""" for unit in ["B", "KB", "MB", "GB", "TB"]: if size_bytes < 1024: return f"{size_bytes:.2f} {unit}" size_bytes /= 1024 return f"{size_bytes:.2f} PB" def _save_metadata(self, backup_info: BackupInfo): """Save backup metadata to JSON file.""" metadata_path = self.backup_dir / f"{backup_info.filename}.json" metadata = { "filename": backup_info.filename, "database_type": backup_info.database_type, "database_name": backup_info.database_name, "timestamp": backup_info.timestamp.isoformat(), "size_bytes": backup_info.size_bytes, "compressed": backup_info.compressed, "verified": backup_info.verified } with open(metadata_path, "w") as f: json.dump(metadata, f, indent=2) def main(): """Main entry point.""" parser = argparse.ArgumentParser(description="Database backup tool") parser.add_argument("--db", required=True, choices=["mongodb", "postgres"], help="Database type") parser.add_argument("--backup-dir", default="./backups", help="Backup directory") subparsers = parser.add_subparsers(dest="command", required=True) # Backup command backup_parser = subparsers.add_parser("backup", help="Create backup") backup_parser.add_argument("--uri", required=True, help="Database connection string") backup_parser.add_argument("--database", help="Database name") backup_parser.add_argument("--no-compress", action="store_true", help="Disable compression") backup_parser.add_argument("--no-verify", action="store_true", help="Skip verification") # Restore command restore_parser = subparsers.add_parser("restore", help="Restore backup") restore_parser.add_argument("filename", help="Backup filename") restore_parser.add_argument("--uri", required=True, help="Database connection string") restore_parser.add_argument("--dry-run", action="store_true", help="Show what would be done") # List command subparsers.add_parser("list", help="List backups") # Cleanup command cleanup_parser = subparsers.add_parser("cleanup", help="Remove old backups") cleanup_parser.add_argument("--retention-days", type=int, default=7, help="Days to retain backups (default: 7)") cleanup_parser.add_argument("--dry-run", action="store_true", help="Show what would be removed") args = parser.parse_args() manager = BackupManager(args.db, args.backup_dir) if args.command == "backup": backup_info = manager.create_backup( args.uri, args.database, compress=not args.no_compress, verify=not args.no_verify ) sys.exit(0 if backup_info else 1) elif args.command == "restore": success = manager.restore_backup(args.filename, args.uri, args.dry_run) sys.exit(0 if success else 1) elif args.command == "list": backups = manager.list_backups() print(f"Total backups: {len(backups)}\n") for backup in backups: verified_str = "✓" if backup.verified else "?" print(f"[{verified_str}] {backup.filename}") print(f" Database: {backup.database_name}") print(f" Created: {backup.timestamp}") print(f" Size: {manager._format_size(backup.size_bytes)}") print() elif args.command == "cleanup": removed = manager.cleanup_old_backups(args.retention_days, args.dry_run) print(f"Removed {removed} backup(s)") if __name__ == "__main__": main() ``` ### scripts/db_performance_check.py ```python #!/usr/bin/env python3 """ Database performance analysis tool for MongoDB and PostgreSQL. Analyzes slow queries, recommends indexes, and generates reports. """ import argparse import json import sys from dataclasses import dataclass, asdict from datetime import datetime from typing import Dict, List, Optional try: from pymongo import MongoClient MONGO_AVAILABLE = True except ImportError: MONGO_AVAILABLE = False try: import psycopg2 from psycopg2.extras import RealDictCursor POSTGRES_AVAILABLE = True except ImportError: POSTGRES_AVAILABLE = False @dataclass class SlowQuery: """Represents a slow query.""" query: str execution_time_ms: float count: int collection_or_table: Optional[str] = None index_used: Optional[str] = None @dataclass class IndexRecommendation: """Index recommendation.""" collection_or_table: str fields: List[str] reason: str estimated_benefit: str @dataclass class PerformanceReport: """Performance analysis report.""" database_type: str database_name: str timestamp: datetime slow_queries: List[SlowQuery] index_recommendations: List[IndexRecommendation] database_metrics: Dict[str, any] class PerformanceAnalyzer: """Analyzes database performance.""" def __init__(self, db_type: str, connection_string: str, threshold_ms: int = 100): """ Initialize performance analyzer. Args: db_type: Database type ('mongodb' or 'postgres') connection_string: Database connection string threshold_ms: Slow query threshold in milliseconds """ self.db_type = db_type.lower() self.connection_string = connection_string self.threshold_ms = threshold_ms self.client = None self.db = None self.conn = None def connect(self) -> bool: """Connect to database.""" try: if self.db_type == "mongodb": if not MONGO_AVAILABLE: print("Error: pymongo not installed") return False self.client = MongoClient(self.connection_string) self.db = self.client.get_default_database() self.client.server_info() return True elif self.db_type == "postgres": if not POSTGRES_AVAILABLE: print("Error: psycopg2 not installed") return False self.conn = psycopg2.connect(self.connection_string) return True else: print(f"Error: Unsupported database type: {self.db_type}") return False except Exception as e: print(f"Connection error: {e}") return False def disconnect(self): """Disconnect from database.""" try: if self.client: self.client.close() if self.conn: self.conn.close() except Exception as e: print(f"Disconnect error: {e}") def analyze(self) -> Optional[PerformanceReport]: """ Analyze database performance. Returns: PerformanceReport if successful, None otherwise """ try: if self.db_type == "mongodb": return self._analyze_mongodb() elif self.db_type == "postgres": return self._analyze_postgres() else: return None except Exception as e: print(f"Analysis error: {e}") return None def _analyze_mongodb(self) -> PerformanceReport: """Analyze MongoDB performance.""" slow_queries = [] index_recommendations = [] # Enable profiling if not enabled profiling_level = self.db.command("profile", -1) if profiling_level.get("was", 0) == 0: self.db.command("profile", 1, slowms=self.threshold_ms) # Get slow queries from system.profile for doc in self.db.system.profile.find( {"millis": {"$gte": self.threshold_ms}}, limit=50 ).sort("millis", -1): query_str = json.dumps(doc.get("command", {}), default=str) slow_queries.append(SlowQuery( query=query_str, execution_time_ms=doc.get("millis", 0), count=1, collection_or_table=doc.get("ns", "").split(".")[-1] if "ns" in doc else None, index_used=doc.get("planSummary") )) # Analyze collections for index recommendations for coll_name in self.db.list_collection_names(): if coll_name.startswith("system."): continue coll = self.db[coll_name] # Check for collections scans stats = coll.aggregate([ {"$collStats": {"storageStats": {}}} ]).next() # Check if collection has indexes indexes = list(coll.list_indexes()) if len(indexes) <= 1: # Only _id index # Recommend indexes based on common patterns # Sample documents to find frequently queried fields sample = list(coll.find().limit(100)) if sample: # Find fields that appear in most documents field_freq = {} for doc in sample: for field in doc.keys(): if field != "_id": field_freq[field] = field_freq.get(field, 0) + 1 # Recommend index on most common field if field_freq: top_field = max(field_freq.items(), key=lambda x: x[1])[0] index_recommendations.append(IndexRecommendation( collection_or_table=coll_name, fields=[top_field], reason="Frequently queried field without index", estimated_benefit="High" )) # Get database metrics server_status = self.client.admin.command("serverStatus") db_stats = self.db.command("dbStats") metrics = { "connections": server_status.get("connections", {}).get("current", 0), "operations_per_sec": server_status.get("opcounters", {}).get("query", 0), "database_size_mb": db_stats.get("dataSize", 0) / (1024 * 1024), "index_size_mb": db_stats.get("indexSize", 0) / (1024 * 1024), "collections": db_stats.get("collections", 0) } return PerformanceReport( database_type="mongodb", database_name=self.db.name, timestamp=datetime.now(), slow_queries=slow_queries[:10], # Top 10 index_recommendations=index_recommendations, database_metrics=metrics ) def _analyze_postgres(self) -> PerformanceReport: """Analyze PostgreSQL performance.""" slow_queries = [] index_recommendations = [] with self.conn.cursor(cursor_factory=RealDictCursor) as cur: # Check if pg_stat_statements extension is available cur.execute(""" SELECT EXISTS ( SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements' ) AS has_extension """) has_pg_stat_statements = cur.fetchone()["has_extension"] if has_pg_stat_statements: # Get slow queries from pg_stat_statements cur.execute(""" SELECT query, mean_exec_time, calls, total_exec_time FROM pg_stat_statements WHERE mean_exec_time >= %s ORDER BY mean_exec_time DESC LIMIT 10 """, (self.threshold_ms,)) for row in cur.fetchall(): slow_queries.append(SlowQuery( query=row["query"], execution_time_ms=row["mean_exec_time"], count=row["calls"] )) # Find tables with sequential scans (potential index candidates) cur.execute(""" SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan FROM pg_stat_user_tables WHERE seq_scan > 1000 AND (idx_scan IS NULL OR seq_scan > idx_scan * 2) ORDER BY seq_tup_read DESC LIMIT 10 """) for row in cur.fetchall(): index_recommendations.append(IndexRecommendation( collection_or_table=f"{row['schemaname']}.{row['tablename']}", fields=["<analyze query patterns>"], reason=f"High sequential scans ({row['seq_scan']}) vs index scans ({row['idx_scan'] or 0})", estimated_benefit="High" if row["seq_tup_read"] > 100000 else "Medium" )) # Find unused indexes cur.execute(""" SELECT schemaname, tablename, indexname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND indexname NOT LIKE '%_pkey' ORDER BY pg_relation_size(indexrelid) DESC """) unused_indexes = [] for row in cur.fetchall(): unused_indexes.append( f"{row['schemaname']}.{row['tablename']}.{row['indexname']}" ) # Database metrics cur.execute(""" SELECT sum(numbackends) AS connections, sum(xact_commit) AS commits, sum(xact_rollback) AS rollbacks FROM pg_stat_database WHERE datname = current_database() """) stats = cur.fetchone() cur.execute(""" SELECT pg_database_size(current_database()) AS db_size """) db_size = cur.fetchone()["db_size"] cur.execute(""" SELECT sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio FROM pg_statio_user_tables """) cache_ratio = cur.fetchone()["cache_hit_ratio"] or 0 metrics = { "connections": stats["connections"], "commits": stats["commits"], "rollbacks": stats["rollbacks"], "database_size_mb": db_size / (1024 * 1024), "cache_hit_ratio": float(cache_ratio), "unused_indexes": unused_indexes } return PerformanceReport( database_type="postgres", database_name=self.conn.info.dbname, timestamp=datetime.now(), slow_queries=slow_queries, index_recommendations=index_recommendations, database_metrics=metrics ) def print_report(self, report: PerformanceReport): """Print performance report.""" print("=" * 80) print(f"Database Performance Report - {report.database_type.upper()}") print(f"Database: {report.database_name}") print(f"Timestamp: {report.timestamp}") print("=" * 80) print("\n## Database Metrics") print("-" * 80) for key, value in report.database_metrics.items(): if isinstance(value, float): print(f"{key}: {value:.2f}") else: print(f"{key}: {value}") print("\n## Slow Queries") print("-" * 80) if report.slow_queries: for i, query in enumerate(report.slow_queries, 1): print(f"\n{i}. Execution Time: {query.execution_time_ms:.2f}ms | Count: {query.count}") if query.collection_or_table: print(f" Collection/Table: {query.collection_or_table}") if query.index_used: print(f" Index Used: {query.index_used}") print(f" Query: {query.query[:200]}...") else: print("No slow queries found") print("\n## Index Recommendations") print("-" * 80) if report.index_recommendations: for i, rec in enumerate(report.index_recommendations, 1): print(f"\n{i}. {rec.collection_or_table}") print(f" Fields: {', '.join(rec.fields)}") print(f" Reason: {rec.reason}") print(f" Estimated Benefit: {rec.estimated_benefit}") if report.database_type == "mongodb": index_spec = {field: 1 for field in rec.fields} print(f" Command: db.{rec.collection_or_table}.createIndex({json.dumps(index_spec)})") elif report.database_type == "postgres": fields_str = ", ".join(rec.fields) print(f" Command: CREATE INDEX idx_{rec.collection_or_table.replace('.', '_')}_{rec.fields[0]} ON {rec.collection_or_table}({fields_str});") else: print("No index recommendations") print("\n" + "=" * 80) def save_report(self, report: PerformanceReport, filename: str): """Save report to JSON file.""" # Convert dataclasses to dict report_dict = { "database_type": report.database_type, "database_name": report.database_name, "timestamp": report.timestamp.isoformat(), "slow_queries": [asdict(q) for q in report.slow_queries], "index_recommendations": [asdict(r) for r in report.index_recommendations], "database_metrics": report.database_metrics } with open(filename, "w") as f: json.dump(report_dict, f, indent=2, default=str) print(f"\nReport saved to: {filename}") def main(): """Main entry point.""" parser = argparse.ArgumentParser(description="Database performance analysis tool") parser.add_argument("--db", required=True, choices=["mongodb", "postgres"], help="Database type") parser.add_argument("--uri", required=True, help="Database connection string") parser.add_argument("--threshold", type=int, default=100, help="Slow query threshold in milliseconds (default: 100)") parser.add_argument("--output", help="Save report to JSON file") args = parser.parse_args() analyzer = PerformanceAnalyzer(args.db, args.uri, args.threshold) if not analyzer.connect(): sys.exit(1) try: print(f"Analyzing {args.db} performance (threshold: {args.threshold}ms)...") report = analyzer.analyze() if report: analyzer.print_report(report) if args.output: analyzer.save_report(report, args.output) sys.exit(0) else: print("Analysis failed") sys.exit(1) finally: analyzer.disconnect() if __name__ == "__main__": main() ```