
Document Database Implementation

NoSQL document database selection and implementation for flexible-schema applications in Python, TypeScript, Rust, and Go.

When to Use

Use document databases when applications need:

  • Flexible schemas - Data models evolve rapidly without migrations
  • Nested structures - JSON-like hierarchical data
  • Horizontal scaling - Built-in sharding and replication
  • Developer velocity - Object-to-database mapping without ORM complexity

Multi-Language Support

This skill provides patterns for:

  • Python: MongoDB (motor, pymongo), DynamoDB (boto3), Firestore (firebase-admin)
  • TypeScript: MongoDB (mongodb), DynamoDB (@aws-sdk), Firestore (firebase)
  • Rust: MongoDB (mongodb 2.8), DynamoDB (aws-sdk-dynamodb)
  • Go: MongoDB (mongo-driver), DynamoDB (aws-sdk-go-v2)

Database Selection

Quick Decision Framework

DEPLOYMENT ENVIRONMENT?
├── AWS-Native Application → DynamoDB
│     ✓ Serverless, auto-scaling, single-digit ms latency
│     ✗ Limited query flexibility
│
├── Firebase/GCP Ecosystem → Firestore
│     ✓ Real-time sync, offline support, mobile-first
│     ✗ More expensive for heavy reads
│
└── General-Purpose/Complex Queries → MongoDB
      ✓ Rich aggregation, full-text search, vector search
      ✓ ACID transactions, self-hosted or managed

Database Comparison

Database    Best For                                  Latency    Max Item   Query Language
MongoDB     General-purpose, complex queries          1-5ms      16MB       MQL (rich)
DynamoDB    AWS serverless, predictable performance   <10ms      400KB      PartiQL (limited)
Firestore   Real-time apps, mobile-first              50-200ms   1MB        Firebase queries

Schema Design Patterns

Embedding vs Referencing

Quick guide:

Relationship      Pattern     Example
One-to-Few        Embed       User addresses (2-3 max)
One-to-Many       Hybrid      Blog posts → comments
One-to-Millions   Reference   User → events (logging)
Many-to-Many      Reference   Products ↔ Categories

Embedding Example (MongoDB)

// User with embedded addresses
{
  _id: ObjectId("..."),
  email: "user@example.com",
  name: "Jane Doe",
  addresses: [
    {
      type: "home",
      street: "123 Main St",
      city: "Boston",
      default: true
    }
  ],
  preferences: {
    theme: "dark",
    notifications: { email: true, sms: false }
  }
}

Referencing Example (E-commerce)

// Orders reference products
{
  _id: ObjectId("..."),
  userId: ObjectId("..."),
  items: [
    {
      productId: ObjectId("..."),  // Reference
      priceAtPurchase: 49.99,      // Denormalize (historical)
      quantity: 2
    }
  ],
  totalAmount: 99.98
}

When to denormalize:

  • Frequently read together
  • Historical snapshots (prices, names)
  • Read-heavy workloads
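The historical-snapshot case can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of any driver: `build_order_item`, `order_total`, and the shape of the `product` dictionary are hypothetical names, and prices are held as `Decimal` only to keep the arithmetic exact.

```python
from decimal import Decimal


def build_order_item(product: dict, quantity: int) -> dict:
    """Build an order line that references the product but snapshots its price."""
    return {
        "productId": product["_id"],          # reference: catalog details may change later
        "priceAtPurchase": product["price"],  # denormalized copy, frozen at purchase time
        "quantity": quantity,
    }


def order_total(items: list) -> Decimal:
    """Sum line totals from the snapshotted prices, not from the live catalog."""
    return sum((i["priceAtPurchase"] * i["quantity"] for i in items), Decimal("0"))


# Hypothetical catalog entry and order
product = {"_id": "prod-1", "price": Decimal("49.99")}
items = [build_order_item(product, 2)]

product["price"] = Decimal("59.99")  # a later price change leaves the order untouched
total = order_total(items)
```

Because the price is copied into the order line, the order keeps its original total even after the catalog entry changes.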

Indexing Strategies

MongoDB Index Types

// 1. Single field (unique email)
db.users.createIndex({ email: 1 }, { unique: true })

// 2. Compound index (ORDER MATTERS!)
db.orders.createIndex({ status: 1, createdAt: -1 })

// 3. Partial index (index subset)
db.orders.createIndex(
  { userId: 1 },
  { partialFilterExpression: { status: { $eq: "pending" } } }
)

// 4. TTL index (auto-delete after 30 days)
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 2592000 }
)

// 5. Text index (full-text search)
db.articles.createIndex({
  title: "text",
  content: "text"
})
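The TTL value in the listing above is 30 days expressed in seconds. A small Python sketch shows how that constant can be derived instead of hard-coded; `THIRTY_DAYS_SECONDS` is a hypothetical name, and the commented pymongo call is shown only as an assumption about how the value would be passed.

```python
from datetime import timedelta

# expireAfterSeconds expects a whole number of seconds.
THIRTY_DAYS_SECONDS = int(timedelta(days=30).total_seconds())

# With pymongo, the value would typically be passed like this (not executed here):
# db.sessions.create_index("createdAt", expireAfterSeconds=THIRTY_DAYS_SECONDS)
```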

Index Best Practices:

  • Index the fields that frequent queries filter on
  • Compound index order: Equality → Sort → Range (the ESR rule)
  • Use covering indexes (every filtered and returned field is in the index)
  • Use explain() to verify index usage
  • Monitor with Performance Advisor (Atlas)
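The compound-index ordering guideline can be sketched as a small helper. It follows MongoDB's documented ESR guideline (Equality fields first, then Sort fields, then Range fields). `esr_index_keys` is a hypothetical helper, not a driver function; it returns a list of `(field, direction)` pairs, which is the key format pymongo's `create_index` accepts.

```python
def esr_index_keys(equality, sort, range_fields):
    """Order compound-index keys as Equality, then Sort, then Range.

    equality and range_fields are lists of field names;
    sort is a list of (field, direction) pairs with direction 1 or -1.
    """
    candidates = (
        [(field, 1) for field in equality]
        + list(sort)
        + [(field, 1) for field in range_fields]
    )
    keys, seen = [], set()
    for field, direction in candidates:
        if field not in seen:  # a field can appear only once in an index
            seen.add(field)
            keys.append((field, direction))
    return keys


# Example query shape: status == "completed", sorted by createdAt descending,
# filtered on a totalAmount range.
keys = esr_index_keys(
    equality=["status"],
    sort=[("createdAt", -1)],
    range_fields=["totalAmount"],
)
```

Whether the resulting index is actually used should still be confirmed with explain().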

MongoDB Aggregation Pipelines

Key Operators: $match (filter), $group (aggregate), $lookup (join), $unwind (arrays), $project (reshape)

Example aggregation:

db.orders.aggregate([
  { $match: { status: "completed", createdAt: { $gte: new Date("2025-01-01") } } },
  { $group: {
      _id: "$userId",
      totalOrders: { $sum: 1 },
      totalRevenue: { $sum: "$totalAmount" }
  }},
  { $sort: { totalRevenue: -1 } },
  { $limit: 10 }
])
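To make the stage-by-stage behavior concrete, the same pipeline can be mimicked over an in-memory list in Python. This is only an illustration of what each stage computes; `top_customers` and the sample orders are hypothetical, and a real workload would run the pipeline in the database.

```python
from collections import defaultdict
from datetime import datetime


def top_customers(orders, since, limit=10):
    """In-memory equivalent of $match -> $group -> $sort -> $limit."""
    totals = defaultdict(lambda: {"totalOrders": 0, "totalRevenue": 0.0})
    for order in orders:
        # $match: completed orders created on or after `since`
        if order["status"] != "completed" or order["createdAt"] < since:
            continue
        # $group: accumulate per userId
        group = totals[order["userId"]]
        group["totalOrders"] += 1
        group["totalRevenue"] += order["totalAmount"]
    rows = [{"_id": user_id, **group} for user_id, group in totals.items()]
    rows.sort(key=lambda row: row["totalRevenue"], reverse=True)  # $sort
    return rows[:limit]                                           # $limit


# Hypothetical sample data
orders = [
    {"userId": "u1", "status": "completed", "createdAt": datetime(2025, 2, 1), "totalAmount": 50.0},
    {"userId": "u1", "status": "completed", "createdAt": datetime(2025, 3, 1), "totalAmount": 25.0},
    {"userId": "u2", "status": "completed", "createdAt": datetime(2025, 1, 5), "totalAmount": 100.0},
    {"userId": "u2", "status": "pending", "createdAt": datetime(2025, 1, 6), "totalAmount": 500.0},
    {"userId": "u3", "status": "completed", "createdAt": datetime(2024, 12, 31), "totalAmount": 999.0},
]
result = top_customers(orders, since=datetime(2025, 1, 1))
```

The pending order and the order from before 2025 are dropped at the match step, so only u1 and u2 reach the grouping step.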

DynamoDB Single-Table Design

Design around access patterns first, then model them with partition key (PK) and sort key (SK) conventions. Store multiple entity types in one table and distinguish them by composite key prefixes.

Example:

PK           SK                 Attributes
USER#alice   #METADATA          {name: "Alice", email: "..."}
USER#alice   ORDER#2025-01-15   {orderId: "...", total: 99.98}
USER#alice   SESSION#abc123     {expires: "..."}
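The key layout above can be explored with a small in-memory sketch in Python. The helpers `user_pk`, `order_sk`, and `query` are hypothetical; `query` only imitates the shape of a DynamoDB Query (exact match on PK, prefix match on SK, results ordered by SK). With boto3, the equivalent condition would typically be built from `Key("PK").eq(...)` and `Key("SK").begins_with(...)`.

```python
def user_pk(user_id):
    """Partition key for everything that belongs to one user."""
    return f"USER#{user_id}"


def order_sk(order_date):
    """Sort key for an order; ISO dates sort chronologically as strings."""
    return f"ORDER#{order_date}"


def query(table, pk, sk_prefix=""):
    """In-memory stand-in for Query: exact PK, begins_with on SK, sorted by SK."""
    matches = [
        item for item in table
        if item["PK"] == pk and item["SK"].startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda item: item["SK"])


# Hypothetical rows mirroring the table above, plus a second user
table = [
    {"PK": user_pk("alice"), "SK": "SESSION#abc123"},
    {"PK": user_pk("alice"), "SK": order_sk("2025-01-15"), "total": "99.98"},
    {"PK": user_pk("alice"), "SK": "#METADATA", "name": "Alice"},
    {"PK": user_pk("bob"), "SK": order_sk("2025-01-20"), "total": "10.00"},
]

alice_orders = query(table, user_pk("alice"), "ORDER#")  # one access pattern
alice_everything = query(table, user_pk("alice"))        # another, same table
```

One partition key serves several access patterns here: a prefix on the sort key selects a single entity type, and omitting the prefix returns the whole item collection for that user.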

Firestore Real-Time Patterns

Use onSnapshot() for real-time listeners:

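import { collection, onSnapshot } from "firebase/firestore"

// Assumes `db` is an already initialized Firestore instance.
// The callback runs once with the current documents, then on every change.
const unsubscribe = onSnapshot(
  collection(db, "messages"),
  (snapshot) => {
    snapshot.docChanges().forEach((change) => {
      if (change.type === "added") {
        console.log("New message:", change.doc.data())
      }
    })
  }
)

// Call unsubscribe() to stop listening.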

Performance Optimization

Key practices:

  • Always use indexes for query filters (verify with .explain())
  • Use connection pooling (reuse clients across requests)
  • Avoid collection scans in production
  • Implement pagination for large result sets
  • Use transactions for multi-document operations that must succeed or fail together
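The client-reuse advice can be sketched in Python as follows. `FakeClient` is a hypothetical stand-in for a real driver client (for example, a client object that maintains its own connection pool); `get_client` and `handle_request` are likewise illustrative names. The point is only that the client is created once and shared, not created per request.

```python
from functools import lru_cache


class FakeClient:
    """Hypothetical stand-in for a driver client that owns a connection pool."""

    instances = 0  # counts how many clients (and so pools) were created

    def __init__(self, uri):
        FakeClient.instances += 1
        self.uri = uri


@lru_cache(maxsize=None)
def get_client(uri):
    """Create the client once per URI; later calls return the cached instance."""
    return FakeClient(uri)


def handle_request(uri):
    """A request handler that reuses the shared client instead of building one."""
    return get_client(uri)


first = handle_request("mongodb://localhost:27017")
second = handle_request("mongodb://localhost:27017")
```

Both calls return the same object, so only one pool exists for the process.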

Frontend Integration

  • Forms skill: Form submission → API validation → Database CRUD (insert/update)
  • Tables skill: Paginated queries → API → Table display with sorting/filtering
  • Media skill: MongoDB GridFS for large file storage with metadata
  • AI Chat skill: MongoDB Atlas Vector Search for semantic conversation retrieval
  • Feedback skill: DynamoDB for high-throughput event logging with TTL

Common Patterns

Pagination: Use cursor-based pagination for large datasets (recommended over offset)

// MongoDB cursor pagination (sort on _id so "after lastSeenId" is well defined)
db.items.find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(20)
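The cursor logic can be traced with an in-memory Python sketch. `fetch_page` is a hypothetical helper that mimics the query (filter on `_id` greater than the last seen value, order by `_id`, take one page); it is not a driver call.

```python
def fetch_page(items, last_seen_id=None, page_size=20):
    """Keyset pagination: return documents after last_seen_id, ordered by _id."""
    ordered = sorted(items, key=lambda doc: doc["_id"])
    if last_seen_id is not None:
        ordered = [doc for doc in ordered if doc["_id"] > last_seen_id]
    page = ordered[:page_size]
    # A full page may have more behind it; a short page is the last one.
    next_cursor = page[-1]["_id"] if len(page) == page_size else None
    return page, next_cursor


# Hypothetical collection of 45 documents with integer ids
items = [{"_id": i} for i in range(1, 46)]

page1, cursor1 = fetch_page(items)
page2, cursor2 = fetch_page(items, last_seen_id=cursor1)
page3, cursor3 = fetch_page(items, last_seen_id=cursor2)
```

Unlike offset pagination, each page starts from a known key, so the cost of a page does not grow with how deep into the result set it is.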

Soft Deletes: Mark as deleted with timestamp instead of removing

db.users.updateOne(
  { _id: userId },
  { $set: { deletedAt: new Date(), isDeleted: true } }
)
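Soft deletes only work if reads exclude the flagged documents. The sketch below builds the update and the read filter as plain MongoDB-style dictionaries in Python; `soft_delete_update` and `active_only` are hypothetical helper names.

```python
from datetime import datetime, timezone


def soft_delete_update(now=None):
    """Update document that marks a record as deleted without removing it."""
    now = now or datetime.now(timezone.utc)
    return {"$set": {"deletedAt": now, "isDeleted": True}}


def active_only(query=None):
    """Copy of a filter that also excludes soft-deleted documents."""
    merged = dict(query or {})
    # $ne: True also matches documents that never had the isDeleted field.
    merged["isDeleted"] = {"$ne": True}
    return merged


base_filter = {"email": "user@example.com"}
read_filter = active_only(base_filter)
update = soft_delete_update(datetime(2025, 1, 15, tzinfo=timezone.utc))
```

Routing every read through one helper such as `active_only` keeps deleted records from leaking into results when a new query is added.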

Audit Logs: Store version history within documents (cap the history array so it stays bounded)

{
  _id: ObjectId("..."),
  currentVersion: 3,
  data: { /* current data */ },
  history: [
    { version: 1, data: { /* v1 */ }, updatedAt: "..." },
    { version: 2, data: { /* v2 */ }, updatedAt: "..." }
  ]
}
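A version bump on a document of this shape can be sketched in Python as below. `apply_update` and `max_history` are hypothetical, and the sketch assumes that `updatedAt` on a history entry records when that version was replaced, which the example above does not specify. The cap reflects the unbounded-array concern listed under Anti-Patterns to Avoid.

```python
from copy import deepcopy


def apply_update(doc, new_data, updated_at, max_history=50):
    """Return a new document with new_data current and the old version archived."""
    updated = deepcopy(doc)
    history = updated["history"]
    history.append({
        "version": doc["currentVersion"],
        "data": deepcopy(doc["data"]),
        "updatedAt": updated_at,  # assumption: time this version was superseded
    })
    overflow = len(history) - max_history
    if overflow > 0:
        del history[:overflow]    # drop the oldest entries beyond the cap
    updated["currentVersion"] = doc["currentVersion"] + 1
    updated["data"] = new_data
    return updated


# Hypothetical document at version 1 with no history yet
v1 = {"_id": "doc-1", "currentVersion": 1, "data": {"title": "Draft"}, "history": []}
v2 = apply_update(v1, {"title": "Review"}, "2025-01-15T10:00:00Z", max_history=2)
v3 = apply_update(v2, {"title": "Final"}, "2025-01-16T10:00:00Z", max_history=2)
v4 = apply_update(v3, {"title": "Published"}, "2025-01-17T10:00:00Z", max_history=2)
```

Entries dropped by the cap could instead be written to a separate collection if the full trail must be retained.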

Anti-Patterns to Avoid

  • Unbounded Arrays: Limit embedded arrays (use references for large collections)
  • Over-Indexing: Only index queried fields (indexes slow writes)
  • DynamoDB Scans: Always use Query with partition key (avoid Scan)
