
Designing Distributed Systems

Design scalable, reliable, and fault-tolerant distributed systems using proven patterns and consistency models. Covers CAP/PACELC theorems, replication strategies, partitioning, and resilience patterns.

When to Use

Use when:

  • Designing microservices architectures with multiple services
  • Building systems that must scale across multiple datacenters or regions
  • Choosing between consistency vs availability during network partitions
  • Selecting replication strategies (single-leader, multi-leader, leaderless)
  • Implementing distributed transactions (saga pattern, event sourcing, CQRS)
  • Designing partition-tolerant systems with proper consistency guarantees
  • Building resilient services with circuit breakers, bulkheads, retries

Core Concepts

CAP Theorem Fundamentals

CAP Theorem: In a distributed system experiencing a network partition, you must choose between Consistency (C) and Availability (A). Partition tolerance (P) is mandatory.

Network partitions WILL occur → Always design for P

During partition:
├─ CP (Consistency + Partition Tolerance)
│    Use when: Financial transactions, inventory, seat booking
│    Trade-off: System unavailable during partition
│    Examples: HBase, MongoDB (default), etcd
│
└─ AP (Availability + Partition Tolerance)
     Use when: Social media, caching, analytics, shopping carts
     Trade-off: Stale reads possible, conflicts need resolution
     Examples: Cassandra, DynamoDB, Riak

PACELC: Extends CAP to consider normal operations (no partition).

  • If Partition: Choose Availability (A) or Consistency (C)
  • Else (normal): Choose Latency (L) or Consistency (C)

Consistency Models Spectrum

Strong Consistency ◄─────────────────────► Eventual Consistency
        │                        │                             │
  Linearizable           Causal Consistency              Convergent
  (Slowest,              (Middle ground,                 (Fastest,
  most consistent)       causally ordered)               eventually consistent)

Strong Consistency (Linearizability):

  • All operations appear to execute atomically, in a single sequential order
  • Reads always return the most recent write
  • Use for: Bank balances, inventory stock, seat booking
  • Trade-off: Higher latency, reduced availability

Eventual Consistency:

  • If no new updates, all replicas eventually converge
  • Use for: Social feeds, product catalogs, user profiles, DNS
  • Trade-off: Stale reads possible, conflict resolution needed

Causal Consistency:

  • Causally related operations seen in same order by all nodes
  • Use for: Chat apps, collaborative editing, comment threads
  • Trade-off: More complex than eventual, requires causality tracking

Bounded Staleness:

  • Staleness bounded by time or version count
  • Use for: Real-time dashboards, leaderboards, monitoring
  • Trade-off: Must monitor lag, more complex than eventual

Replication Patterns

1. Leader-Follower (Single-Leader):

  • All writes to leader, replicated to followers
  • Followers handle reads (load distribution)
  • Synchronous: Wait for follower ACK (strong consistency, higher latency)
  • Asynchronous: Don't wait (eventual consistency, possible data loss)
  • Use for: Most workloads (the most common pattern); pair with synchronous replication when strong consistency is required

2. Multi-Leader:

  • Multiple leaders accept writes in different datacenters
  • Leaders replicate to each other
  • Conflict resolution required: Last-Write-Wins, application-level merge, or vector clocks (sketched below)
  • Use for: Multi-datacenter, low write latency, geo-distributed users
  • Trade-off: Conflict resolution complexity
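
The list above mentions vector clocks; here is a minimal sketch of how they detect concurrent writes in a multi-leader setup. The node names, event shapes, and union-based merge policy are illustrative assumptions, not any particular database's implementation.

```python
def vc_compare(a: dict, b: dict) -> str:
    """Compare two vector clocks: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a happened-before b
    if b_le_a:
        return "after"        # b happened-before a
    return "concurrent"       # true conflict: the application must merge

# Two leaders accept a write for the same key at roughly the same time.
us_write = {"clock": {"us-east": 1}, "value": ["itemA"]}
eu_write = {"clock": {"eu-west": 1}, "value": ["itemB"]}

if vc_compare(us_write["clock"], eu_write["clock"]) == "concurrent":
    # Neither write causally precedes the other: merge values and clocks.
    nodes = set(us_write["clock"]) | set(eu_write["clock"])
    merged = {
        "clock": {n: max(us_write["clock"].get(n, 0),
                         eu_write["clock"].get(n, 0)) for n in nodes},
        "value": us_write["value"] + eu_write["value"],  # app-specific merge (union)
    }
    print(merged)   # e.g. {'clock': {'us-east': 1, 'eu-west': 1}, 'value': ['itemA', 'itemB']}
```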

3. Leaderless (Dynamo-style):

  • No single leader, quorum-based reads/writes
  • Quorum rule: W + R > N (W=write quorum, R=read quorum, N=replicas); see the sketch after this list
  • Example: N=5, W=3, R=3 → W + R = 6 > 5, so every read quorum overlaps every write quorum (strong consistency)
  • Use for: Maximum availability, partition tolerance
  • Trade-off: Complexity, read repair needed
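
As referenced above, a minimal sketch of the quorum overlap rule. This is only the arithmetic; real leaderless stores layer read repair, hinted handoff, and sloppy quorums on top of it.

```python
def quorum_guarantees_overlap(n: int, w: int, r: int) -> bool:
    """True if any read quorum must intersect any write quorum (W + R > N)."""
    return w + r > n

configs = [
    (5, 3, 3),   # 3 + 3 = 6 > 5  -> strong (overlap guaranteed)
    (5, 3, 2),   # 3 + 2 = 5 <= 5 -> eventual (quorums may miss each other)
    (5, 1, 1),   # maximum availability, weakest guarantee
]
for n, w, r in configs:
    kind = "strong" if quorum_guarantees_overlap(n, w, r) else "eventual"
    print(f"N={n} W={w} R={r}: {kind}")
```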

Partitioning Strategies

Hash Partitioning (Consistent Hashing):

  • Key → Hash(Key) → Partition assignment
  • Even distribution, minimal rebalancing when nodes added/removed
  • Use for: Point queries by ID, even distribution critical
  • Examples: Cassandra, DynamoDB, Redis Cluster
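
A minimal consistent-hashing sketch with virtual nodes. The hash function, vnode count, and node names are illustrative assumptions; production rings (Cassandra, Redis Cluster) are tuned variants of the same idea.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []                      # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):          # virtual nodes smooth distribution
                h = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (h, node))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        h = self._hash(key)
        i = bisect.bisect(self._ring, (h, ""))
        return self._ring[i % len(self._ring)][1]   # wrap around the ring

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))   # the same key always maps to the same node
```

Adding or removing a node moves only the keys adjacent to its virtual nodes, which is why rebalancing is minimal.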

Range Partitioning:

  • Key ranges assigned to partitions (A-F, G-M, N-S, T-Z)
  • Enables range queries, ordered data
  • Risk: Hot spots if data skewed
  • Use for: Time-series data, leaderboards, range scans
  • Examples: HBase, Bigtable

Geographic Partitioning:

  • Partition by location (US-East, EU-West, APAC)
  • Use for: Data locality, GDPR compliance, low latency
  • Examples: Spanner, Cosmos DB

Resilience Patterns

Circuit Breaker:

[Closed] ── normal operation
    │  failures exceed threshold
    ▼
[Open] ── fail fast (don't call the failing service)
    │  timeout expires
    ▼
[Half-Open] ── try a single request
    ├─ success → [Closed]
    └─ failure → [Open]

  • Prevents cascading failures
  • Fast-fail instead of waiting for timeout
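
A minimal sketch of the state machine above. The thresholds and clock source are illustrative assumptions, and the class is single-threaded; in production you would typically reach for an existing library rather than hand-rolling this.

```python
import time

class CircuitBreaker:
    """Sketch only: no locking, illustrative thresholds."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold   # failures before tripping
        self.reset_timeout = reset_timeout           # seconds to stay open
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "half-open"                 # timeout expired: one trial
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"                  # trip (or re-trip) the breaker
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.state = "closed"                        # success closes the circuit
        return result
```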

Bulkhead Isolation:

  • Isolate resources (thread pools, connection pools)
  • Failure in one partition doesn't affect others
  • Like ship compartments preventing total flooding
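
A minimal bulkhead sketch using a separate bounded thread pool per dependency, so saturating one compartment cannot starve the others. Pool sizes and service names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# One bounded pool per downstream dependency ("compartment")
payment_pool = ThreadPoolExecutor(max_workers=10, thread_name_prefix="payments")
search_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="search")

def slow_payment_call(order_id: str) -> str:
    time.sleep(0.1)                 # stand-in for a remote call that may hang
    return f"paid:{order_id}"

def search_call(query: str) -> str:
    return f"results:{query}"

# Even if the payment service hangs and saturates all 10 payment threads,
# search requests still run, because they draw from a separate pool.
payment_future = payment_pool.submit(slow_payment_call, "o-1")
search_future = search_pool.submit(search_call, "sneakers")
print(search_future.result(), payment_future.result())
```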

Timeout and Retry:

  • Timeout: Set deadlines, fail fast if exceeded
  • Retry: Exponential backoff with jitter
  • Idempotency: Ensure safe retry (critical)
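
A minimal sketch of retry with exponential backoff and full jitter (delay drawn uniformly from [0, base * 2^attempt], capped). Parameter values are illustrative assumptions, and the wrapped operation must be idempotent for this to be safe.

```python
import random
import time

def retry(op, max_attempts=5, base=0.1, cap=5.0):
    """Call op(); on failure, sleep a jittered, exponentially growing delay."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise                                  # out of attempts: propagate
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            time.sleep(delay)                          # jitter prevents retry stampedes
```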

Rate Limiting and Backpressure:

  • Protect services from overload
  • Token bucket, leaky bucket algorithms
  • Backpressure: Signal upstream to slow down
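
A minimal token-bucket sketch: tokens refill at a steady rate up to a burst capacity, and a request is admitted only if a token is available. Rates and capacities are illustrative assumptions.

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate                    # tokens added per second
        self.capacity = capacity            # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                        # reject (or queue) -> backpressure

bucket = TokenBucket(rate=100, capacity=20)   # 100 req/s sustained, bursts of 20
print(bucket.allow())
```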

Transaction Patterns

Saga Pattern:

  • Coordinate distributed transactions across services
  • Avoids blocking two-phase commit (2PC) across services

Choreography: Services react to events

Order Service → OrderCreated event
Payment Service → listens → PaymentProcessed event
Inventory Service → listens → InventoryReserved event
(Compensating: if payment fails → InventoryReleased event)

Orchestration: Central coordinator

Saga Orchestrator:
1. Call Order Service
2. Call Payment Service
3. Call Inventory Service
(If step fails → call compensating transactions in reverse)
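
A minimal orchestration-style saga sketch: steps run in order, and on failure the compensations of completed steps run in reverse. The service calls are stubbed as local functions; names and the simulated failure are illustrative assumptions.

```python
def create_order(ctx):
    ctx["order"] = "o-1"

def cancel_order(ctx):
    print("compensate: cancel order", ctx["order"])

def charge_payment(ctx):
    raise RuntimeError("card declined")   # simulated failure

def refund_payment(ctx):
    print("compensate: refund payment")

def reserve_stock(ctx):
    ctx["reserved"] = True

def release_stock(ctx):
    print("compensate: release stock")

def run_saga(steps):
    ctx, done = {}, []
    try:
        for action, compensate in steps:
            action(ctx)
            done.append(compensate)        # remember how to undo this step
    except Exception as exc:
        for compensate in reversed(done):  # undo completed steps in reverse
            compensate(ctx)
        print("saga aborted:", exc)

run_saga([
    (create_order, cancel_order),
    (charge_payment, refund_payment),      # fails -> cancel_order runs
    (reserve_stock, release_stock),        # never reached
])
```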

Event Sourcing:

  • Store state changes as immutable events
  • Rebuild state by replaying events
  • Audit trail, time travel, debugging
  • Trade-off: Query complexity, snapshot optimization
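
A minimal event-sourcing sketch: state changes are appended as immutable events, and current state is rebuilt by folding over the log. The event shapes are illustrative assumptions.

```python
events = []   # append-only log (a real store would persist this)

def append(event_type: str, **data):
    events.append({"type": event_type, **data})

def replay(log) -> dict:
    """Fold the event log into current state."""
    balance = 0
    for e in log:
        if e["type"] == "Deposited":
            balance += e["amount"]
        elif e["type"] == "Withdrawn":
            balance -= e["amount"]
    return {"balance": balance}

append("Deposited", amount=100)
append("Withdrawn", amount=30)
append("Deposited", amount=5)

print(replay(events))       # {'balance': 75}
print(replay(events[:2]))   # "time travel": state after the first two events
```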

CQRS (Command Query Responsibility Segregation):

  • Separate read and write models
  • Write model: Normalized, transactional
  • Read model: Denormalized, cached, optimized
  • Use for: Different read/write patterns, high read:write ratio (10:1+)
  • Often paired with Event Sourcing
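
A minimal CQRS sketch: a command mutates the normalized write model and projects into a denormalized read model that serves all queries. In a real system the projection would usually be updated asynchronously, often from events; here it is synchronous for brevity, and all names are illustrative assumptions.

```python
orders_write = {}     # write model: normalized, source of truth
orders_by_user = {}   # read model: denormalized for fast queries

def place_order(order_id: str, user: str, total: float):
    """Command: write to the source of truth, then project to the read model."""
    orders_write[order_id] = {"user": user, "total": total}
    orders_by_user.setdefault(user, []).append({"id": order_id, "total": total})

def get_user_orders(user: str):
    """Query: hits only the read model; never touches the write model."""
    return orders_by_user.get(user, [])

place_order("o-1", "alice", 42.0)
place_order("o-2", "alice", 7.5)
print(get_user_orders("alice"))
```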

Decision Frameworks

Choosing Consistency Model

Decision Tree:
├─ Money involved? → Strong Consistency
├─ Double-booking unacceptable? → Strong Consistency
├─ Causality important (chat, edits)? → Causal Consistency
├─ Read-heavy, stale tolerable? → Eventual Consistency
└─ Default? → Eventual (then strengthen if needed)

Choosing Replication Pattern

├─ Single region writes? → Leader-Follower
├─ Multi-region writes + conflicts OK? → Multi-Leader
├─ Multi-region writes + no conflicts? → Leader-Follower with failover
└─ Maximum availability? → Leaderless (quorum)

Choosing Partitioning Strategy

├─ Need range scans? → Range Partitioning (risk: hot spots)
├─ Data residency requirements? → Geographic Partitioning
└─ Default? → Hash Partitioning (consistent hashing)

Quick Reference Tables

CAP/PACELC System Comparison

System    | If Partition | Else (Normal)    | Use Case
----------|--------------|------------------|-------------------
Spanner   | PC           | EC (strong)      | Global SQL
DynamoDB  | PA           | EL (eventual)    | High availability
Cassandra | PA           | EL (tunable)     | Wide-column store
MongoDB   | PC           | EC (default)     | Document store
Cosmos DB | PA/PC        | EL/EC (5 levels) | Multi-model

Consistency Model Use Cases

Use Case               | Consistency Model
-----------------------|-----------------------
Bank account balance   | Strong (Linearizable)
Seat booking (airline) | Strong (Linearizable)
Inventory stock count  | Strong or Bounded
Shopping cart          | Eventual
Product catalog        | Eventual
Collaborative editing  | Causal
Chat messages          | Causal
Social media likes     | Eventual
DNS records            | Eventual

Quorum Configurations

Configuration | W | R | N | W + R > N?  | Consistency | Use Case
--------------|---|---|---|-------------|-------------|----------
Strong        | 3 | 3 | 5 | Yes (6 > 5) | Strong      | Banking
Balanced      | 2 | 2 | 3 | Yes (4 > 3) | Strong      | Default
Write-heavy   | 1 | 3 | 3 | Yes (4 > 3) | Strong      | Logs
Read-heavy    | 3 | 1 | 5 | No (4 ≤ 5)  | Eventual    | Cache
Max Avail     | 1 | 1 | 5 | No (2 ≤ 5)  | Eventual    | Analytics

Best Practices

Design for Failure:

  • Network partitions will occur - always design for partition tolerance
  • Use timeouts, retries with exponential backoff
  • Implement circuit breakers to prevent cascading failures
  • Test chaos engineering scenarios (partition nodes, inject latency)

Choose Consistency Carefully:

  • Default to eventual consistency, strengthen only where needed
  • Strong consistency has real costs (latency, availability)
  • Use bounded staleness for middle ground

Idempotency is Critical:

  • Design operations to be safely retryable
  • Use unique request IDs for deduplication
  • Essential for saga compensating transactions
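
A minimal idempotency sketch: the caller supplies a unique request ID, and the handler records the result keyed by that ID so retries return the recorded outcome instead of re-executing the side effect. The in-memory store is an illustrative stand-in for a durable one.

```python
import uuid

processed = {}   # request_id -> result (a real system persists this durably)

def charge_card(request_id: str, amount: float) -> str:
    if request_id in processed:
        return processed[request_id]       # duplicate retry: no double charge
    result = f"charged ${amount}"          # stand-in for the real side effect
    processed[request_id] = result
    return result

req_id = str(uuid.uuid4())                 # generated once by the caller
print(charge_card(req_id, 9.99))
print(charge_card(req_id, 9.99))           # safe retry: same result, one charge
```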

Monitor and Observe:

  • Distributed tracing with correlation IDs
  • Monitor replication lag, quorum health
  • Alert on circuit breaker state changes
  • Track saga progress and failures

References

  • Full Skill Documentation
  • CAP/PACELC Theorem: references/cap-pacelc-theorem.md
  • Consistency Models: references/consistency-models.md
  • Replication Patterns: references/replication-patterns.md
  • Partitioning Strategies: references/partitioning-strategies.md
  • Resilience Patterns: references/resilience-patterns.md
  • Saga Pattern: references/saga-pattern.md
  • Event Sourcing/CQRS: references/event-sourcing-cqrs.md