
Replication Strategies

Copies of data that keep your system alive

Replication means keeping copies of the same data on multiple machines. Think of it like having backup copies of important documents in different locations—if one is destroyed, you still have access.

| Goal | How Replication Helps |
| --- | --- |
| High Availability | If one node dies, others continue serving |
| Read Performance | Distribute reads across replicas |
| Latency | Place replicas closer to users geographically |
| Disaster Recovery | Replicas in different data centers survive regional failures |

Strategy 1: Leader-Follower (Primary-Replica)


The most common replication strategy. One leader handles all writes; followers replicate from the leader and serve reads.

  1. Client sends a write to the leader
  2. Leader persists the data locally
  3. Leader sends data to all followers
  4. Followers apply the changes to their copies
  5. Reads can go to any follower (or the leader)
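The five steps above can be sketched in a few lines. This is a minimal in-memory illustration, not a real database: the `Leader` and `Follower` classes and the `user:1` key are hypothetical names introduced here, and replication is a direct method call rather than a network transfer.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """A replica that applies changes shipped by the leader."""
    data: dict = field(default_factory=dict)

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value  # step 4: apply the change to the local copy


@dataclass
class Leader:
    """The single node that accepts writes."""
    followers: list
    data: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        self.data[key] = value           # step 2: persist locally
        for follower in self.followers:  # step 3: send the change to all followers
            follower.apply(key, value)


followers = [Follower(), Follower()]
leader = Leader(followers=followers)
leader.write("user:1", "alice")       # step 1: the client writes to the leader
value = followers[0].data["user:1"]   # step 5: a read served by a follower
```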

The critical question: Does the leader wait for followers before confirming the write?

Synchronous replication

How it works: The leader waits until all (or a configured subset of) followers confirm they received the data before telling the client “write successful.”

Like: Sending a registered letter—you wait for delivery confirmation.

Trade-off: If any follower is slow or down, the entire write is delayed or blocked.

Asynchronous replication

How it works: The leader immediately tells the client “write successful,” then sends the data to followers in the background.

Like: Sending a regular letter—you drop it in the mailbox and assume it’ll arrive.

Trade-off: If the leader crashes before replication completes, those writes are lost forever.
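The difference between the two modes comes down to when the leader returns. The sketch below is a simplified, single-process illustration under stated assumptions: the `synchronous` flag, the `pending` queue, and the `replicate_pending` method are hypothetical names, and a real system would ship changes over the network with timeouts.

```python
from collections import deque


class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledgement sent back to the leader


class Leader:
    def __init__(self, followers, synchronous):
        self.followers = followers
        self.synchronous = synchronous
        self.data = {}
        self.pending = deque()  # changes confirmed to the client but not yet shipped

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: confirm only after every follower acknowledges.
            return all(f.apply(key, value) for f in self.followers)
        # Asynchronous: confirm now, replicate later.
        self.pending.append((key, value))
        return True

    def replicate_pending(self):
        # Background step in async mode: drain the queue to the followers.
        while self.pending:
            key, value = self.pending.popleft()
            for f in self.followers:
                f.apply(key, value)


sync_follower = Follower()
sync_leader = Leader([sync_follower], synchronous=True)
sync_leader.write("k", "v1")

async_follower = Follower()
async_leader = Leader([async_follower], synchronous=False)
async_leader.write("k", "v1")
follower_before = dict(async_follower.data)      # still empty: follower is behind
at_risk_if_leader_crashes = list(async_leader.pending)
async_leader.replicate_pending()
```

In the asynchronous case, anything still in `pending` when the leader crashes is exactly the data that would be lost.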


Strategy 2: Multi-Leader

Multiple nodes can accept writes, and each leader replicates its changes to the others. This is common in geo-distributed systems.

Common use cases:

  • Multi-datacenter deployment: Users in the US write to the US leader, EU users write to the EU leader
  • Offline clients: Mobile apps that work offline (each device is a “leader”)
  • Collaborative editing: Google Docs-style real-time collaboration

What happens when two leaders accept conflicting writes at the same time?

Imagine this scenario:

  1. User A, connected to the US leader, changes an account's username to “alice_new”
  2. User B, connected to the EU leader, changes the same account's username to “alice_updated”
  3. Both leaders accept the write locally
  4. When they sync… which one wins?

| Strategy | How It Works | Best For |
| --- | --- | --- |
| Last-Write-Wins (LWW) | Most recent timestamp wins; older write is discarded | Simple data, acceptable to lose updates |
| First-Write-Wins | First timestamp wins; reject later writes | Immutable records |
| Merge | Combine both values using domain logic | Shopping carts, sets, counters |
| Custom/User Resolution | Store both, let app or user decide | Documents, complex data |
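Each row of the table can be expressed as a small function over two conflicting versions. This is a sketch under assumptions: the `Versioned` type, the function names, and the timestamps are hypothetical, and the merge shown is a set union, which suits a shopping cart but not every data type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: object
    timestamp: float  # assumed to come from reasonably synchronized clocks


def last_write_wins(a, b):
    # Keep the version with the newer timestamp; the other is discarded.
    return a if a.timestamp >= b.timestamp else b


def first_write_wins(a, b):
    # Keep the version with the older timestamp; the later write is rejected.
    return a if a.timestamp <= b.timestamp else b


def merge_sets(a, b):
    # Domain-specific merge: union of two sets, so no item is lost.
    return Versioned(a.value | b.value, max(a.timestamp, b.timestamp))


def keep_both(a, b):
    # Store both versions and let the application or user decide later.
    return [a, b]


us = Versioned("alice_new", timestamp=100.0)
eu = Versioned("alice_updated", timestamp=100.5)
lww_winner = last_write_wins(us, eu)
fww_winner = first_write_wins(us, eu)

cart_us = Versioned(frozenset({"book"}), timestamp=100.0)
cart_eu = Versioned(frozenset({"pen"}), timestamp=100.5)
merged = merge_sets(cart_us, cart_eu)
```

Note that Last-Write-Wins is only as reliable as the clocks that produce the timestamps: with clock skew between data centers, the write that "wins" may not be the one that happened last.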

Strategy 3: Leaderless

There is no designated leader: any node can accept reads and writes. This design follows Amazon's Dynamo paper and is used by Dynamo-style systems such as Cassandra and Riak.


The key concept is quorum — a voting system for consistency:

  • N = Total number of replicas
  • W = Number of nodes that must confirm a write
  • R = Number of nodes that must respond to a read

The Rule: If W + R > N, you’re guaranteed to read at least one node with the latest data.
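The rule holds because any set of W nodes and any set of R nodes must share at least one node when W + R > N. The sketch below checks this by brute force for small clusters; the function name `quorums_always_overlap` is an illustrative choice, not part of any database API.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """Return True if every possible write set of size w shares
    at least one node with every possible read set of size r."""
    nodes = range(n)
    return all(
        bool(set(write_set) & set(read_set))
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


overlap_2_2 = quorums_always_overlap(n=3, w=2, r=2)  # W + R = 4 > 3
overlap_1_2 = quorums_always_overlap(n=3, w=1, r=2)  # W + R = 3, not > 3
overlap_1_1 = quorums_always_overlap(n=3, w=1, r=1)  # W + R = 2, not > 3
```

With N = 3, W = 1, R = 2, a write that lands only on node 0 can be missed by a read that consults nodes 1 and 2, which is why the inequality must be strict.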

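Typical configurations for N = 3:

| Config | W | R | Trade-off |
| --- | --- | --- | --- |
| Balanced | 2 | 2 | Good consistency + availability |
| Write-heavy | 1 | 3 | Fast writes, slower reads |
| Read-heavy | 3 | 1 | Slow writes, fast reads |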
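The trade-offs in the table also show up as failure tolerance: a write needs W live nodes and a read needs R live nodes, so each operation tolerates N - W or N - R failed nodes respectively. The sketch below assumes N = 3 for all three rows; the `describe` helper and its dictionary keys are hypothetical.

```python
def describe(n, w, r):
    """Summarize a quorum configuration."""
    return {
        "overlap": w + r > n,               # reads are guaranteed to see the latest write
        "write_failures_tolerated": n - w,  # nodes that can be down while writes still succeed
        "read_failures_tolerated": n - r,   # nodes that can be down while reads still succeed
    }


configs = {
    "balanced": (3, 2, 2),
    "write_heavy": (3, 1, 3),
    "read_heavy": (3, 3, 1),
}
summary = {name: describe(*cfg) for name, cfg in configs.items()}
```

The balanced row tolerates one failed node for both reads and writes, while the other two rows trade that away: with R = 3 a single down node blocks reads, and with W = 3 it blocks writes.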

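With asynchronous replication, followers may lag behind the leader, so a read served by a follower can return stale data. The consistency levels below differ in how much staleness a reader is allowed to observe: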

| Level | What It Guarantees | How To Achieve |
| --- | --- | --- |
| Eventual | Data will sync “eventually” | Read from any replica |
| Read-Your-Writes | See your own writes immediately | Read from leader after write, or track write timestamps |
| Monotonic Reads | Never see older data than before | Stick to same replica, or track read positions |
| Strong/Linearizable | Always see latest | Read from leader only |
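The "track read positions" approach to monotonic reads can be sketched as a session that remembers the highest replication position it has read from and refuses replicas that are further behind. The `Replica` and `MonotonicSession` classes and the integer `position` are assumptions for illustration; real systems use log sequence numbers or similar offsets.

```python
class Replica:
    def __init__(self, position, data):
        self.position = position  # how far into the replication log this replica has applied
        self.data = data


class MonotonicSession:
    def __init__(self, replicas):
        self.replicas = replicas
        self.last_seen = 0  # highest position this session has read from

    def read(self, key):
        for replica in self.replicas:
            # Skip replicas that are behind what this session already saw.
            if replica.position >= self.last_seen:
                self.last_seen = replica.position
                return replica.data.get(key)
        raise RuntimeError("no replica has caught up to this session's position")


fresh = Replica(position=5, data={"name": "v5"})
stale = Replica(position=3, data={"name": "v3"})

session = MonotonicSession([fresh, stale])
first = session.read("name")

# Even if the stale replica is now tried first, the session skips it.
session.replicas = [stale, fresh]
second = session.read("name")
```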

| Aspect | Leader-Follower | Multi-Leader | Leaderless |
| --- | --- | --- | --- |
| Write Scalability | Limited (1 leader) | Good (multiple leaders) | Best (any node) |
| Read Scalability | Good (add followers) | Good | Good |
| Consistency | Easier to achieve | Conflict resolution needed | Quorum-based |
| Latency (geo) | High (single leader) | Low (local leaders) | Low |
| Complexity | Simplest | Complex (conflicts) | Complex (quorums) |
| Examples | MySQL, PostgreSQL | CouchDB, Google Docs | Cassandra, Riak (Dynamo-style) |

Example 1: MySQL Leader-Follower Replication (Semi-Synchronous)

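Companies: Facebook, GitHub, WordPress

Scenario: MySQL uses leader-follower replication where one primary database handles all writes and multiple replica databases handle reads. This provides high availability and read scalability.

Implementation: MySQL replication is asynchronous by default. This example assumes the optional semi-synchronous mode, in which the primary waits for at least one replica to acknowledge that it has received the transaction before confirming the commit to the client.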

Why Leader-Follower?

  • Simplicity: Single source of truth for writes
  • Consistency: Easier to maintain strong consistency
  • Read Scalability: Add replicas to scale reads horizontally
  • Failover: A replica can be promoted to primary on failure, typically automated by tooling such as Orchestrator or MySQL InnoDB Cluster

Real-World Impact:

  • Facebook: Uses MySQL with leader-follower for billions of users
  • GitHub: MySQL replicas handle read traffic, reducing primary load
  • Performance: Read queries are distributed across replicas, so read capacity grows roughly in proportion to the replica count (on the order of 10x with enough replicas)

Example 2: Google Docs Multi-Leader Replication


Company: Google

Scenario: Google Docs allows multiple users to edit the same document simultaneously. Each user’s browser acts as a “leader” that can accept writes, and changes are synchronized across all clients.

Implementation: Uses multi-leader replication with operational transformation, which rewrites concurrent edits against one another so that every client converges on the same document.

Why Multi-Leader?

  • Low Latency: Users write to nearest server
  • Offline Support: Works offline, syncs when online
  • Collaboration: Multiple simultaneous editors
  • Conflict Resolution: Operational transformation merges edits
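To make the operational transformation idea concrete, the sketch below handles only concurrent text insertions. It is a toy illustration and not Google's implementation: the `Insert` type, the `site` tie-breaker, and the `transform` function are assumptions introduced here, and a real editor also has to handle deletions and longer histories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # tie-breaker when two inserts target the same position


def apply_op(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, against):
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        # `against` inserted text before our position, so shift right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "helo"
a = Insert(pos=3, text="l", site=1)  # user A fixes the typo
b = Insert(pos=4, text="!", site=2)  # user B appends punctuation at the same time

# Each replica applies its local edit first, then the transformed remote edit.
replica_a = apply_op(apply_op(doc, a), transform(b, a))
replica_b = apply_op(apply_op(doc, b), transform(a, b))
```

Both replicas apply the two edits in a different order yet end with the same text, which is the convergence property the merge relies on.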

Real-World Impact:

  • Scale: Supports 50+ simultaneous editors per document
  • Latency: < 100ms write latency (local server)
  • Consistency: All users see same document within seconds

Example 3: Amazon DynamoDB Leaderless Replication


Company: Amazon

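Scenario: Amazon's Dynamo design, the predecessor of DynamoDB and the inspiration for Cassandra and Riak, uses leaderless replication: any replica can accept reads and writes, providing high availability and low latency. AWS describes the managed DynamoDB service itself as electing a leader replica per partition, so read this example as Dynamo-style replication rather than a description of DynamoDB internals.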

Implementation: Uses quorum-based leaderless replication: a write succeeds once W of the N replicas acknowledge it, and a read consults R replicas.

Why Leaderless?

  • No Single Point of Failure: No leader to fail
  • High Availability: System works even if nodes fail
  • Low Latency: Write to nearest nodes
  • Scalability: Add nodes to increase capacity

Real-World Impact:

  • Scale: Handles millions of requests per second
  • Availability: 99.99% uptime SLA
  • Latency: Single-digit millisecond latency
  • Durability: 99.999999999% (11 nines) durability

Example 4: PostgreSQL Streaming Replication (Asynchronous)


Companies: Instagram, Spotify, Reddit

Scenario: PostgreSQL uses streaming replication where the primary database streams WAL (Write-Ahead Log) records to replica databases asynchronously.

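Implementation: Uses asynchronous streaming replication: the primary commits locally and confirms to the client, while replicas receive and replay the WAL stream in the background.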

Why Asynchronous?

  • Performance: Primary doesn’t wait for replicas
  • Availability: Primary stays available even if replicas are slow
  • Scalability: Can add many replicas without impacting primary
  • Trade-off: Small risk of data loss if primary fails before replication
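The data-loss risk in the last bullet is the gap between what the primary has committed and what a replica has replayed. The sketch below models that gap with a simple counter as the log sequence number (LSN). It is an assumption-laden toy: the `Primary` and `Replica` classes are hypothetical, and real PostgreSQL LSNs are byte positions in the WAL, not record counts.

```python
class Primary:
    def __init__(self):
        self.wal = []  # append-only list of (lsn, record)

    def commit(self, record):
        lsn = len(self.wal) + 1
        self.wal.append((lsn, record))
        return lsn  # returns without waiting for any replica

    @property
    def current_lsn(self):
        return len(self.wal)


class Replica:
    def __init__(self):
        self.replayed_lsn = 0
        self.records = []

    def stream_from(self, primary, max_records):
        # Pull the next batch of WAL records and replay them in order.
        batch = primary.wal[self.replayed_lsn:self.replayed_lsn + max_records]
        for lsn, record in batch:
            self.records.append(record)
            self.replayed_lsn = lsn


def lag(primary, replica):
    """Replication lag measured in WAL records."""
    return primary.current_lsn - replica.replayed_lsn


primary = Primary()
for record in ["photo-1", "photo-2", "photo-3"]:
    primary.commit(record)

replica = Replica()
replica.stream_from(primary, max_records=2)
lag_after_partial = lag(primary, replica)
# Records that would be lost if the primary failed right now:
at_risk = [record for _, record in primary.wal[replica.replayed_lsn:]]

replica.stream_from(primary, max_records=10)
lag_after_catchup = lag(primary, replica)
```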

Real-World Impact:

  • Instagram: Uses PostgreSQL async replication for billions of photos
  • Spotify: PostgreSQL replicas handle read traffic
  • Replication Lag: Typically < 1 second lag
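  • Failover: Under 30 seconds when promotion is automated by external tooling such as Patroni (PostgreSQL does not promote a replica on its own)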

Read-After-Write Consistency Implementation


Ensure users see their own writes immediately, even with async replication:
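One way to do this is to route a user's reads to the leader for a short window after that user writes, and to a replica otherwise. The sketch below is a minimal illustration, not a production router: the `ReadYourWritesRouter` class, the `pin_seconds` window, and the `"leader"`/`"replica"` labels are assumptions introduced here, and the window is assumed to be longer than the worst-case replication lag.

```python
class ReadYourWritesRouter:
    def __init__(self, pin_seconds):
        self.pin_seconds = pin_seconds
        self.last_write_at = {}  # user_id -> time of that user's most recent write

    def record_write(self, user_id, now):
        self.last_write_at[user_id] = now

    def choose_node(self, user_id, now):
        last = self.last_write_at.get(user_id)
        if last is not None and now - last < self.pin_seconds:
            # The user wrote recently; a replica may not have the change yet.
            return "leader"
        return "replica"


router = ReadYourWritesRouter(pin_seconds=5.0)
router.record_write("alice", now=100.0)

alice_soon = router.choose_node("alice", now=101.0)   # within the window
bob_read = router.choose_node("bob", now=101.0)       # never wrote
alice_later = router.choose_node("alice", now=106.0)  # window has passed
```

Timestamps are passed in explicitly to keep the example deterministic; in practice they would come from a clock such as `time.monotonic()`. Only the user who just wrote pays the cost of a leader read, so replicas still absorb most of the read traffic.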

These replication concepts map onto low-level design (LLD) patterns as follows:

| Replication Concept | LLD Implementation |
| --- | --- |
| Sync Replication | Write-through cache pattern, blocking writes |
| Async Replication | Write-behind pattern, background jobs, message queues |
| Conflict Resolution | Strategy pattern for merge logic, domain-specific resolvers |
| Read Consistency | Strategy pattern for read routing, client-side timestamp tracking |
| Failover | Observer pattern for leader changes, Circuit breaker for failed replicas |
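As one example from the table, the Observer pattern for leader changes can be sketched as a registry that notifies subscribers when a new leader is promoted. The `LeaderRegistry` and `WriteClient` classes and the node names `db-1` and `db-2` are hypothetical, and the sketch leaves out failure detection, which decides when `promote` is called.

```python
class LeaderRegistry:
    """Subject: tracks the current leader and notifies observers on change."""

    def __init__(self, leader):
        self.leader = leader
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def promote(self, new_leader):
        old_leader, self.leader = self.leader, new_leader
        for callback in self._listeners:
            callback(old_leader, new_leader)


class WriteClient:
    """Observer: always sends writes to whichever node is the current leader."""

    def __init__(self, registry):
        self.target = registry.leader
        self.history = []
        registry.subscribe(self.on_leader_change)

    def on_leader_change(self, old_leader, new_leader):
        self.history.append((old_leader, new_leader))
        self.target = new_leader


registry = LeaderRegistry("db-1")
client = WriteClient(registry)
registry.promote("db-2")  # failover: a replica becomes the new leader
```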


Now that you understand how data is replicated, let’s learn about building systems that survive failures:

Next up: Fault Tolerance & Redundancy - Learn to design systems that work even when things break.