Replication Strategies
What is Replication?
Replication means keeping copies of the same data on multiple machines. Think of it like keeping backup copies of important documents in different locations: if one copy is destroyed, you can still get to the others.
Why Replicate?
| Goal | How Replication Helps |
|---|---|
| High Availability | If one node dies, others continue serving |
| Read Performance | Distribute reads across replicas |
| Latency | Place replicas closer to users geographically |
| Disaster Recovery | Replicas in different data centers survive regional failures |
Strategy 1: Leader-Follower (Primary-Replica)
This is the most common replication strategy. One leader handles all writes; followers replicate from the leader and can serve reads.
How It Works
- Client sends a write to the leader
- Leader persists the data locally
- Leader sends data to all followers
- Followers apply the changes to their copies
- Reads can go to any follower (or the leader)
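The five steps above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular database implements replication; the `Leader` and `Follower` classes and their method names are hypothetical, and replication here happens inline with no network or failure handling.

```python
class Follower:
    """Holds a copy of the data and applies changes sent by the leader."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Leader:
    """Accepts every write, persists it locally, then forwards it to followers."""

    def __init__(self, followers):
        self.data = {}
        self.followers = list(followers)

    def write(self, key, value):
        self.data[key] = value             # step 2: persist locally
        for follower in self.followers:    # step 3: send to all followers
            follower.apply(key, value)     # step 4: follower applies the change

    def read(self, key):
        return self.data.get(key)


followers = [Follower("f1"), Follower("f2")]
leader = Leader(followers)
leader.write("user:1", "alice")                  # step 1: client writes to the leader
values = [f.read("user:1") for f in followers]   # step 5: read from any follower
```

Because only the leader accepts writes, every follower sees changes in the same order, which is what keeps this strategy simple.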
Synchronous vs Asynchronous Replication
The critical question: Does the leader wait for followers before confirming the write?
Synchronous: The “Safe” Option
How it works: The leader waits until all followers (or a configured subset) confirm they received the data before telling the client “write successful.”
Like: Sending a registered letter. You wait for the delivery confirmation.
Trade-off: If a follower the leader is waiting on is slow or down, the write is delayed or blocked. This is why many systems wait for only one or a few followers rather than all of them.
Asynchronous: The “Fast” Option
How it works: The leader immediately tells the client “write successful,” then sends the data to followers in the background.
Like: Sending a regular letter. You drop it in the mailbox and assume it’ll arrive.
Trade-off: If the leader crashes before replication completes and a follower is promoted, any acknowledged writes that had not yet reached that follower are lost.
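The difference between the two modes comes down to when the leader acknowledges the write. The sketch below is an illustration under simplifying assumptions: the `synchronous` flag, the `pending` queue, and the explicit `flush()` call (standing in for a background replication thread) are all hypothetical names chosen for this example.

```python
from collections import deque


class Follower:
    def __init__(self):
        self.log = []

    def append(self, entry):
        self.log.append(entry)
        return True  # acknowledgement sent back to the leader


class Leader:
    def __init__(self, followers, synchronous):
        self.log = []
        self.followers = followers
        self.synchronous = synchronous
        self.pending = deque()  # entries not yet shipped (async mode only)

    def write(self, entry):
        self.log.append(entry)
        if self.synchronous:
            # Sync: confirm to the client only after every follower has acknowledged.
            acks = [f.append(entry) for f in self.followers]
            return all(acks)
        # Async: confirm immediately and replicate later.
        self.pending.append(entry)
        return True

    def flush(self):
        """Background replication step, called explicitly here for clarity."""
        while self.pending:
            entry = self.pending.popleft()
            for f in self.followers:
                f.append(entry)


sync_follower = Follower()
sync_leader = Leader([sync_follower], synchronous=True)
sync_leader.write("x=1")

async_follower = Follower()
async_leader = Leader([async_follower], synchronous=False)
async_leader.write("x=1")
lag_before_flush = len(async_leader.log) - len(async_follower.log)
async_leader.flush()
lag_after_flush = len(async_leader.log) - len(async_follower.log)
```

In the asynchronous case, anything still sitting in `pending` when the leader crashes is exactly the data at risk of being lost.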
Strategy 2: Multi-Leader Replication
Multiple nodes can accept writes, and each leader replicates its changes to the others. This is common in geo-distributed systems.
When to Use Multi-Leader
- Multi-datacenter deployment: users in the US write to the US leader, users in the EU write to the EU leader
- Offline clients: mobile apps that work offline (each device acts as a “leader”)
- Collaborative editing: Google Docs-style real-time collaboration
The Conflict Problem
What happens when two leaders accept conflicting writes at the same time?
Imagine this scenario:
- A client in the US changes a username to “alice_new”
- At the same time, a client in the EU changes the same username to “alice_updated”
- Both leaders accept the write locally
- When they sync… which one wins?
Conflict Resolution Strategies
| Strategy | How It Works | Best For |
|---|---|---|
| Last-Write-Wins (LWW) | Most recent timestamp wins; older write is discarded | Simple data, acceptable to lose updates |
| First-Write-Wins | First timestamp wins; reject later writes | Immutable records |
| Merge | Combine both values using domain logic | Shopping carts, sets, counters |
| Custom/User Resolution | Store both, let app or user decide | Documents, complex data |
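The first three strategies in the table can be written as small resolver functions. This is a sketch, assuming each write carries a timestamp and the name of the leader that accepted it; the `Write` type and the `origin` tie-breaker are hypothetical additions for this example, not part of any specific database's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float  # wall-clock time assigned by the accepting leader
    origin: str       # leader that accepted the write, used only to break ties


def last_write_wins(a: Write, b: Write) -> Write:
    """Keep the write with the larger (timestamp, origin); the other is discarded."""
    return max(a, b, key=lambda w: (w.timestamp, w.origin))


def first_write_wins(a: Write, b: Write) -> Write:
    """Keep the write with the smaller (timestamp, origin); the later one is rejected."""
    return min(a, b, key=lambda w: (w.timestamp, w.origin))


def merge_sets(a: set, b: set) -> set:
    """Merge strategy for set-like data such as carts: keep additions from both sides."""
    return a | b


us = Write("alice_new", timestamp=100.0, origin="us")
eu = Write("alice_updated", timestamp=100.5, origin="eu")
winner = last_write_wins(us, eu)
```

The tie-breaker matters: without it, two leaders that see equal timestamps could each pick a different winner and never converge. Note also that LWW depends on clocks being reasonably synchronized across leaders, since a skewed clock can make an older write look newer.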
Strategy 3: Leaderless Replication
There is no designated leader: any node can accept reads and writes. This approach is used by systems such as Cassandra and Riak, which follow the design of Amazon's original Dynamo.
Quorum: The Magic Formula
The key concept is a quorum, a voting scheme for consistency:
- N = Total number of replicas
- W = Number of nodes that must confirm a write
- R = Number of nodes that must respond to a read
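The Rule: If W + R > N, every set of R nodes you read from overlaps every set of W nodes that confirmed a write, so you’re guaranteed to read at least one node with the latest acknowledged data.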
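The overlap claim can be checked by brute force for small clusters. The function below is a hypothetical helper written for this illustration: it enumerates every possible write set of size W and read set of size R and reports whether all pairs share at least one node.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every W-node write set shares a node with every R-node read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


balanced = quorums_always_overlap(n=3, w=2, r=2)  # 2 + 2 > 3
too_weak = quorums_always_overlap(n=3, w=1, r=1)  # 1 + 1 <= 3
```

This is the pigeonhole argument in code: W + R > N leaves no room for two disjoint sets, while W + R <= N always permits a read set that misses the write entirely.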
Common Quorum Configurations
| Config (N = 3) | W | R | Trade-off |
|---|---|---|---|
| Balanced | 2 | 2 | Good consistency + availability |
| Write-heavy | 1 | 3 | Fast writes, slower reads |
| Read-heavy | 3 | 1 | Slow writes, fast reads |
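To see how these configurations behave, here is a small simulation, assuming N = 3 as the W and R values in the table imply. It models the worst case in which a write reaches only W randomly chosen replicas (real systems usually send to all N and wait for W acknowledgements). The `Replica` class, the version numbers, and the function names are hypothetical and exist only for this sketch.

```python
import random


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def store(self, version, value):
        if version > self.version:  # ignore writes older than what we hold
            self.version, self.value = version, value


def quorum_write(replicas, w, version, value, rng):
    """Write to w randomly chosen replicas; the rest stay stale."""
    for replica in rng.sample(replicas, w):
        replica.store(version, value)


def quorum_read(replicas, r, rng):
    """Ask r randomly chosen replicas and return the highest-versioned value."""
    contacted = rng.sample(replicas, r)
    newest = max(contacted, key=lambda rep: rep.version)
    return newest.value


def always_reads_latest(n, w, r, trials=200, seed=0):
    """Run repeated write-write-read rounds; False as soon as a read is stale."""
    rng = random.Random(seed)
    for _ in range(trials):
        replicas = [Replica() for _ in range(n)]
        quorum_write(replicas, w, version=1, value="old", rng=rng)
        quorum_write(replicas, w, version=2, value="new", rng=rng)
        if quorum_read(replicas, r, rng) != "new":
            return False
    return True
```

All three rows of the table satisfy W + R > N, so each returns the latest value in every trial; dropping to W = 1, R = 1 lets stale reads through.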
Handling Replication Lag
With asynchronous replication, followers may lag behind the leader. This creates read-consistency challenges:
Read Consistency Levels
| Level | What It Guarantees | How To Achieve |
|---|---|---|
| Eventual | Data will sync “eventually” | Read from any replica |
| Read-Your-Writes | See your own writes immediately | Read from leader after write, or track write timestamps |
| Monotonic Reads | Never see older data than before | Stick to same replica, or track read positions |
| Strong/Linearizable | Always see the latest write | Read from the leader only, and make sure a demoted leader cannot keep serving reads |
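Read-your-writes and monotonic reads can both be approximated by having each client session remember the newest log position it has written or read, and only reading from replicas that have caught up to it. The sketch below assumes a single leader with a simple integer log position; `Session`, `min_position`, and the other names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.position = 0  # log position applied so far
        self.data = {}

    def apply(self, position, key, value):
        self.position = position
        self.data[key] = value


class Session:
    """Tracks the highest log position this client has written or read."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers
        self.min_position = 0

    def write(self, key, value):
        position = self.leader.position + 1
        self.leader.apply(position, key, value)
        self.min_position = max(self.min_position, position)

    def read(self, key):
        # Use a follower only if it has caught up to what this session has
        # already seen; otherwise fall back to the leader.
        for replica in self.followers:
            if replica.position >= self.min_position:
                source = replica
                break
        else:
            source = self.leader
        self.min_position = max(self.min_position, source.position)
        return source.name, source.data.get(key)


leader = Replica("leader")
follower = Replica("follower")
session = Session(leader, [follower])

session.write("name", "alice")
first = session.read("name")         # follower is behind, so the leader answers
follower.apply(1, "name", "alice")   # replication catches up
second = session.read("name")        # follower is now safe to read from
```

Updating `min_position` on reads as well as writes is what gives monotonic reads: once a session has seen position 5, it never accepts an answer from a replica still at position 4.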
Comparing All Three Strategies
| Aspect | Leader-Follower | Multi-Leader | Leaderless |
|---|---|---|---|
| Write Scalability | Limited (1 leader) | Good (multiple leaders) | Best (any node) |
| Read Scalability | Good (add followers) | Good | Good |
| Consistency | Easier to achieve | Conflict resolution needed | Quorum-based |
| Latency (geo) | High (single leader) | Low (local leaders) | Low |
| Complexity | Simplest | Complex (conflicts) | Complex (quorums) |
| Examples | MySQL, PostgreSQL | CouchDB, Google Docs | Cassandra, Riak |
LLD ↔ HLD Connection
| Replication Concept | LLD Implementation |
|---|---|
| Sync Replication | Write-through cache pattern, blocking writes |
| Async Replication | Write-behind pattern, background jobs, message queues |
| Conflict Resolution | Strategy pattern for merge logic, domain-specific resolvers |
| Read Consistency | Strategy pattern for read routing, client-side timestamp tracking |
| Failover | Observer pattern for leader changes, Circuit breaker for failed replicas |
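As one example of the mappings in the table, the Failover row can be sketched with the Observer pattern: components that need to know the current leader subscribe to the cluster and are notified when it changes. The `Cluster` and `WriteRouter` classes are hypothetical, and promoting the first follower is a placeholder for a real election.

```python
class Cluster:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = list(followers)
        self._observers = []

    def subscribe(self, callback):
        """Register a callable invoked as callback(old_leader, new_leader)."""
        self._observers.append(callback)

    def fail_over(self):
        """Promote the first follower and notify every observer."""
        if not self.followers:
            raise RuntimeError("no follower available to promote")
        old_leader = self.leader
        self.leader = self.followers.pop(0)
        for callback in self._observers:
            callback(old_leader, self.leader)


class WriteRouter:
    """Observer that always points writes at the current leader."""

    def __init__(self, cluster):
        self.target = cluster.leader
        cluster.subscribe(self.on_leader_change)

    def on_leader_change(self, old_leader, new_leader):
        self.target = new_leader


cluster = Cluster("node-a", ["node-b", "node-c"])
router = WriteRouter(cluster)
cluster.fail_over()
```

The router never polls for the leader; it learns about the change through the callback, which keeps routing logic decoupled from whatever mechanism decides on the new leader.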
Key Takeaways
What’s Next?
Now that you understand how data is replicated, let’s learn about building systems that survive failures:
Next up: Fault Tolerance & Redundancy - Learn to design systems that work even when things break.