Leader Election

Selecting a single coordinator in distributed systems

The Coordination Problem

In distributed systems, we often need a single coordinator to:

Make decisions: Which node handles a request?
Manage resources: Coordinate access to shared resources
Maintain consistency: Ensure all nodes agree on state
Handle failures: Detect and recover from node failures

The Challenge: How do multiple nodes agree on who the leader is? How do we handle leader failures? How do we prevent split-brain (multiple leaders)?

What is Leader Election?

Leader election is the process of selecting a single node as the coordinator (leader) in a distributed system. The leader handles coordination tasks, while other nodes (followers) follow the leader’s decisions.

Key Requirements

Safety: Only one leader at a time (no split-brain)
Liveness: Eventually a leader exists (even after failures)
Fault Tolerance: Handles node failures gracefully
Performance: Fast election, minimal overhead
Uniqueness: All nodes agree on the same leader (no conflicting views)

Simple Analogy

Think of leader election like electing a class president:

Multiple candidates (nodes) can run
Election process determines winner (leader)
If president is absent (leader fails), new election held
Only one president at a time (safety)
Eventually someone is elected (liveness)

Bully Algorithm

The Bully algorithm elects a leader based on node ID. The live node with the highest ID wins.

How It Works:

When a node detects that the leader has failed, it initiates an election
The node sends an election message to all nodes with higher IDs
If no higher node responds within a timeout, the node becomes leader
If a higher node responds, that node takes over the election and the initiator waits for a leader announcement
The new leader announces itself to all nodes
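The steps above can be sketched as a small simulation. This is a minimal sketch, not a networked implementation: the function name `bully_election` and the `alive` set are assumptions made for illustration, and message passing is replaced by a simple membership check.

```python
from typing import Iterable, Set


def bully_election(initiator: int, node_ids: Iterable[int], alive: Set[int]) -> int:
    """Return the ID of the node that ends up announcing itself as leader.

    Hypothetical helper: an "election message" to a higher node counts as
    answered only if that node is in `alive`.
    """
    if initiator not in alive:
        raise ValueError("a crashed node cannot start an election")
    ids = sorted(node_ids)
    current = initiator
    while True:
        # Contact every node with a higher ID than the current node.
        responders = [n for n in ids if n > current and n in alive]
        if not responders:
            # Nobody higher answered, so the current node becomes leader.
            return current
        # A higher node answered and takes over the election. In the real
        # protocol every responder starts its own election; following the
        # lowest responder is enough to show the hand-off chain.
        current = min(responders)
```

For example, with nodes 1 to 5 where node 5 has crashed, an election started by node 2 is handed to node 3, then to node 4, which gets no answer from node 5 and becomes leader.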

Advantages:

Simple to understand and implement
Fast in the best case (O(n) messages when the highest live node starts the election, O(n^2) in the worst case)
Deterministic (the highest live ID always wins)

Disadvantages:

Several elections can run at once if multiple nodes detect the failure simultaneously
Requires every node to know the IDs of all other nodes
Unstable leadership if the highest-ID node keeps crashing and recovering, because it reclaims leadership each time it returns

Raft Consensus Algorithm

Raft is a consensus algorithm that provides leader election and log replication. It’s more complex than Bully but provides stronger guarantees, such as at most one leader per term even during network partitions.

Raft States

Nodes in Raft can be in one of three states:

Leader: Handles all client requests, replicates log to followers
Follower: Receives log entries from leader, votes in elections
Candidate: Campaigning to become leader

How Raft Election Works:

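Follower receives no heartbeat from the leader within its election timeout
Follower becomes a candidate, increments its term, and votes for itself
Candidate requests votes from all other nodes; each node grants at most one vote per term
If the candidate receives votes from a majority of the cluster, it becomes leader
Leader sends periodic heartbeats to prevent new elections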
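The voting rules in these steps can be sketched in a few lines. This is a simplified illustration, not full Raft: the `Node` class and `run_election` function are hypothetical names, RPCs are replaced by direct method calls, and the log up-to-date check that real Raft applies before granting a vote is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    state: str = "follower"

    def handle_vote_request(self, term: int, candidate_id: int) -> bool:
        """Grant at most one vote per term."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # A newer term resets the vote and demotes this node.
            self.current_term = term
            self.voted_for = None
            self.state = "follower"
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, reachable_peers: List[Node], cluster_size: int) -> bool:
    """Run one election round; return True if the candidate becomes leader."""
    candidate.state = "candidate"
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1  # the candidate's own vote
    for peer in reachable_peers:
        if peer.handle_vote_request(candidate.current_term, candidate.node_id):
            votes += 1
    if votes > cluster_size // 2:  # strict majority of the whole cluster
        candidate.state = "leader"
        return True
    return False
```

Counting the majority against the full cluster size, rather than against the peers that happen to be reachable, is what keeps a minority partition from electing its own leader.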

Key Features:

Majority voting: Prevents split-brain
Terms: Each election has a term number (monotonically increasing)
Log replication: Leader replicates log entries to followers
Safety: At most one leader per term

Leader Election Implementation

ZooKeeper-Based Election

Apache ZooKeeper provides built-in support for leader election using ephemeral sequential nodes:

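All nodes create ephemeral sequential nodes under /election
The node with the smallest sequence number becomes leader
If the leader fails, its ephemeral node is deleted and the node with the next-smallest sequence number becomes leader
Each non-leader node watches the node with the next-lowest sequence number (its immediate predecessor), so a leader failure notifies only one node instead of all of them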
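The decision each participant makes after listing the children of /election can be sketched as a pure function. This sketch does not call any ZooKeeper client library. It assumes child names end in an underscore followed by the zero-padded sequence number that ZooKeeper appends to sequential nodes; the `n_` prefix and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def sequence_of(znode_name: str) -> int:
    """Extract the numeric suffix, e.g. 'n_0000000003' -> 3 (assumed naming)."""
    return int(znode_name.rsplit("_", 1)[1])


def election_role(my_znode: str, children: List[str]) -> Tuple[bool, Optional[str]]:
    """Return (is_leader, znode_to_watch) for this participant."""
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(my_znode)
    if index == 0:
        # Smallest sequence number: this participant is the leader.
        return True, None
    # Otherwise watch the immediate predecessor. Sequence numbers may have
    # gaps, so the predecessor is found by position, not by subtracting one.
    return False, ordered[index - 1]
```

When the watched node disappears, the participant would list the children again and call `election_role` once more, since the predecessor may have been a crashed non-leader rather than the leader.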

Advantages:

Handled by ZooKeeper (no custom implementation needed)
Automatic failure detection
Consistent view of the election (ZooKeeper orders node creation), provided a leader stops acting as leader once its session expires

Challenges and Solutions

Split-Brain

Problem: A network partition can leave each side believing it should have its own leader.

Solution: Require a majority vote (quorum). Two disjoint partitions cannot both contain a majority, so at most one of them can elect a leader.
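The quorum rule is simple arithmetic, sketched here with hypothetical function names:

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def can_elect_leader(reachable_nodes: int, cluster_size: int) -> bool:
    """A partition may elect a leader only if it holds a quorum."""
    return reachable_nodes >= quorum_size(cluster_size)
```

In a 5-node cluster split 3/2, only the 3-node side can elect a leader. In a 4-node cluster split 2/2, neither side can, which is one reason odd cluster sizes are commonly preferred.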

Election Storms

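Problem: Multiple nodes start elections simultaneously and split the vote, so no candidate reaches a majority.

Solution: Use randomized election timeouts. Each node waits a different amount of time, so one node usually times out first and wins before the others start campaigning.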
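A minimal sketch of a randomized timeout follows. The function name and parameters are assumptions for illustration; the default range of 150 to 300 ms is a commonly quoted starting point for Raft, not a requirement, and should be tuned to the network.

```python
import random
from typing import Optional


def election_timeout(base_ms: float = 150.0,
                     spread_ms: float = 150.0,
                     rng: Optional[random.Random] = None) -> float:
    """Pick a timeout uniformly from [base_ms, base_ms + spread_ms]."""
    source = rng if rng is not None else random
    return base_ms + source.uniform(0.0, spread_ms)
```

Each node would draw a fresh timeout every time it resets its election timer, so two nodes that collided in one round are unlikely to collide again in the next.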

Leader Failure Detection

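Problem: How can followers detect a leader failure quickly?

Solution: Use a heartbeat mechanism. If no heartbeat arrives within the timeout, assume the leader has failed. Shorter timeouts detect failures faster but increase the risk of false positives on a slow network.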
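A follower-side detector can be sketched as below. The class name `HeartbeatMonitor` and its methods are hypothetical; the clock is injectable so the behavior can be exercised without real delays.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Tracks the last heartbeat and reports when the leader looks dead."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_heartbeat = clock()

    def record_heartbeat(self) -> None:
        # Called whenever a heartbeat from the leader arrives.
        self.last_heartbeat = self.clock()

    def leader_suspected(self) -> bool:
        # True once more than timeout_s has passed without a heartbeat.
        return self.clock() - self.last_heartbeat > self.timeout_s
```

The sketch uses a monotonic clock by default so that wall-clock adjustments cannot trigger or suppress an election by accident.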

Use Cases

Distributed Databases

The leader handles writes and replicates them to followers. A single write path keeps replicas consistent, and a new leader is elected if the current one fails.

Coordination Services

Leader coordinates distributed operations, manages shared state.

Replicated Systems

Leader makes decisions, followers replicate state. Provides fault tolerance.

Trade-offs

Advantages:

Provides single coordinator
Handles failures gracefully
Prevents split-brain (with majority voting)
Enables coordination

Disadvantages:

Adds complexity
Leader can become bottleneck
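During a network partition, the minority side cannot elect a leader and is unavailable for coordination
Requires a majority of nodes to be reachable for any election to succeed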

Key Takeaways

Single Coordinator

Leader election ensures only one node acts as coordinator. Provides mutual exclusion for coordination tasks.

Majority Voting

Require a majority of votes to prevent split-brain. Only a partition that contains a majority can elect a leader.

Fault Tolerance

A leader failure triggers a new election, and the system eventually elects a new leader, provided a majority of nodes remains reachable.

Heartbeat Mechanism

The leader sends heartbeats to prevent elections. Followers detect a leader failure when heartbeats stop arriving within the timeout, which triggers a new election.

Next Steps

Learn Distributed Locks - another coordination mechanism
Master Consistency Models - leader election helps enforce them
Understand Replication Strategies - leader-based replication
