LowLevelDesign Mastery

Leader Election

Selecting a single coordinator in distributed systems

In distributed systems, we often need a single coordinator to:

  • Make decisions: Which node handles a request?
  • Manage resources: Coordinate access to shared resources
  • Maintain consistency: Ensure all nodes agree on state
  • Handle failures: Detect and recover from node failures

The Challenge: How do multiple nodes agree on who the leader is? How do we handle leader failures? How do we prevent split-brain (multiple leaders)?

Leader election is the process of selecting a single node as the coordinator (leader) in a distributed system. The leader handles coordination tasks, while other nodes (followers) follow the leader’s decisions.

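A leader election algorithm should provide the following properties:

  1. Safety: At most one leader at a time (no split-brain)
  2. Liveness: A leader is eventually elected, even after failures
  3. Fault Tolerance: The election still completes when some nodes have failed
  4. Performance: Fast election with minimal message overhead
  5. Uniqueness: All nodes agree on which node is the leader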

Think of leader election like electing a class president:

  • Multiple candidates (nodes) can run
  • Election process determines winner (leader)
  • If president is absent (leader fails), new election held
  • Only one president at a time (safety)
  • Eventually someone is elected (liveness)

The Bully algorithm elects a leader based on node ID. The node with the highest ID wins.

How It Works:

  1. When a node detects that the leader has failed, it initiates an election
  2. The node sends an election message to all nodes with higher IDs
  3. If no higher node responds, the node becomes leader
  4. If a higher node responds, the initiator stands down and waits for a leader announcement, while each responding node starts its own election
  5. The winning node announces itself as leader to all nodes
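
The steps above can be sketched as a small simulation. This is a minimal illustration rather than a networked implementation: the function name `bully_election` and the `alive` set (the nodes that currently respond to messages) are assumptions made for the example, and message passing is replaced by a simple lookup.

```python
from typing import Optional, Set


def bully_election(initiator: int, alive: Set[int]) -> Optional[int]:
    """Simulate one Bully election and return the elected leader's ID.

    `alive` is the (hypothetical) set of node IDs that respond to messages.
    """
    if initiator not in alive:
        return None  # a crashed node cannot start an election

    candidate = initiator
    while True:
        # Step 2: send an election message to every node with a higher ID.
        responders = [n for n in alive if n > candidate]
        if not responders:
            # Step 3: nobody higher answered, so the candidate becomes leader.
            return candidate
        # Step 4: a higher node answered and takes over the election.
        # In the real protocol every responder starts its own election;
        # following the lowest responder reaches the same final result.
        candidate = min(responders)


# Node 5 (the old leader) has crashed; node 2 notices and starts an election.
print(bully_election(initiator=2, alive={1, 2, 3, 4}))  # prints 4
```

Because every election escalates to higher IDs, the result is always the highest ID among the nodes that are still alive, regardless of which node starts the election.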

Advantages:

  • Simple to understand and implement
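  • Fast election in the best case (O(n) messages when the highest live node starts it), but up to O(n²) messages in the worst case
  • Deterministic (the highest-ID node that is still alive always wins)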

Disadvantages:

  • Several elections can run concurrently if multiple nodes detect the failure at the same time
  • Requires every node to know the IDs of all other nodes
  • An unstable highest-ID node triggers a new election, and takes over again, every time it recovers

Raft is a consensus algorithm that provides leader election and log replication. It’s more complex than Bully but provides stronger guarantees.

Nodes in Raft can be in one of three states:

  • Leader: Handles all client requests, replicates log to followers
  • Follower: Receives log entries from leader, votes in elections
  • Candidate: Campaigning to become leader

How Raft Election Works:

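  1. A follower receives no heartbeat from the leader before its election timeout expires
  2. The follower becomes a candidate, increments its term, and votes for itself
  3. The candidate requests votes from all other nodes; each node grants at most one vote per term
  4. If the candidate receives votes from a majority of the cluster, it becomes leader
  5. The leader sends periodic heartbeats to prevent new elections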
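
A rough sketch of the voting rules described above is shown below. It is a simplification under stated assumptions: the `RaftNode` class and `run_election` function are hypothetical names, RPCs are replaced by direct method calls, and the log up-to-date check that real Raft performs before granting a vote is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RaftNode:
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    state: str = "follower"

    def handle_vote_request(self, term: int, candidate_id: int) -> bool:
        """Grant at most one vote per term (log comparison omitted)."""
        if term < self.current_term:
            return False  # request from a stale term
        if term > self.current_term:
            # Newer term: adopt it, step down, and forget any earlier vote.
            self.current_term = term
            self.voted_for = None
            self.state = "follower"
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: RaftNode, reachable: List[RaftNode],
                 cluster_size: int) -> bool:
    """Become candidate, request votes, and win only with a majority."""
    candidate.state = "candidate"
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id  # a candidate votes for itself
    votes = 1
    for peer in reachable:
        if peer.handle_vote_request(candidate.current_term, candidate.node_id):
            votes += 1
    if votes > cluster_size // 2:
        candidate.state = "leader"
        return True
    return False  # stays a candidate; it would retry after another timeout


nodes = [RaftNode(i) for i in range(5)]
print(run_election(nodes[0], nodes[1:], cluster_size=5))  # prints True
```

Counting votes against the full cluster size, not just the reachable peers, is what stops a candidate in a minority partition from electing itself.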

Key Features:

  • Majority voting: Prevents split-brain
  • Terms: Each election has a term number (monotonically increasing)
  • Log replication: Leader replicates log entries to followers
  • Safety: At most one leader per term

Apache ZooKeeper provides built-in support for leader election using ephemeral sequential nodes:

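  1. Each node creates an ephemeral sequential znode under /election
  2. The node with the smallest sequence number becomes leader
  3. If the leader fails, its session ends, ZooKeeper deletes its ephemeral znode, and the node with the next-smallest sequence number becomes leader
  4. Each non-leader node watches only the znode with the next-lower sequence number, which avoids notifying every node on each failure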
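
The recipe above can be illustrated with an in-memory stand-in. The sketch below does not use the ZooKeeper client API: `ElectionZnode` and its methods are hypothetical names that only mimic how sequence numbers, ephemeral deletion, and watch targets interact.

```python
from typing import Dict, Optional


class ElectionZnode:
    """In-memory stand-in for the children of /election (not the ZooKeeper API)."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._children: Dict[int, str] = {}  # sequence number -> client name

    def join(self, client: str) -> int:
        """Mimic creating an ephemeral sequential node; return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._children[seq] = client
        return seq

    def session_closed(self, seq: int) -> None:
        """Mimic ZooKeeper deleting an ephemeral node when its session ends."""
        self._children.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The client holding the smallest sequence number is the leader."""
        if not self._children:
            return None
        return self._children[min(self._children)]

    def watch_target(self, seq: int) -> Optional[int]:
        """Sequence number this client should watch: the closest smaller one."""
        smaller = [s for s in self._children if s < seq]
        return max(smaller) if smaller else None


election = ElectionZnode()
a, b, c = (election.join(name) for name in ("node-a", "node-b", "node-c"))
print(election.leader())    # prints node-a
election.session_closed(a)  # node-a crashes and its session ends
print(election.leader())    # prints node-b
```

Note that `watch_target` picks the closest smaller sequence number that still exists, since sequence numbers stop being contiguous once nodes drop out.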

Advantages:

  • Handled by ZooKeeper (no custom implementation needed)
  • Automatic failure detection
  • No split-brain (ZooKeeper provides consistency)

Problem: A network partition lets each side elect its own leader.

Solution: Require a majority vote (quorum). Only a partition containing a majority of the nodes can elect a leader, and at most one partition can hold a majority.
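
As a minimal sketch of the quorum rule, the check below treats a partition as eligible only if it holds a strict majority. The function name `has_quorum` is an assumption made for illustration.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if this partition holds a strict majority of the cluster."""
    return reachable_nodes > cluster_size // 2


# A 5-node cluster splits into partitions of 3 and 2 nodes.
print(has_quorum(3, 5))  # prints True
print(has_quorum(2, 5))  # prints False
```

With an even cluster size, a perfect split leaves neither side with a quorum (for example, 2 of 4), which is one reason odd cluster sizes are commonly chosen.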

Problem: Multiple nodes start an election simultaneously and split the vote.

Solution: Use randomized election timeouts. Each node waits a different random interval, which reduces the probability of simultaneous elections.
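
One way to picture randomized timeouts is the sketch below. The function name `election_timeout` and the 150 to 300 ms range are illustrative assumptions, not values taken from this article; appropriate values depend on network latency.

```python
import random
from typing import Optional


def election_timeout(base_ms: float = 150.0, jitter_ms: float = 150.0,
                     rng: Optional[random.Random] = None) -> float:
    """Return a timeout in [base_ms, base_ms + jitter_ms] milliseconds."""
    rng = rng or random.Random()
    return base_ms + rng.uniform(0, jitter_ms)


# Each node draws its own timeout; the node with the shortest one
# becomes a candidate first and usually wins before the others time out.
rng = random.Random(42)  # seeded only to make the demo repeatable
timeouts = {node: election_timeout(rng=rng) for node in ("a", "b", "c")}
first = min(timeouts, key=timeouts.get)
print(f"node {first} times out first after {timeouts[first]:.0f} ms")
```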

Problem: How can followers detect leader failure quickly?

Solution: Use a heartbeat mechanism. If a follower receives no heartbeat within its timeout, it assumes the leader has failed.
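
A follower-side failure detector along these lines might look like the sketch below. The class name `HeartbeatMonitor` and its injectable `clock` parameter are assumptions made for the example; the fake clock only exists so the demo runs without real waiting.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Follower-side detector: suspects the leader after a silent timeout."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_heartbeat = clock()

    def record_heartbeat(self) -> None:
        """Call whenever a heartbeat from the leader arrives."""
        self._last_heartbeat = self._clock()

    def leader_suspected(self) -> bool:
        """True once no heartbeat has arrived for longer than the timeout."""
        return self._clock() - self._last_heartbeat > self.timeout_s


# Demo with a fake clock so no real time has to pass.
now = [0.0]
monitor = HeartbeatMonitor(timeout_s=1.0, clock=lambda: now[0])
now[0] = 0.5
print(monitor.leader_suspected())  # prints False
now[0] = 2.0
print(monitor.leader_suspected())  # prints True
monitor.record_heartbeat()
print(monitor.leader_suspected())  # prints False
```

A missed heartbeat only means the leader is suspected: a slow network looks the same as a crashed leader, so the timeout trades detection speed against false alarms.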

Leader handles writes, replicates to followers. Ensures consistency and handles failures.

Leader coordinates distributed operations, manages shared state.

Leader makes decisions, followers replicate state. Provides fault tolerance.

Advantages:

  • Provides single coordinator
  • Handles failures gracefully
  • Prevents split-brain (with majority voting)
  • Enables coordination

Disadvantages:

  • Adds complexity
  • Leader can become bottleneck
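  • Network partitions can leave a minority partition without a leader until the partition heals
  • Requires a majority of nodes to be reachable, so no leader can be elected if half or more of the nodes are down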

Single Coordinator

Leader election ensures only one node acts as coordinator. Provides mutual exclusion for coordination tasks.

Majority Voting

Require a majority of votes to prevent split-brain. Only a partition with a majority can elect a leader.

Fault Tolerance

Leader failures trigger new elections. System eventually elects new leader. Handles node failures gracefully.

Heartbeat Mechanism

The leader sends periodic heartbeats to suppress elections. A follower that sees no heartbeat within its timeout assumes the leader has failed and triggers a new election.