Skip to content
Low Level Design Mastery Logo
LowLevelDesign Mastery

Consistent Hashing

Distributed data placement that survives node changes

In distributed systems, we need to map keys (like cache keys, database shard keys, or request IDs) to nodes (servers, cache instances, or database shards). Traditional hashing uses a simple modulo operation:

Traditional Approach:

node = hash(key) % num_nodes
Diagram

The Problem:

  • Adding one node requires remapping all keys
  • Removing one node loses all keys mapped to that node
  • This causes massive data migration and cache invalidation

Real-World Impact: Imagine a distributed cache with 1 million keys. Adding one server means remapping all 1 million keys, causing cache misses, database load spikes, and degraded performance.

Consistent hashing is a distributed hashing scheme that minimizes the number of keys that need to be remapped when nodes are added or removed. Instead of using modulo, it uses a hash ring - a circular space where both keys and nodes are mapped.

Think of consistent hashing like a clock face:

  • The ring represents values from 0 to 2^32-1 (or 2^64-1)
  • Both keys and nodes are hashed to positions on this ring
  • A key is assigned to the first node encountered when moving clockwise from the key’s position
Diagram

Key Properties:

  • Minimal rehashing: Only keys between the old and new node positions need remapping
  • Load balancing: Keys are distributed evenly across nodes (with virtual nodes)
  • Fault tolerance: Node removal only affects keys mapped to that node
  • Horizontal scaling: Adding nodes is efficient
  1. Create hash ring: Define a circular space (typically 0 to 2^32-1)
  2. Hash nodes: Map each node to one or more positions on the ring
  3. Hash keys: Map each key to a position on the ring
  4. Assign keys: Each key belongs to the first node encountered clockwise from its position

When a node is added:

  • Only keys between the previous node and the new node need remapping
  • On average, only k/n keys need remapping (where k = number of keys, n = number of nodes)

When a node is removed:

  • Keys mapped to that node are reassigned to the next node clockwise
  • Other keys remain unchanged

Virtual nodes (vnodes) are multiple hash positions for a single physical node. This improves load distribution and handles nodes with different capacities.

Diagram

Why Virtual Nodes Matter:

Without virtual nodes, if nodes are hashed to positions that cluster together, some nodes get many keys while others get few. Virtual nodes spread each physical node across multiple positions on the ring, ensuring more even distribution.

Example:

  • Physical node “Server-1” might have virtual nodes at positions: 1000, 50000, 100000, 200000
  • Physical node “Server-2” might have virtual nodes at positions: 25000, 75000, 150000, 250000
  • This spreads the load more evenly

Consistent hashing is widely used in distributed caches like Memcached and Redis Cluster. When a cache node is added or removed, only a fraction of keys need to be remapped, minimizing cache misses.

Example: A cache cluster with 10 nodes. Adding one node remaps only ~10% of keys instead of 100%.

Load balancers use consistent hashing to route requests to backend servers. This ensures:

  • Session affinity: Same user always routed to same server (if using user ID as key)
  • Minimal disruption: Adding/removing servers doesn’t affect most requests

Distributed databases use consistent hashing to determine which shard stores a record. This enables:

  • Horizontal scaling: Add shards without massive data migration
  • Fault tolerance: Remove failed shards with minimal impact

Content delivery networks use consistent hashing to route requests to edge servers. Ensures content is served from the same server for cache efficiency.

Advantages:

  • Minimal rehashing when nodes change
  • Good load distribution (with virtual nodes)
  • Handles node failures gracefully
  • Scales horizontally

Disadvantages:

  • More complex than simple hashing
  • Requires sorted data structure for efficient lookup
  • Virtual nodes increase memory usage
  • Can have uneven distribution without virtual nodes

Minimal Rehashing

Only k/n keys need remapping when k nodes are added. Much better than traditional hashing where all keys remap.

Hash Ring

Circular space where keys and nodes are mapped. Keys assigned to first node clockwise from their position.

Virtual Nodes

Multiple hash positions per physical node improve load distribution and handle different node capacities.

Efficient Lookup

Use sorted data structure (TreeMap, sorted array) with binary search for O(log n) key lookup.