Minimal Rehashing
Only k/n keys need remapping when k nodes are added. Much better than traditional hashing where all keys remap.
In distributed systems, we need to map keys (like cache keys, database shard keys, or request IDs) to nodes (servers, cache instances, or database shards). Traditional hashing uses a simple modulo operation:
Traditional Approach:
node = hash(key) % num_nodesThe Problem:
Real-World Impact: Imagine a distributed cache with 1 million keys. Adding one server means remapping all 1 million keys, causing cache misses, database load spikes, and degraded performance.
Consistent hashing is a distributed hashing scheme that minimizes the number of keys that need to be remapped when nodes are added or removed. Instead of using modulo, it uses a hash ring - a circular space where both keys and nodes are mapped.
Think of consistent hashing like a clock face:
Key Properties:
When a node is added:
When a node is removed:
Virtual nodes (vnodes) are multiple hash positions for a single physical node. This improves load distribution and handles nodes with different capacities.
Why Virtual Nodes Matter:
Without virtual nodes, if nodes are hashed to positions that cluster together, some nodes get many keys while others get few. Virtual nodes spread each physical node across multiple positions on the ring, ensuring more even distribution.
Example:
Consistent hashing is widely used in distributed caches like Memcached and Redis Cluster. When a cache node is added or removed, only a fraction of keys need to be remapped, minimizing cache misses.
Example: A cache cluster with 10 nodes. Adding one node remaps only ~10% of keys instead of 100%.
Load balancers use consistent hashing to route requests to backend servers. This ensures:
Distributed databases use consistent hashing to determine which shard stores a record. This enables:
Content delivery networks use consistent hashing to route requests to edge servers. Ensures content is served from the same server for cache efficiency.
Advantages:
Disadvantages:
Minimal Rehashing
Only k/n keys need remapping when k nodes are added. Much better than traditional hashing where all keys remap.
Hash Ring
Circular space where keys and nodes are mapped. Keys assigned to first node clockwise from their position.
Virtual Nodes
Multiple hash positions per physical node improve load distribution and handle different node capacities.
Efficient Lookup
Use sorted data structure (TreeMap, sorted array) with binary search for O(log n) key lookup.