Distribute Load
Load balancing distributes requests across servers for better performance, availability, and scalability.
Without load balancing, a single server handles all traffic. When that server becomes overloaded, it slows down and eventually crashes. There’s no redundancy—if the server fails, the entire system goes down. You can’t scale horizontally because all traffic goes to one server.
With load balancing, traffic is distributed across multiple servers. No single server gets overwhelmed. Performance improves because work is shared. Availability increases because if one server fails, others continue handling requests. Scaling becomes easy—just add more servers to the pool.
Load balancing is fundamental to distributed systems. It’s how you turn a single-server setup into a scalable, highly available system.
Load balancing is the practice of distributing incoming requests across multiple servers to improve performance, availability, and scalability. Think of it like a restaurant host assigning tables to waiters—the host ensures work is distributed evenly so no single waiter gets overwhelmed.
The load balancer sits between clients and servers. When a request arrives, the load balancer selects a server from the pool and forwards the request. The selection is based on an algorithm—round robin, least connections, IP hash, or weighted distribution.
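Here’s a minimal sketch of that dispatch step in Python, with the selection algorithm plugged in as a swappable strategy. The names and the `forward` helper are illustrative, not a real proxy implementation.

```python
from typing import Callable

Server = str
SelectFn = Callable[[list[Server]], Server]

def forward(request: str, server: Server) -> str:
    # Placeholder for the actual network hop (e.g. an HTTP proxy call).
    return f"{server} handled {request}"

def handle_request(request: str, pool: list[Server], select: SelectFn) -> str:
    server = select(pool)            # choose a server via the algorithm
    return forward(request, server)  # forward the request, return the response

# Any algorithm that maps a pool to a server fits the SelectFn shape;
# the sections below implement several of them.
print(handle_request("GET /", ["a", "b", "c"], select=lambda pool: pool[0]))
```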
Amazon handles billions of requests per day. Without load balancing, this would be impossible. Amazon uses load balancers at multiple levels:
Application Load Balancer (ALB) distributes HTTP/HTTPS traffic across multiple availability zones and targets. When you browse Amazon.com, your request goes through an ALB that routes it to one of thousands of servers based on factors like server health and current load (routing to a geographically close region happens earlier, at the DNS layer).
Network Load Balancer (NLB) handles TCP/UDP traffic for high-performance, low-latency applications. Amazon’s internal services use NLBs for fast, efficient routing.
Classic Load Balancer provides basic load balancing across multiple EC2 instances. It’s a legacy option (AWS recommends ALB or NLB for new workloads) but still appears in older, simpler deployments.
Amazon’s load balancers automatically detect unhealthy servers and stop routing traffic to them. They distribute traffic evenly, handle SSL termination, and provide health checks. This architecture allows Amazon to handle massive scale while maintaining high availability.
Different algorithms suit different scenarios. Understanding when to use each algorithm is crucial for designing scalable systems.
Round robin is the simplest algorithm. It distributes requests sequentially to each server in rotation. Request 1 goes to Server A, request 2 to Server B, request 3 to Server C, then back to Server A.
When to use: Round robin works well when all servers have similar capacity and requests are similar in nature. It’s simple, fair, and easy to implement.
Limitations: Round robin doesn’t account for server capacity or current load. If Server A is twice as powerful as Server B, round robin still sends equal traffic to both. If Server A is already handling a slow request, round robin still sends the next request to it.
Weighted round robin assigns more requests to servers with higher capacity. If Server A has weight 3 and Server B has weight 1, Server A gets three requests for every one request Server B gets.
When to use: Use weighted round robin when servers have different capacities. A server with 32 CPU cores should handle more traffic than a server with 8 CPU cores.
Real-world example: A content delivery network (CDN) uses weighted round robin to route more traffic to edge servers with higher bandwidth capacity. Servers in data centers with better connectivity get higher weights.
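To make the weighting concrete, here’s a minimal sketch in Python. The pool and weights are hypothetical, and the approach (repeating each server by its weight) is the simplest possible version; production balancers such as nginx use a smoother interleaving, but the effect on traffic share is the same.

```python
from itertools import cycle

# Hypothetical pool: server-a (weight 3) should receive three
# requests for every one that server-b (weight 1) receives.
WEIGHTS = {"server-a": 3, "server-b": 1}

rotation = cycle([name for name, weight in WEIGHTS.items()
                  for _ in range(weight)])

def pick_server() -> str:
    """Return the next server in the weighted rotation."""
    return next(rotation)

# First eight picks: a, a, a, b, a, a, a, b
print([pick_server() for _ in range(8)])
```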
Least connections routes requests to the server with the fewest active connections. This is more accurate than round robin when connections have varying durations.
When to use: Least connections is ideal for long-lived connections like WebSocket connections, database connections, or file transfers. It ensures servers with fewer active connections get new requests.
Real-world example: A video streaming service uses least connections because video streams are long-lived. A server already handling 100 active streams shouldn’t get new requests; they should go to a server with fewer active streams.
IP hash uses a hash of the client’s IP address to determine which server handles the request. The same client always goes to the same server, providing session affinity.
When to use: Use IP hash when you need sticky sessions—when a client must connect to the same server for multiple requests. This is common with stateful applications that store session data on the server.
Limitations: IP hash can cause uneven distribution if clients are concentrated in certain IP ranges. It also doesn’t adapt well to server failures—if a server goes down, all its clients need to reconnect.
Real-world example: An e-commerce site uses IP hash to ensure a user’s shopping cart stays on the same server. Without sticky sessions, adding items to the cart might fail because the cart is stored on a different server.
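A minimal IP hash selector might look like this in Python (the pool is hypothetical):

```python
import hashlib

# Hypothetical pool; in practice this comes from service discovery.
SERVERS = ["app-1", "app-2", "app-3"]

def pick_server(client_ip: str) -> str:
    """Hash the client IP so the same client always maps to the same server."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# The same IP always lands on the same server (session affinity).
assert pick_server("203.0.113.7") == pick_server("203.0.113.7")
```

Note the fragility mentioned above: if the pool shrinks, the modulo changes and most clients get remapped, which is why systems that depend heavily on affinity often use consistent hashing instead.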
Load balancers operate at different network layers, each with different capabilities and trade-offs.
Layer 4 load balancers operate at the transport layer (TCP/UDP). They route traffic based on IP addresses and port numbers. They don’t inspect the content of requests—they just forward packets.
Advantages: Layer 4 load balancers are fast because they don’t need to inspect request content. They have low latency and can handle high throughput. They’re simple to configure and operate.
Disadvantages: Layer 4 load balancers can’t do content-based routing. They can’t route based on URL paths, HTTP headers, or cookies. They generally can’t do SSL termination or request/response manipulation.
Use cases: Layer 4 load balancing is ideal for simple TCP/UDP applications, high-throughput scenarios, or when you need low latency. Database load balancing often uses Layer 4.
Layer 7 load balancers operate at the application layer (HTTP/HTTPS). They inspect the content of requests and can route based on URLs, HTTP headers, cookies, and other application-level information.
Advantages: Layer 7 load balancers can do intelligent routing. They can route /api/users to one server pool and /api/orders to another. They can do SSL termination, request/response manipulation, and content-based routing.
Disadvantages: Layer 7 load balancers are slower than Layer 4 because they must inspect request content. They have higher latency and lower throughput. They’re more complex to configure.
Use cases: Layer 7 load balancing is ideal for HTTP/HTTPS applications, microservices architectures, or when you need content-based routing. Most web applications use Layer 7 load balancers.
Real-world example: An API gateway uses Layer 7 load balancing to route requests to different microservices based on URL paths. /api/users goes to the user service, /api/orders goes to the order service, and /api/payments goes to the payment service.
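A toy version of that path-based routing logic, in Python, might look like the following; the prefixes and pool names are placeholders, not a real gateway configuration.

```python
# Hypothetical mapping from URL path prefix to backend pool.
ROUTES = {
    "/api/users":    ["user-svc-1", "user-svc-2"],
    "/api/orders":   ["order-svc-1", "order-svc-2"],
    "/api/payments": ["payment-svc-1"],
}

def route(path: str) -> list[str]:
    """Return the backend pool whose prefix matches the request path."""
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    raise LookupError(f"no route for {path}")

print(route("/api/orders/42"))  # -> the order service pool
```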
Load balancers must check server health to avoid routing to unhealthy servers. Health checks are periodic requests to servers to verify they’re responding correctly.
Active health checks are initiated by the load balancer. The load balancer periodically sends health check requests (like GET /health) to each server. If a server doesn’t respond or returns an error, the load balancer removes it from the pool.
Passive health checks monitor responses to real requests. If a server returns too many errors, the load balancer removes it from the pool. This is less intrusive but slower to detect failures.
Combined approach uses both active and passive health checks for reliability. Active checks provide fast failure detection, while passive checks provide real-world validation.
Real-world example: AWS Elastic Load Balancer performs health checks every 30 seconds by default. If a server fails 2 consecutive health checks, it’s marked unhealthy and removed from the pool. When it passes health checks again, it’s automatically added back.
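Here’s a minimal sketch of an active health checker, assuming a hypothetical `GET /health` endpoint on each server and the two-consecutive-failure threshold described above. A real balancer would run this on a schedule (e.g. every 30 seconds) for every server in the pool.

```python
import urllib.request

FAILURE_THRESHOLD = 2                      # consecutive failures before removal
failures = {"server-a": 0, "server-b": 0}  # hypothetical pool
healthy = set(failures)                    # servers eligible for traffic

def check(server: str) -> None:
    """Probe GET /health; pull the server out of rotation after repeated failures."""
    try:
        with urllib.request.urlopen(f"http://{server}/health", timeout=2) as resp:
            ok = resp.status == 200
    except OSError:                        # connection refused, timeout, 5xx, etc.
        ok = False

    if ok:
        failures[server] = 0
        healthy.add(server)                # automatically re-added on recovery
    else:
        failures[server] += 1
        if failures[server] >= FAILURE_THRESHOLD:
            healthy.discard(server)        # stop routing traffic to it
```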
Load balancing algorithms are common interview topics. Here’s how to implement the most important ones:
Round robin maintains a current index and increments it for each request, wrapping around when it reaches the end of the server list.
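A minimal version in Python (server names are placeholders):

```python
class RoundRobin:
    """Cycle through servers in order, wrapping at the end of the list."""

    def __init__(self, servers: list[str]):
        self.servers = servers
        self.index = 0

    def pick(self) -> str:
        server = self.servers[self.index]
        self.index = (self.index + 1) % len(self.servers)  # wrap around
        return server

lb = RoundRobin(["a", "b", "c"])
print([lb.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']
```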
Least connections tracks the number of active connections per server and selects the server with the fewest connections.
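A sketch of the same idea, again with placeholder names; note that callers must report when a connection closes so the counts stay accurate.

```python
class LeastConnections:
    """Route each new connection to the server with the fewest active ones."""

    def __init__(self, servers: list[str]):
        self.active = {s: 0 for s in servers}

    def acquire(self) -> str:
        server = min(self.active, key=self.active.get)  # fewest active connections
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1  # call when the connection closes

lb = LeastConnections(["a", "b"])
first = lb.acquire()   # "a" (tie broken by order)
lb.acquire()           # "b"
lb.release(first)      # "a" is now least loaded again
```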
| Algorithm | Use Case | Pros | Cons |
|---|---|---|---|
| Round Robin | Equal capacity servers | Simple, fair | Doesn’t account for load |
| Weighted Round Robin | Different capacity servers | Accounts for capacity | Doesn’t account for current load |
| Least Connections | Long-lived connections | Accounts for current load | Requires connection tracking |
| IP Hash | Session affinity needed | Sticky sessions | Uneven distribution possible |
Recommendation: Use least connections for most cases. It’s accurate and handles varying connection durations well. Use round robin for simple cases with equal servers. Use weighted round robin when servers have different capacities. Use IP hash only when you need sticky sessions.
Round Robin: Simple
Round robin is simplest but doesn’t account for server capacity or current load.
Least Connections: Best
Least connections is most accurate for varying connection durations. Recommended for most cases.
IP Hash: Sticky Sessions
IP hash provides session affinity but can cause uneven distribution. Use when stateful apps need it.
Health Checks
Always implement health checks to avoid routing to unhealthy servers. Remove failed servers from pool.
Layer 4 vs Layer 7
Layer 4 is faster, Layer 7 is more intelligent. Choose based on routing needs.