
Scalability Fundamentals

Building systems that grow with your success

Scalability is a system’s ability to handle increased load by adding resources. A scalable system can grow to accommodate more users, more data, or more transactions without degrading performance.


Vertical scaling (scaling up): add more power to existing machines - a bigger CPU, more RAM, faster disks.


Pros:

  • Simple - no code changes needed
  • No distributed system complexity
  • Strong consistency is easy

Cons:

  • Hardware limits (can’t add infinite CPU)
  • Expensive at high end
  • Single point of failure
  • Downtime during upgrades

Horizontal scaling (scaling out): add more machines and distribute the load across multiple servers.


Pros:

  • No hardware limits (add infinite machines)
  • Cost-effective (use commodity hardware)
  • Built-in redundancy
  • Gradual scaling

Cons:

  • Distributed system complexity
  • Data consistency challenges
  • Code must be designed for it
  • Network overhead

| Aspect | Vertical Scaling | Horizontal Scaling |
| --- | --- | --- |
| Approach | Bigger machine | More machines |
| Limit | Hardware ceiling | Theoretically unlimited |
| Cost | Expensive at scale | Cost-effective |
| Complexity | Simple | Complex |
| Downtime | Required for upgrades | Zero-downtime possible |
| Failure | Single point of failure | Redundancy built-in |
| Code changes | Usually none | May require redesign |

This is where LLD meets HLD. Your class design determines whether your system can scale horizontally.


If servers hold session data in memory, each user's requests become pinned to one machine. The solution is to externalize all state to a shared store (Redis, a database, etc.):

  • No instance variables holding user/session data
  • All state lives externally in Redis, database, or similar
  • Any server can handle any request because they all access the same shared state
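A minimal sketch of that refactor, assuming a Python service. A plain dict stands in for the shared store (Redis, a database), and the class names are illustrative:

```python
class StatefulCartService:
    """Anti-pattern: cart data lives in this instance's memory,
    so a user's requests must always hit the same server."""
    def __init__(self):
        self._carts = {}  # state trapped inside one instance

    def add_item(self, user_id, item):
        self._carts.setdefault(user_id, []).append(item)


class StatelessCartService:
    """Scalable: all state lives in a shared store, so any
    instance can serve any request."""
    def __init__(self, store):
        self._store = store  # e.g. a Redis client; here, a shared dict

    def add_item(self, user_id, item):
        cart = self._store.get(f"cart:{user_id}", [])
        cart.append(item)
        self._store[f"cart:{user_id}"] = cart

    def get_cart(self, user_id):
        return self._store.get(f"cart:{user_id}", [])


shared_store = {}                  # one store shared by every instance
server_a = StatelessCartService(shared_store)
server_b = StatelessCartService(shared_store)

server_a.add_item("u1", "book")    # request handled by instance A
print(server_b.get_cart("u1"))     # instance B sees it: ['book']
```

Because neither instance keeps anything between requests, the load balancer is free to send the next request to either one.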

Systems can be scaled along different dimensions:

Load scaling: handle more requests per second.


Data scaling: handle more data.


Geographic scaling: serve users globally with low latency.


Key Principle: Statelessness - all state lives externally (database, cache, message queue); the service itself stores nothing between requests.


This allows you to run any number of service instances and route requests to any of them.
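That routing can be sketched with a toy round-robin balancer (illustrative only, not a real load balancer); it is safe precisely because every instance is interchangeable:

```python
import itertools

class RoundRobinBalancer:
    """Sends each request to the next instance in turn - valid
    only because the instances hold no per-user state."""
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def handle(self, request):
        return next(self._cycle)(request)

# Three identical stateless handlers standing in for service instances.
def make_instance(name):
    return lambda req: f"{name} handled {req}"

balancer = RoundRobinBalancer([make_instance(f"server-{i}") for i in range(3)])
print(balancer.handle("GET /orders"))  # server-0 handled GET /orders
print(balancer.handle("GET /orders"))  # server-1 handled GET /orders
```

Adding capacity is then just a matter of appending more instances to the pool.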

Key Principle: Idempotency - operations can be safely retried without duplicating side effects. This is critical in distributed systems, where network failures cause retries.


Implementation: Store results keyed by a unique idempotency key. Before processing, check whether the key already exists; if it does, return the cached result instead of re-executing the operation.
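That implementation can be sketched as follows, using a hypothetical `PaymentService`. The result cache here is an in-memory dict for brevity; in production it would live in a shared store so all instances see it:

```python
class PaymentService:
    """Caches each result under its idempotency key, so a retried
    request returns the stored result instead of charging twice."""
    def __init__(self):
        self._results = {}  # in production: a shared store, not instance memory
        self.charges = 0    # counts actual charge side effects

    def charge(self, idempotency_key, amount):
        if idempotency_key in self._results:
            return self._results[idempotency_key]   # retry: cached result
        self.charges += 1                           # side effect happens once
        result = {"status": "charged", "amount": amount}
        self._results[idempotency_key] = result
        return result


svc = PaymentService()
svc.charge("key-123", 50)
svc.charge("key-123", 50)   # simulated network retry: no double charge
print(svc.charges)          # 1
```

The client generates the idempotency key (e.g. a UUID per logical operation) and reuses it on every retry of that operation.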

Key Principle: Asynchronous processing - move slow, non-critical operations out of the request path using message queues.


Result: Users get fast responses. Slow operations (email, analytics, notifications) happen in the background without blocking.
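A sketch of that pattern using Python's standard-library `queue` as a stand-in for a real message broker (RabbitMQ, SQS, etc.); the handler and task names are illustrative:

```python
import queue
import threading
import time

task_queue = queue.Queue()

def worker():
    # Background worker drains the queue; slow work happens off the request path.
    while True:
        task = task_queue.get()
        if task is None:
            break
        time.sleep(0.01)  # stand-in for sending an email, logging analytics, etc.
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_order(order_id):
    # Critical work (e.g. persisting the order) would happen synchronously here;
    # slow, non-critical work is enqueued and the response returns immediately.
    task_queue.put(("send_confirmation_email", order_id))
    return {"order": order_id, "status": "accepted"}

response = handle_order(42)  # returns without waiting for the email
task_queue.join()            # demo only: wait for the background task to finish
print(response)              # {'order': 42, 'status': 'accepted'}
```

With a real broker, the worker would run as a separate process (or fleet of processes) that can be scaled independently of the web tier.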


Company: Netflix

Scenario: Netflix streams content to 200+ million subscribers globally. The system must handle massive traffic spikes during peak hours and popular show releases.

Implementation: Uses horizontal scaling with stateless microservices:


Why Horizontal Scaling?

  • Traffic Spikes: Handle 10x traffic during popular releases
  • Global Distribution: Serve users from nearest data center
  • Cost-Effective: Use commodity hardware, scale down during off-peak
  • Result: Handles billions of requests daily with 99.99% availability

Real-World Impact:

  • Scale: 200+ million subscribers, 15% of global internet bandwidth
  • Instances: Thousands of microservice instances, auto-scaling
  • Availability: 99.99% uptime despite massive scale

Company: Amazon

Scenario: Amazon handles millions of orders daily with massive traffic spikes during events like Prime Day and Black Friday.

Implementation: Uses horizontal scaling with stateless services and externalized state:


Why Stateless Design?

  • Auto-Scaling: Add instances instantly during traffic spikes
  • Fault Tolerance: Failed instances don’t lose data
  • Load Distribution: Any instance can handle any request
  • Result: Handles 10x traffic spikes without downtime

Real-World Impact:

  • Scale: Millions of orders per day, billions of page views
  • Spike Handling: 10x traffic increase during Prime Day
  • Availability: 99.99% uptime during peak events

Example 3: Google Search Horizontal Scaling


Company: Google

Scenario: Google Search handles billions of queries daily. The system must respond in milliseconds while handling massive concurrent load.

Implementation: Uses massive horizontal scaling with stateless search clusters:


Why Massive Horizontal Scaling?

  • Query Volume: Billions of queries require massive capacity
  • Low Latency: Geographic distribution reduces latency
  • Redundancy: Multiple clusters ensure availability
  • Result: Sub-100ms response times at massive scale

Real-World Impact:

  • Scale: 8.5+ billion queries per day
  • Performance: < 100ms average response time
  • Availability: 99.99% uptime globally

Company: Uber

Scenario: Uber matches riders with drivers in real-time across hundreds of cities. The system must handle millions of concurrent requests.

Implementation: Uses horizontal scaling with stateless matching services:


Why Stateless Design?

  • Real-Time Matching: Low latency required for good UX
  • Geographic Distribution: Serve users from nearest region
  • Fault Tolerance: Instance failures don’t affect matching
  • Result: Sub-second matching times at massive scale

Real-World Impact:

  • Scale: Millions of rides per day globally
  • Latency: < 1 second matching time
  • Availability: 99.9% uptime despite massive concurrent load

When designing classes, ask yourself:

| Question | Why It Matters |
| --- | --- |
| Does this class store state in instance variables? | Prevents horizontal scaling |
| Can multiple instances run simultaneously? | Required for scaling out |
| Are operations idempotent? | Enables safe retries |
| What happens if this operation is slow? | May need async processing |
| Does this depend on local resources (files, memory)? | Won't work across servers |
| How does this handle concurrent requests? | Thread safety concerns |


Understanding scalability is just the beginning. Next, we’ll dive into measuring system performance:

Next up: Latency and Throughput - Learn the key metrics that define system performance.