Why System Design Matters

From single class to global scale

The Journey from Class to System

You’ve written a beautiful class. It’s well designed, follows SOLID principles, and has great test coverage. But software doesn’t run in isolation. It runs on servers, handles thousands of users, and must work 24/7.

What is System Design?

System design is the process of defining the architecture, components, and data flow of a system to meet specific requirements. It’s about making decisions that affect:

How your code runs - On one server or thousands?
How data flows - Synchronous or asynchronous?
How failures are handled - What happens when things break?
How the system scales - Can it handle 10x more users?

The Two Levels of Design

Aspect	High-Level Design (HLD)	Low-Level Design (LLD)
Focus	System architecture	Class structure
Scope	Multiple services	Single service/module
Artifacts	Architecture diagrams	Class diagrams
Decisions	Which database? How many servers?	Which pattern? What interface?
Scale	Millions of users	Thousands of objects

Why LLD Engineers Need System Design

1. Your Code Doesn’t Run in Isolation

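Every class you write will eventually run inside a larger system: on one of many servers, next to shared caches and databases, and behind a network that can fail.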

2. Design Decisions Have System Implications

Every LLD decision affects the system:

LLD Decision	System Implication
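Using Singleton pattern	Yields one instance per process, not one per system; every server gets its own
Storing state in instance variables	Ties state to one server, so the service can’t scale horizontally
Synchronous method calls	Couples caller to callee and blocks resources while waiting
In-memory caching	Each server holds its own cache, so cached values can diverge
Auto-increment IDs	Independent database nodes can generate the same ID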
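
To make the last row of the table concrete, here is a minimal sketch. The `Node` class is hypothetical: it stands in for a database node that keeps its own auto-increment sequence and is not any real database's API.

```python
import itertools
import uuid


class Node:
    """Hypothetical database node with a local auto-increment sequence."""

    def __init__(self) -> None:
        self._sequence = itertools.count(1)

    def next_auto_id(self) -> int:
        # Each node counts independently, so IDs are unique only per node.
        return next(self._sequence)

    @staticmethod
    def next_uuid() -> str:
        # Random UUIDs need no coordination between nodes.
        return str(uuid.uuid4())


node_a, node_b = Node(), Node()
ids_a = [node_a.next_auto_id() for _ in range(3)]
ids_b = [node_b.next_auto_id() for _ in range(3)]
collisions = sorted(set(ids_a) & set(ids_b))
print(collisions)  # [1, 2, 3]

uuids = {Node.next_uuid() for _ in range(6)}
print(len(uuids))  # 6
```

Random UUIDs are only one option. Giving each node its own ID range or using a central ID service are common alternatives, each with its own trade-offs.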

3. Interviews Test Both Levels

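Senior engineering interviews usually test both levels: you may be asked to design the classes for a component and then explain how that design behaves when it runs on many servers.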

The Five Pillars of System Design

Every system design discussion involves these key concerns:

1. Scalability

Can the system handle growth?

LLD Impact: Design classes that can work in a distributed environment. Avoid global state, use dependency injection, make components stateless where possible.

2. Reliability

Does the system work correctly, even when things fail?

Hardware fails (servers crash, disks die)
Software has bugs
Networks are unreliable
Users make mistakes

LLD Impact: Implement proper error handling, use retry patterns, design for idempotency.
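
Retries are only safe when the operation is idempotent. The sketch below combines the two ideas. `PaymentService`, its `charge` method, and the idempotency key format are all hypothetical, and the "lost response" is simulated with a flag.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.01) -> T:
    """Call operation, retrying with exponential backoff on ConnectionError."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


class PaymentService:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.charges_made = 0
        self._fail_next_response = True  # simulate one lost response

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # Apply the charge only the first time this key is seen.
        if idempotency_key not in self._results:
            self.charges_made += 1
            self._results[idempotency_key] = f"charged {amount_cents}"
        if self._fail_next_response:
            self._fail_next_response = False
            raise ConnectionError("response lost after the charge was applied")
        return self._results[idempotency_key]


service = PaymentService()
result = retry(lambda: service.charge("order-42", 1999))
print(result, service.charges_made)  # charged 1999 1
```

The first call applies the charge but its response is lost. The retry reuses the same key, so the customer is charged once even though the method ran twice.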

3. Availability

Is the system accessible when users need it?

99.9% uptime = 8.76 hours downtime/year
99.99% uptime = 52.6 minutes downtime/year
99.999% uptime = 5.26 minutes downtime/year
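
These figures follow from simple arithmetic on the 8,760 hours in a non-leap year. A short sketch to reproduce them (the function name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Allowed downtime per year, in hours, for a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for uptime in (99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(uptime)
    print(f"{uptime}% uptime allows {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```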

LLD Impact: Design classes with fallback behaviors, implement circuit breakers, handle graceful degradation.
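
As one illustration, here is a minimal circuit breaker with a fallback. It is a sketch, not a production implementation: the thresholds, the class shape, and the `flaky_recommendations` dependency are assumptions, and it is not thread-safe.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal circuit breaker: opens after repeated failures, retries later."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, operation: Callable[[], str], fallback: Callable[[], str]) -> str:
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self.reset_timeout:
                return fallback()  # open: skip the failing dependency
            self._opened_at = None  # half-open: allow one trial call
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = time.monotonic()
            return fallback()
        self._failures = 0  # a success closes the circuit
        return result


stats = {"calls": 0}


def flaky_recommendations() -> str:
    # Hypothetical dependency that is currently down.
    stats["calls"] += 1
    raise TimeoutError("recommendation service is down")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
responses = [breaker.call(flaky_recommendations, lambda: "popular items") for _ in range(5)]
print(responses[0], stats["calls"])  # popular items 2
```

All five requests get a degraded but usable answer, and after the second failure the breaker stops calling the broken dependency.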

4. Maintainability

Can the system be easily modified and operated?

New features can be added
Bugs can be fixed quickly
Operations are simple
System is observable

LLD Impact: Follow SOLID principles, write clean code, use design patterns appropriately.

5. Performance

Does the system respond quickly and efficiently?

Low latency (fast responses)
High throughput (many requests)
Efficient resource usage

LLD Impact: Choose appropriate data structures, optimize algorithms, minimize unnecessary operations.
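
A small, hedged illustration of how much a data structure choice can matter: a membership test against a list scans every element, while a set uses a hash lookup. The sizes and names below are arbitrary, and the timings will vary by machine.

```python
import timeit

user_ids_list = list(range(100_000))
user_ids_set = set(user_ids_list)
missing_id = -1  # worst case for the list: every element is checked

list_seconds = timeit.timeit(lambda: missing_id in user_ids_list, number=200)
set_seconds = timeit.timeit(lambda: missing_id in user_ids_set, number=200)
print(f"list: {list_seconds:.4f}s  set: {set_seconds:.6f}s")
```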

Real-World Examples

Example 1: Twitter’s Tweet Counter Evolution

Company: Twitter (now X)

Scenario: Twitter needs to display view counts, like counts, and retweet counts for billions of tweets. Initially, they used in-memory counters, but this failed at scale.

Implementation: The design evolved from naive in-memory counters to distributed counters in a shared store such as Redis.

Why This Matters:

Scale: Billions of tweets, millions of interactions per second
Consistency: Users expect accurate counts
Performance: Counts must load instantly
Result: Redis-based distributed counters handle millions of increments per second

Real-World Impact:

Throughput: Millions of counter increments per second
Latency: Sub-millisecond counter updates
Consistency: All users see the same counts globally

Example 2: Instagram’s Photo View Counter

Company: Instagram (Meta)

Scenario: Instagram displays view counts on photos and videos. With billions of photos and millions of views per second, they need a scalable counting system.

Implementation: Distributed counters, sharded across multiple Redis instances so that each shard handles a subset of photos.

Why Sharding?

Scale: Distributes load across multiple Redis instances
Capacity: Each shard handles a subset of photos
Performance: Parallel processing increases throughput
Result: Handles billions of views with low latency
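
The sharding idea can be sketched as follows. This is an illustration of the technique, not Instagram's actual code: each `Counter` stands in for a separate Redis instance so the example runs without a server, and the key format is made up.

```python
import hashlib
from collections import Counter
from typing import List


class ShardedCounter:
    """Routes each key to one of several shards by a stable hash."""

    def __init__(self, shard_count: int) -> None:
        # Each Counter is a stand-in for one Redis instance.
        self._shards: List[Counter] = [Counter() for _ in range(shard_count)]

    def _shard_index(self, key: str) -> int:
        # Python's built-in hash() is salted per process, so use a stable digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def increment(self, key: str) -> int:
        shard = self._shards[self._shard_index(key)]
        shard[key] += 1
        return shard[key]

    def get(self, key: str) -> int:
        return self._shards[self._shard_index(key)][key]


views = ShardedCounter(shard_count=4)
for _ in range(3):
    views.increment("photo:123")
views.increment("photo:456")
print(views.get("photo:123"), views.get("photo:456"))  # 3 1
```

Simple modulo routing like this remaps most keys when the shard count changes. Schemes such as consistent hashing reduce that cost.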

Real-World Impact:

Scale: Billions of photos, trillions of views
Performance: < 1ms counter increment latency
Availability: 99.99% uptime despite massive scale

Example 3: YouTube’s View Counter System

Company: Google (YouTube)

Scenario: YouTube tracks view counts for billions of videos. The system must handle massive spikes during viral videos while maintaining accuracy.

Implementation: A hybrid approach with batching, where views are accumulated and written to the database in batches instead of one write per view.

Why Batching?

Efficiency: Reduces database writes by 100x
Performance: Handles traffic spikes gracefully
Accuracy: Eventually consistent, acceptable for views
Result: Handles viral video traffic spikes
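
A minimal sketch of write batching, under stated assumptions: the class, the flush threshold of 100, and the dictionary standing in for a database are all illustrative and not YouTube's actual design.

```python
from collections import Counter
from typing import Dict, List


class BatchingViewCounter:
    """Buffers view events in memory and writes them out in batches."""

    def __init__(self, flush_threshold: int = 100) -> None:
        self.flush_threshold = flush_threshold
        self._pending: Counter = Counter()
        self._pending_events = 0
        self.database: Dict[str, int] = {}  # stand-in for a real database
        self.write_batches: List[Dict[str, int]] = []  # one entry per flush

    def record_view(self, video_id: str) -> None:
        self._pending[video_id] += 1
        self._pending_events += 1
        if self._pending_events >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch = dict(self._pending)
        for video_id, count in batch.items():
            self.database[video_id] = self.database.get(video_id, 0) + count
        self.write_batches.append(batch)
        self._pending.clear()
        self._pending_events = 0


counter = BatchingViewCounter(flush_threshold=100)
for _ in range(1000):
    counter.record_view("video:abc")
print(counter.database["video:abc"], len(counter.write_batches))  # 1000 10
```

Here 1,000 views produce 10 database writes. The trade-off is that buffered views are lost if the process crashes before a flush, and counts lag until the next flush. A real system would typically also flush on a timer.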

Real-World Impact:

Scale: Billions of videos, trillions of views
Spike Handling: Handles 10x traffic spikes during viral events
Efficiency: 100x reduction in database writes through batching

Worked Example: A Simple Counter

Let’s see how system thinking changes a simple class design:

Version 1: The Naive Approach
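
A naive version might look like the sketch below. The class name `PageViewCounter` and its methods are assumptions chosen for illustration. Two instances simulate two servers behind a load balancer.

```python
from collections import defaultdict
from typing import DefaultDict


class PageViewCounter:
    """Naive counter: all state lives in this process's memory."""

    def __init__(self) -> None:
        # One entry per page ever seen, never evicted.
        self._counts: DefaultDict[str, int] = defaultdict(int)

    def increment(self, page: str) -> int:
        self._counts[page] += 1
        return self._counts[page]

    def get(self, page: str) -> int:
        return self._counts.get(page, 0)


# Each "server" has its own instance, so each sees only its own traffic.
server_a = PageViewCounter()
server_b = PageViewCounter()
server_a.increment("/home")
server_a.increment("/home")
server_b.increment("/home")
print(server_a.get("/home"), server_b.get("/home"))  # 2 1
```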

Problems with this design:

Data lost if server restarts
Different counts on each server
No persistence
Memory grows unbounded

Version 2: System-Aware Design

The key insight is to externalize state to a shared store that all servers can access. This requires:

Abstraction - Define an interface for storage (Dependency Inversion Principle)
Shared State - Use Redis, a database, or similar shared storage
Atomic Operations - Use Redis’s INCR command, which is atomic
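
The three points above can be sketched as follows. Treat this as one possible shape, not a definitive implementation. `RedisCounterStorage` assumes a client object with redis-py style `incr` and `get` methods, and `InMemoryCounterStorage` and the `views:` key prefix are illustrative additions so the example runs without a Redis server.

```python
from typing import Dict, Protocol


class CounterStorage(Protocol):
    """Storage abstraction the counter depends on."""

    def increment(self, key: str) -> int: ...
    def get(self, key: str) -> int: ...


class RedisCounterStorage:
    """Adapter for a Redis client (assumes redis-py style incr()/get())."""

    def __init__(self, client) -> None:
        self._client = client

    def increment(self, key: str) -> int:
        return int(self._client.incr(key))  # INCR is atomic on the Redis server

    def get(self, key: str) -> int:
        value = self._client.get(key)  # None when the key does not exist
        return int(value) if value is not None else 0


class InMemoryCounterStorage:
    """Test double; also shows that the storage is swappable."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def increment(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def get(self, key: str) -> int:
        return self._data.get(key, 0)


class PageViewCounter:
    """Holds no state of its own; the storage is injected."""

    def __init__(self, storage: CounterStorage) -> None:
        self._storage = storage

    def increment(self, page: str) -> int:
        return self._storage.increment(f"views:{page}")

    def get(self, page: str) -> int:
        return self._storage.get(f"views:{page}")


shared = InMemoryCounterStorage()  # stands in for one shared Redis instance
server_a = PageViewCounter(shared)
server_b = PageViewCounter(shared)
server_a.increment("/home")
server_b.increment("/home")
print(server_a.get("/home"), server_b.get("/home"))  # 2 2
```

Because both counters share one store, they agree on the count. In production the injected storage would be `RedisCounterStorage` wrapping a real client.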

What changed and why:

Change	System Design Reason
Added `CounterStorage` interface	Decouples from specific storage (DIP)
Used Redis instead of in-memory	Shared state across servers
Dependency injection	Testable, flexible, swappable
Atomic operations (`INCR`)	Handles concurrent requests

Key Takeaways

What’s Next?

Now that you understand why system design matters, let’s dive into the first fundamental concept:

Next up: Scalability Fundamentals - Learn how systems grow and the strategies to handle that growth.
