Why System Design Matters
The Journey from Class to System
You’ve written a beautiful class. It’s well-designed, follows SOLID principles, and has great test coverage. But software doesn’t run in isolation: it runs on servers, handles thousands of users, and must work 24/7.
What is System Design?
System design is the process of defining the architecture, components, and data flow of a system to meet specific requirements. It’s about making decisions that affect:
- How your code runs - On one server or thousands?
- How data flows - Synchronous or asynchronous?
- How failures are handled - What happens when things break?
- How the system scales - Can it handle 10x more users?
The Two Levels of Design
| Aspect | High-Level Design (HLD) | Low-Level Design (LLD) |
|---|---|---|
| Focus | System architecture | Class structure |
| Scope | Multiple services | Single service/module |
| Artifacts | Architecture diagrams | Class diagrams |
| Decisions | Which database? How many servers? | Which pattern? What interface? |
| Scale | Millions of users | Thousands of objects |
Why LLD Engineers Need System Design
1. Your Code Doesn’t Run in Isolation
Every class you write will eventually run in a system with:
2. Design Decisions Have System Implications
Every LLD decision affects the system:
| LLD Decision | System Implication |
|---|---|
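| Using Singleton pattern | Each process gets its own instance, so there is no single instance across multiple servers |
| Storing state in instance variables | State is tied to one server, which makes horizontal scaling hard |
| Synchronous method calls | Couples caller to callee and blocks resources while waiting |
| In-memory caching | Each server holds its own cache, so cached data can diverge |
| Auto-increment IDs | Independent database nodes can generate conflicting IDs |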
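To make the last row of the table concrete, here is a minimal sketch of why locally generated auto-increment IDs collide once there is more than one node. The `AutoIncrementNode` class and `new_global_id` helper are hypothetical names used only for illustration; random UUIDs are shown as one common alternative, not the only one.

```python
import itertools
import uuid


class AutoIncrementNode:
    """Hypothetical database node that hands out local auto-increment IDs."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)

    def new_id(self) -> int:
        return next(self._next_id)


def new_global_id() -> str:
    """Random UUID: needs no coordination between nodes."""
    return str(uuid.uuid4())


node_a, node_b = AutoIncrementNode(), AutoIncrementNode()
ids_a = [node_a.new_id() for _ in range(3)]
ids_b = [node_b.new_id() for _ in range(3)]

# Both nodes started counting at 1, so every ID exists twice.
colliding = set(ids_a) & set(ids_b)

# UUIDs generated independently are, for practical purposes, unique.
uuids = {new_global_id() for _ in range(6)}
```

The trade-off is that random UUIDs are larger than integers and carry no ordering, which may or may not matter for a given schema.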
3. Interviews Test Both Levels
In senior engineering interviews, expect questions like:
The Five Pillars of System Design
Every system design discussion involves these key concerns:
1. Scalability
Can the system handle growth?
LLD Impact: Design classes that can work in a distributed environment. Avoid global state, use dependency injection, make components stateless where possible.
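As a rough illustration of "stateless with dependency injection", the sketch below keeps all per-user state in an injected store, so any service instance can handle any request. `CartStore`, `InMemoryCartStore`, and `CartService` are hypothetical names; the in-memory store only stands in for a shared external store.

```python
from typing import Protocol


class CartStore(Protocol):
    def load(self, user_id: str) -> list[str]: ...
    def save(self, user_id: str, items: list[str]) -> None: ...


class InMemoryCartStore:
    """Stand-in for a shared store; a real one would live outside the process."""

    def __init__(self) -> None:
        self._carts: dict[str, list[str]] = {}

    def load(self, user_id: str) -> list[str]:
        return list(self._carts.get(user_id, []))

    def save(self, user_id: str, items: list[str]) -> None:
        self._carts[user_id] = list(items)


class CartService:
    """Holds no per-user state, so any instance can serve any request."""

    def __init__(self, store: CartStore) -> None:
        self._store = store  # injected dependency, not a global

    def add_item(self, user_id: str, item: str) -> list[str]:
        # Note: load-then-save is not atomic; a real store would need
        # an atomic append or a transaction here.
        items = self._store.load(user_id)
        items.append(item)
        self._store.save(user_id, items)
        return items


shared_store = InMemoryCartStore()
server_1 = CartService(shared_store)
server_2 = CartService(shared_store)

server_1.add_item("u1", "book")
cart = server_2.add_item("u1", "pen")  # a different "server" sees the same cart
```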
2. Reliability
Does the system work correctly, even when things fail?
- Hardware fails (servers crash, disks die)
- Software has bugs
- Networks are unreliable
- Users make mistakes
LLD Impact: Implement proper error handling, use retry patterns, design for idempotency.
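Retries and idempotency work together: a retry is only safe if repeating the operation has no extra effect. The sketch below assumes a hypothetical `PaymentProcessor` that remembers results by idempotency key and a hypothetical `TransientError` type; it simulates a charge that succeeds but whose response is lost once.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry(operation, attempts=3, base_delay=0.01):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)


class PaymentProcessor:
    """Idempotent: repeating a request with the same key charges only once."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.charges = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # replay the stored result
        self.charges += 1
        receipt = f"receipt-{self.charges}"
        self._results[idempotency_key] = receipt
        return receipt


processor = PaymentProcessor()
calls = {"count": 0}


def flaky_charge() -> str:
    """The charge goes through, but the first response is 'lost'."""
    calls["count"] += 1
    receipt = processor.charge("order-42", 500)
    if calls["count"] == 1:
        raise TransientError("response lost")
    return receipt


receipt = retry(flaky_charge)
```

Without the idempotency key, the retry would have charged the customer twice.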
3. Availability
Is the system accessible when users need it?
- 99.9% uptime = 8.76 hours downtime/year
- 99.99% uptime = 52.6 minutes downtime/year
- 99.999% uptime = 5.26 minutes downtime/year
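The downtime figures above follow from simple arithmetic on a 365-day year. A small sketch, with `downtime_hours_per_year` as an illustrative helper name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for nines in (99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(nines)
    print(f"{nines}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```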
LLD Impact: Design classes with fallback behaviors, implement circuit breakers, and degrade gracefully when a dependency is unavailable.
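A circuit breaker with a fallback could look roughly like the sketch below. This is a simplified, single-threaded illustration under assumed thresholds, not a production implementation; the `CircuitBreaker` class and the recommendation functions are hypothetical.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures, then allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                return fallback()  # open: skip the failing dependency
            self._opened_at = None  # cooldown elapsed: allow one trial call
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0  # success closes the circuit fully
        return result


attempts = {"count": 0}


def failing_recommendations():
    attempts["count"] += 1
    raise ConnectionError("recommendation service down")


def default_recommendations():
    return ["bestsellers"]  # degraded but still useful response


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
results = [
    breaker.call(failing_recommendations, default_recommendations)
    for _ in range(5)
]
```

After two failures the breaker opens, so the remaining three calls return the fallback without touching the failing service at all.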
4. Maintainability
Can the system be easily modified and operated?
- New features can be added
- Bugs can be fixed quickly
- Operations are simple
- System is observable
LLD Impact: Follow SOLID principles, write clean code, use design patterns appropriately.
5. Performance
Does the system respond quickly and efficiently?
- Low latency (fast responses)
- High throughput (many requests)
- Efficient resource usage
LLD Impact: Choose appropriate data structures, optimize algorithms, minimize unnecessary operations.
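As one small example of how data structure choice affects latency, the sketch below compares membership checks against a list and a set. The blocklist scenario and sizes are made up for illustration, and actual timings depend on the machine.

```python
import timeit

blocked_list = [f"user-{i}" for i in range(50_000)]
blocked_set = set(blocked_list)


def is_blocked_list(user_id: str) -> bool:
    return user_id in blocked_list  # linear scan: O(n)


def is_blocked_set(user_id: str) -> bool:
    return user_id in blocked_set  # hash lookup: O(1) on average


# Worst case for the list: the element we look for is at the very end.
list_time = timeit.timeit(lambda: is_blocked_list("user-49999"), number=200)
set_time = timeit.timeit(lambda: is_blocked_set("user-49999"), number=200)
print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```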
Real-World Examples
Example 1: Twitter’s Tweet Counter Evolution
Company: Twitter (now X)
Scenario: Twitter needs to display view counts, like counts, and retweet counts for billions of tweets. Initially, they used in-memory counters, but this failed at scale.
Implementation: Evolved from naive to distributed design:
Why This Matters:
- Scale: Billions of tweets, millions of interactions per second
- Consistency: Users expect accurate counts
- Performance: Counts must load instantly
- Result: Redis-based distributed counters handle millions of increments per second
Real-World Impact:
- Throughput: Millions of counter increments per second
- Latency: Sub-millisecond counter updates
- Consistency: All users see the same counts globally
Example 2: Instagram’s Photo View Counter
Company: Instagram (Meta)
Scenario: Instagram displays view counts on photos and videos. With billions of photos and millions of views per second, they need a scalable counting system.
Implementation: Uses distributed counters with sharding:
Why Sharding?
- Scale: Distributes load across multiple Redis instances
- Capacity: Each shard handles a subset of photos
- Performance: Parallel processing increases throughput
- Result: Handles billions of views with low latency
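The general idea of sharding a counter can be sketched as follows. This is a generic illustration, not Instagram's actual implementation: `shard_for` and `ShardedCounter` are hypothetical names, and each dictionary stands in for a separate counter store such as one Redis instance.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedCounter:
    """Routes each key to exactly one shard, so shards can be updated in parallel."""

    def __init__(self, num_shards: int) -> None:
        self._shards = [defaultdict(int) for _ in range(num_shards)]

    def increment(self, photo_id: str) -> int:
        shard = self._shards[shard_for(photo_id, len(self._shards))]
        shard[photo_id] += 1
        return shard[photo_id]

    def get(self, photo_id: str) -> int:
        shard = self._shards[shard_for(photo_id, len(self._shards))]
        return shard.get(photo_id, 0)


views = ShardedCounter(num_shards=4)
for _ in range(3):
    views.increment("photo-123")
views.increment("photo-456")
```

A stable hash from `hashlib` is used because Python's built-in `hash()` for strings is randomized per process. Simple modulo sharding also remaps most keys when the shard count changes, which is why consistent hashing is often considered instead.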
Real-World Impact:
- Scale: Billions of photos, trillions of views
- Performance: < 1ms counter increment latency
- Availability: 99.99% uptime despite massive scale
Example 3: YouTube’s View Counter System
Company: Google (YouTube)
Scenario: YouTube tracks view counts for billions of videos. The system must handle massive spikes during viral videos while maintaining accuracy.
Implementation: Uses hybrid approach with batching:
Why Batching?
- Efficiency: Reduces database writes by 100x
- Performance: Handles traffic spikes gracefully
- Accuracy: Eventually consistent, acceptable for views
- Result: Handles viral video traffic spikes
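Write batching in general can be sketched like this. It is an illustration of the technique under assumed parameters, not YouTube's actual system: `BatchedViewCounter` is a hypothetical class, and `write_batch` stands in for whatever performs the real database write.

```python
from collections import Counter


class BatchedViewCounter:
    """Buffers view events in memory and writes them to storage in batches."""

    def __init__(self, write_batch, batch_size=100):
        self._write_batch = write_batch  # callable taking {video_id: delta}
        self._batch_size = batch_size
        self._pending = Counter()
        self._pending_events = 0

    def record_view(self, video_id: str) -> None:
        self._pending[video_id] += 1
        self._pending_events += 1
        if self._pending_events >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        self._write_batch(dict(self._pending))  # one write for many views
        self._pending.clear()
        self._pending_events = 0


database_writes = []  # each appended dict represents one database write
counter = BatchedViewCounter(database_writes.append, batch_size=100)
for _ in range(250):
    counter.record_view("video-1")
counter.flush()  # write the views still sitting in the buffer
```

Here 250 views result in 3 writes instead of 250. The cost is that buffered views are lost if the process crashes before a flush, which is the eventual-consistency trade-off described above; a real service would typically also flush on a timer.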
Real-World Impact:
- Scale: Billions of videos, trillions of views
- Spike Handling: Handles 10x traffic spikes during viral events
- Efficiency: 100x reduction in database writes through batching
Real-World Example: A Simple Counter
Let’s see how system thinking changes a simple class design:
Version 1: The Naive Approach
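A minimal sketch of what such a naive counter might look like, assuming a page-view use case. The class name `PageViewCounter` is hypothetical; the point is only that all state lives in one process's memory.

```python
class PageViewCounter:
    """Naive counter: all state lives in this process's memory."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def increment(self, page: str) -> int:
        self._counts[page] = self._counts.get(page, 0) + 1
        return self._counts[page]

    def get(self, page: str) -> int:
        return self._counts.get(page, 0)


# Two servers behind a load balancer each hold their own copy of the counts.
server_a = PageViewCounter()
server_b = PageViewCounter()
server_a.increment("/home")
server_a.increment("/home")
server_b.increment("/home")

# The page was viewed three times, but no server knows that.
count_on_a = server_a.get("/home")
count_on_b = server_b.get("/home")
```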
Problems with this design:
- Data lost if server restarts
- Different counts on each server
- No persistence
- Memory grows unbounded
Version 2: System-Aware Design
The key insight is to externalize state to a shared store that all servers can access. This requires:
- Abstraction - Define an interface for storage (Dependency Inversion Principle)
- Shared State - Use Redis, a database, or similar shared storage
- Atomic Operations - Use Redis’s `INCR` command, which is atomic
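The three points above could be sketched as follows. `CounterStorage` is the interface named in this section, but its method names, the `RedisCounterStorage` and `InMemoryCounterStorage` classes, and the `views:` key prefix are assumptions for illustration. The Redis-backed class expects a redis-py style client, whose `incr()` and `get()` methods map to the Redis `INCR` and `GET` commands.

```python
from typing import Protocol


class CounterStorage(Protocol):
    """Storage abstraction; the method names here are illustrative."""

    def increment(self, key: str) -> int: ...
    def get(self, key: str) -> int: ...


class RedisCounterStorage:
    """Backed by a redis-py style client that exposes incr() and get()."""

    def __init__(self, client) -> None:
        self._client = client

    def increment(self, key: str) -> int:
        return int(self._client.incr(key))  # INCR is atomic on the Redis server

    def get(self, key: str) -> int:
        value = self._client.get(key)  # None when the key does not exist
        return int(value) if value is not None else 0


class InMemoryCounterStorage:
    """Test double with the same interface; not shared between processes."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def increment(self, key: str) -> int:
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

    def get(self, key: str) -> int:
        return self._counts.get(key, 0)


class PageViewCounter:
    """Holds no counts itself; all state lives in the injected storage."""

    def __init__(self, storage: CounterStorage) -> None:
        self._storage = storage

    def increment(self, page: str) -> int:
        return self._storage.increment(f"views:{page}")

    def get(self, page: str) -> int:
        return self._storage.get(f"views:{page}")


# Two "servers" sharing one storage now agree on the count.
shared = InMemoryCounterStorage()
server_a = PageViewCounter(shared)
server_b = PageViewCounter(shared)
server_a.increment("/home")
server_b.increment("/home")
total = server_a.get("/home")
```

With the redis-py package installed, something like `RedisCounterStorage(redis.Redis())` would be injected instead of the in-memory double, while `PageViewCounter` stays unchanged.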
What changed and why:
| Change | System Design Reason |
|---|---|
| Added `CounterStorage` interface | Decouples from specific storage (DIP) |
| Used Redis instead of in-memory | Shared state across servers |
| Dependency injection | Testable, flexible, swappable |
| Atomic operations (`INCR`) | Handles concurrent requests |
Key Takeaways
What’s Next?
Now that you understand why system design matters, let’s dive into the first fundamental concept:
Next up: Scalability Fundamentals - Learn how systems grow and the strategies to handle that growth.