Why System Design Matters
The Journey from Class to System
You’ve written a beautiful class. It’s well-designed, follows SOLID principles, and has great test coverage. But software doesn’t run in isolation: it runs on servers, handles thousands of users, and must work 24/7.
What is System Design?
System design is the process of defining the architecture, components, and data flow of a system to meet specific requirements. It’s about making decisions that affect:
- How your code runs - On one server or thousands?
- How data flows - Synchronous or asynchronous?
- How failures are handled - What happens when things break?
- How the system scales - Can it handle 10x more users?
The Two Levels of Design
| Aspect | High-Level Design (HLD) | Low-Level Design (LLD) |
|---|---|---|
| Focus | System architecture | Class structure |
| Scope | Multiple services | Single service/module |
| Artifacts | Architecture diagrams | Class diagrams |
| Decisions | Which database? How many servers? | Which pattern? What interface? |
| Scale | Millions of users | Thousands of objects |
Why LLD Engineers Need System Design
1. Your Code Doesn’t Run in Isolation
Every class you write will eventually run inside a larger system. Even this simple order service depends on a database, an external payment API, and another service, each reached over the network:
```python
class OrderService:
    """Looks simple, but consider the system context..."""

    def __init__(self, db: Database, payment: PaymentGateway, inventory: InventoryService):
        self.db = db                # Which database? Replicated? Sharded?
        self.payment = payment      # External API - what if it's slow?
        self.inventory = inventory  # Another service - what if it's down?

    def place_order(self, order: Order) -> OrderResult:
        # What if this takes 30 seconds?
        # What if 1000 users call this simultaneously?
        # What if the database is in another data center?

        self.inventory.reserve(order.items)                # Network call #1
        payment_result = self.payment.charge(order.total)  # Network call #2
        self.db.save(order)                                # Network call #3

        return OrderResult(success=True, order_id=order.id)
```

```java
public class OrderService {
    // Looks simple, but consider the system context...

    private final Database db;                 // Which database? Replicated? Sharded?
    private final PaymentGateway payment;      // External API - what if it's slow?
    private final InventoryService inventory;  // Another service - what if it's down?

    public OrderService(Database db, PaymentGateway payment, InventoryService inventory) {
        this.db = db;
        this.payment = payment;
        this.inventory = inventory;
    }

    public OrderResult placeOrder(Order order) {
        // What if this takes 30 seconds?
        // What if 1000 users call this simultaneously?
        // What if the database is in another data center?

        inventory.reserve(order.getItems());                             // Network call #1
        PaymentResult paymentResult = payment.charge(order.getTotal());  // Network call #2
        db.save(order);                                                  // Network call #3

        return new OrderResult(true, order.getId());
    }
}
```

2. Design Decisions Have System Implications
Every LLD decision affects the system:
| LLD Decision | System Implication |
|---|---|
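| Using Singleton pattern | One instance per process, not per system: each server gets its own copy |
| Storing state in instance variables | State is pinned to one server, which blocks horizontal scaling |
| Synchronous method calls | Creates temporal coupling and ties up threads while waiting |
| In-memory caching | Each server has its own cache, so servers can return different (stale) data |
| Auto-increment IDs | Independent nodes generate colliding IDs in distributed databases |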
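To make the auto-increment row concrete, here is a small simulation, assuming two independent database nodes that each keep their own local sequence (the `LocalIdGenerator` class is hypothetical). Both nodes hand out ID 1, while IDs from `uuid.uuid4()` need no coordination between nodes.

```python
import itertools
import uuid


class LocalIdGenerator:
    """Hypothetical per-node auto-increment sequence, like a single database's counter."""

    def __init__(self) -> None:
        self._counter = itertools.count(start=1)

    def next_id(self) -> int:
        return next(self._counter)


# Two nodes, each with its own sequence and no coordination between them.
node_a = LocalIdGenerator()
node_b = LocalIdGenerator()

id_a = node_a.next_id()
id_b = node_b.next_id()
print(id_a == id_b)  # True: both nodes issued ID 1

# Random (version 4) UUIDs are generated independently; collisions are vanishingly unlikely.
uuid_a = uuid.uuid4()
uuid_b = uuid.uuid4()
print(uuid_a == uuid_b)  # False
```

UUIDs are only one option; per-node ID ranges or time-based schemes are other common choices, depending on whether IDs must be sortable or compact.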
3. Interviews Test Both Levels
In senior engineering interviews, expect questions that start with class design and then probe how that design behaves at system scale.
The Five Pillars of System Design
Every system design discussion involves these key concerns:
1. Scalability
Can the system handle growth?
LLD Impact: Design classes that can work in a distributed environment. Avoid global state, use dependency injection, make components stateless where possible.
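A minimal sketch of a stateless component with an injected dependency, assuming a shopping-cart feature (the `CartStore`, `InMemoryCartStore`, and `CartService` names are hypothetical). `CartService` keeps no state of its own, so any server instance can handle any request as long as every instance is given the same shared store (in production, something like Redis or a database).

```python
from typing import Dict, List, Protocol


class CartStore(Protocol):
    """Hypothetical storage abstraction for carts."""

    def get(self, user_id: str) -> List[str]: ...

    def save(self, user_id: str, items: List[str]) -> None: ...


class InMemoryCartStore:
    """Stand-in for a shared store; fine for tests, not shared across real servers."""

    def __init__(self) -> None:
        self._carts: Dict[str, List[str]] = {}

    def get(self, user_id: str) -> List[str]:
        return list(self._carts.get(user_id, []))  # return a copy

    def save(self, user_id: str, items: List[str]) -> None:
        self._carts[user_id] = list(items)


class CartService:
    """Stateless: all state lives in the injected store, none in the service."""

    def __init__(self, store: CartStore) -> None:
        self.store = store

    def add_item(self, user_id: str, item: str) -> List[str]:
        # Note: this read-modify-write is not atomic; concurrent requests
        # for the same user would need an atomic store operation.
        items = self.store.get(user_id)
        items.append(item)
        self.store.save(user_id, items)
        return items


# Two "servers" sharing one store see the same cart.
shared = InMemoryCartStore()
server_1 = CartService(shared)
server_2 = CartService(shared)
server_1.add_item("alice", "book")
print(server_2.add_item("alice", "pen"))  # ['book', 'pen']
```

Because the store is injected, tests can pass the in-memory version while production passes a shared one, without changing `CartService`.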
2. Reliability
Does the system work correctly, even when things fail?
- Hardware fails (servers crash, disks die)
- Software has bugs
- Networks are unreliable
- Users make mistakes
LLD Impact: Implement proper error handling, use retry patterns, design for idempotency.
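A sketch combining retries and idempotency, assuming a payment call that can fail transiently. `retry_with_backoff` retries with exponential backoff, and the hypothetical `IdempotentPaymentProcessor` remembers results by idempotency key, so a retry after a lost response does not charge twice. Neither name comes from a specific library.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.1) -> T:
    """Call fn, retrying on ConnectionError with delays of base_delay, 2x, 4x, ..."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


class IdempotentPaymentProcessor:
    """Hypothetical processor: each idempotency key is charged at most once.

    A real service would keep these results in a shared, durable store.
    """

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.charges = 0  # number of real charges performed

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # duplicate request: replay stored result
        self.charges += 1
        result = f"charged {amount}"
        self._results[idempotency_key] = result
        return result


processor = IdempotentPaymentProcessor()
attempts_made = {"n": 0}


def charge_order_42() -> str:
    attempts_made["n"] += 1
    result = processor.charge("order-42", 100)
    if attempts_made["n"] == 1:
        raise ConnectionError("response lost")  # the charge happened, but the reply never arrived
    return result


print(retry_with_backoff(charge_order_42, base_delay=0.01))  # charged 100
print(processor.charges)  # 1: the retry did not charge twice
```

Retries alone would have double-charged here; the idempotency key is what makes retrying safe.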
3. Availability
Is the system accessible when users need it?
- 99.9% uptime = 8.76 hours downtime/year
- 99.99% uptime = 52.6 minutes downtime/year
- 99.999% uptime = 5.26 minutes downtime/year
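These figures follow from one formula: allowed downtime = (1 - availability) × length of the period. A quick check, assuming a 365-day year (525,600 minutes):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (ignores leap years)


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum yearly downtime allowed by an availability target such as 0.999."""
    return (1 - availability) * MINUTES_PER_YEAR


for target in (0.999, 0.9999, 0.99999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target:.3%} uptime -> {minutes:.2f} min/year ({minutes / 60:.2f} h)")
```

525.6 minutes is 8.76 hours, and 52.56 and 5.256 minutes round to the 52.6 and 5.26 minutes listed above. Each extra "nine" cuts the budget by a factor of ten.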
LLD Impact: Design classes with fallback behaviors, implement circuit breakers, handle graceful degradation.
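A minimal circuit breaker sketch, assuming a simple policy: after `failure_threshold` consecutive failures the breaker opens, and calls go straight to a fallback until `reset_timeout` seconds pass; then one trial call is let through. The class and parameter names are illustrative, not a specific library's API, and production implementations add thread safety and metrics.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Illustrative breaker: closed -> open after repeated failures -> trial call after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            return fallback()  # open: fail fast without touching the dependency
        # Closed, or the timeout elapsed and this is a trial call (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


service_calls: List[int] = []


def fetch_recommendations() -> List[str]:
    service_calls.append(1)
    raise ConnectionError("recommendation service is down")


def popular_items() -> List[str]:
    return ["bestseller-1", "bestseller-2"]  # degraded, but still useful


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0)
for _ in range(3):
    print(breaker.call(fetch_recommendations, popular_items))
print(len(service_calls))  # 2: the third request never reached the failing service
```

The fallback is what provides graceful degradation: users see generic items instead of an error while the dependency recovers.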
4. Maintainability
Can the system be easily modified and operated?
- New features can be added
- Bugs can be fixed quickly
- Operations are simple
- System is observable
LLD Impact: Follow SOLID principles, write clean code, use design patterns appropriately.
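Observability is one maintainability concern that class-level code can support directly. A sketch, assuming a hypothetical `timed` decorator that records each call's latency in an in-process dictionary; a real service would export these numbers to a metrics system instead.

```python
import functools
import time
from collections import defaultdict
from typing import DefaultDict, List

# Latency samples in seconds, keyed by function name (illustrative only).
LATENCIES: DefaultDict[str, List[float]] = defaultdict(list)


def timed(fn):
    """Record how long each call to fn takes, even if the call raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            LATENCIES[fn.__qualname__].append(time.perf_counter() - start)
    return wrapper


@timed
def place_order(order_id: str) -> str:
    return f"order {order_id} placed"


place_order("A1")
place_order("A2")
print(len(LATENCIES["place_order"]))  # 2
```

The decorator keeps measurement out of the business logic, so instrumentation can be added or changed without editing `place_order` itself.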
5. Performance
Does the system respond quickly and efficiently?
- Low latency (fast responses)
- High throughput (many requests)
- Efficient resource usage
LLD Impact: Choose appropriate data structures, optimize algorithms, minimize unnecessary operations.
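As one small example of choosing an appropriate data structure: membership checks on a Python `list` scan every element (O(n)), while a `set` uses hashing (O(1) on average). A sketch, assuming we need to deduplicate a stream of page IDs while keeping first-seen order:

```python
from typing import Iterable, List, Set


def dedupe_with_list(page_ids: Iterable[str]) -> List[str]:
    seen: List[str] = []
    for page_id in page_ids:
        if page_id not in seen:  # O(n) scan per lookup -> O(n^2) overall
            seen.append(page_id)
    return seen


def dedupe_with_set(page_ids: Iterable[str]) -> List[str]:
    seen: Set[str] = set()
    result: List[str] = []
    for page_id in page_ids:
        if page_id not in seen:  # O(1) average hash lookup -> O(n) overall
            seen.add(page_id)
            result.append(page_id)
    return result


views = ["home", "about", "home", "pricing", "about"]
print(dedupe_with_set(views))  # ['home', 'about', 'pricing']
```

Both functions return the same result; the difference only shows up as the input grows, which is exactly when a system is under load.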
Real-World Example: A Simple Counter
Let’s see how system thinking changes a simple class design:
Version 1: The Naive Approach
```python
import threading


class PageViewCounter:
    """Simple counter - works perfectly on one server"""

    def __init__(self):
        self.counts = {}  # page_id -> count
        self._lock = threading.Lock()  # guards counts against concurrent threads

    def increment(self, page_id: str) -> int:
        with self._lock:
            self.counts[page_id] = self.counts.get(page_id, 0) + 1
            return self.counts[page_id]

    def get_count(self, page_id: str) -> int:
        return self.counts.get(page_id, 0)


# Usage
counter = PageViewCounter()
counter.increment("homepage")  # 1
counter.increment("homepage")  # 2
```

```java
// PageViewCounter.java
import java.util.HashMap;
import java.util.Map;

public class PageViewCounter {
    // Simple counter - works perfectly on one server
    private final Map<String, Integer> counts = new HashMap<>();

    public synchronized int increment(String pageId) {
        int count = counts.getOrDefault(pageId, 0) + 1;
        counts.put(pageId, count);
        return count;
    }

    // Also synchronized, so reads see the latest writes made by other threads
    public synchronized int getCount(String pageId) {
        return counts.getOrDefault(pageId, 0);
    }
}

// Main.java (usage)
public class Main {
    public static void main(String[] args) {
        PageViewCounter counter = new PageViewCounter();
        counter.increment("homepage"); // 1
        counter.increment("homepage"); // 2
    }
}
```

Problems with this design:
- ❌ Data lost if server restarts
- ❌ Different counts on each server
- ❌ No persistence
- ❌ Memory grows unbounded
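The "different counts on each server" problem is easy to reproduce. A sketch, assuming two server processes simulated as two `PageViewCounter` instances behind a simple round-robin load balancer (the counter is restated in its simplest form so the snippet runs on its own):

```python
from itertools import cycle


class PageViewCounter:
    """Same in-memory design as Version 1, restated so this snippet is self-contained."""

    def __init__(self) -> None:
        self.counts = {}

    def increment(self, page_id: str) -> int:
        self.counts[page_id] = self.counts.get(page_id, 0) + 1
        return self.counts[page_id]

    def get_count(self, page_id: str) -> int:
        return self.counts.get(page_id, 0)


servers = [PageViewCounter(), PageViewCounter()]  # one counter per server process
balancer = cycle(servers)  # round-robin load balancing

for _ in range(4):
    next(balancer).increment("homepage")  # 4 real page views

print([s.get_count("homepage") for s in servers])  # [2, 2]: neither server reports 4
```

Whichever server answers a `get_count` request reports only its own share of the traffic.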
Version 2: System-Aware Design
The key insight is to externalize state to a shared store that all servers can access. This requires:
- Abstraction - Define an interface for storage (Dependency Inversion Principle)
- Shared State - Use Redis, a database, or similar shared storage
- Atomic Operations - Use Redis’s `INCR` command, which is atomic
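Why atomicity matters: if each server did a read-modify-write (read the count, add one, write it back), two servers can interleave and lose an update. The sketch below replays that interleaving step by step against a plain dict standing in for the shared store, then contrasts it with an atomic increment. The lock here only emulates atomicity inside one process; Redis provides it on the server because it executes each command, including `INCR`, as a single indivisible step.

```python
import threading

# Shared store, standing in for Redis or a database (illustrative only).
store = {"pageview:home": 0}

# Non-atomic read-modify-write, interleaved across two servers
a_read = store["pageview:home"]      # server A reads 0
b_read = store["pageview:home"]      # server B reads 0 before A writes
store["pageview:home"] = a_read + 1  # server A writes 1
store["pageview:home"] = b_read + 1  # server B also writes 1: A's update is lost
print(store["pageview:home"])        # 1, although there were 2 views

# Atomic increment: read and write happen as one indivisible step
lock = threading.Lock()


def atomic_incr(key: str) -> int:
    with lock:  # emulates the guarantee INCR provides on the Redis server
        store[key] += 1
        return store[key]


store["pageview:home"] = 0
threads = [threading.Thread(target=atomic_incr, args=("pageview:home",)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store["pageview:home"])        # 100
```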
```python
import redis


class PageViewCounter:
    """Counter that works in distributed systems"""

    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client  # External shared state

    def increment(self, page_id: str) -> int:
        return self.redis.incr(f"pageview:{page_id}")  # Atomic operation

    def get_count(self, page_id: str) -> int:
        return int(self.redis.get(f"pageview:{page_id}") or 0)


# Now works across all servers!
counter = PageViewCounter(redis.Redis(host="redis-cluster"))
counter.increment("homepage")
```

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class PageViewCounter {
    // External shared state. A pool is used because a single Jedis
    // connection is not thread-safe when shared across request threads.
    private final JedisPool pool;

    public PageViewCounter(JedisPool pool) { this.pool = pool; }

    public long increment(String pageId) {
        try (Jedis redis = pool.getResource()) {
            return redis.incr("pageview:" + pageId); // Atomic operation
        }
    }

    public long getCount(String pageId) {
        try (Jedis redis = pool.getResource()) {
            String count = redis.get("pageview:" + pageId);
            return count != null ? Long.parseLong(count) : 0;
        }
    }
}
```

What changed and why:
| Change | System Design Reason |
|---|---|
| Added `CounterStorage` interface | Decouples from specific storage (DIP) |
| Used Redis instead of in-memory | Shared state across servers |
| Dependency injection | Testable, flexible, swappable |
| Atomic operations (`INCR`) | Handles concurrent requests |
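The Version 2 code passes a Redis client in directly; a `CounterStorage` interface would put one more layer of abstraction between the counter and Redis. A sketch of what it could look like, assuming a `typing.Protocol` with `incr`/`get` methods (the method names and both implementations are assumptions, not from the original). The in-memory version lets `PageViewCounter` be unit-tested without a Redis server; the Redis version relies only on redis-py's `incr` and `get`.

```python
from typing import Dict, Protocol


class CounterStorage(Protocol):
    """Storage abstraction the counter depends on (Dependency Inversion Principle)."""

    def incr(self, key: str) -> int: ...

    def get(self, key: str) -> int: ...


class InMemoryCounterStorage:
    """Single-process implementation, suitable for unit tests."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def get(self, key: str) -> int:
        return self._data.get(key, 0)


class RedisCounterStorage:
    """Shared implementation; `client` is assumed to be a redis.Redis instance."""

    def __init__(self, client) -> None:
        self._client = client

    def incr(self, key: str) -> int:
        return self._client.incr(key)  # atomic on the Redis server

    def get(self, key: str) -> int:
        value = self._client.get(key)  # bytes (or str), or None if the key is missing
        return int(value) if value is not None else 0


class PageViewCounter:
    def __init__(self, storage: CounterStorage) -> None:
        self.storage = storage  # injected, so tests and production can differ

    def increment(self, page_id: str) -> int:
        return self.storage.incr(f"pageview:{page_id}")

    def get_count(self, page_id: str) -> int:
        return self.storage.get(f"pageview:{page_id}")


counter = PageViewCounter(InMemoryCounterStorage())
counter.increment("homepage")
print(counter.get_count("homepage"))  # 1
```

In production the same class would be wired as `PageViewCounter(RedisCounterStorage(redis.Redis(host="redis-cluster")))`, with no change to the counter's code.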
Key Takeaways
What’s Next?
Now that you understand why system design matters, let’s dive into the first fundamental concept:
Next up: Scalability Fundamentals - Learn how systems grow and the strategies to handle that growth.