# Latency and Throughput

## The Two Pillars of Performance

When measuring system performance, two metrics matter most: latency and throughput. The first measures how fast a single request completes; the second measures how much work the system completes per unit of time.
## Understanding Latency

Latency is the time between sending a request and receiving a response.
### Latency Breakdown

Every request passes through multiple stages, and each stage contributes its own delay. The table below shows the typical contributors, and the sketch after it shows one way to measure them.

### Types of Latency

| Type | Description | Typical Values |
|---|---|---|
| Network Latency | Time for data to travel over network | 1-100ms |
| Processing Latency | Time for server to process request | 1-50ms |
| Database Latency | Time for database queries | 1-10ms |
| Queue Latency | Time spent waiting in queues | 0-1000ms+ |
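
To see where a request's time actually goes, it helps to time each stage separately. Below is a minimal sketch using Python's `time.perf_counter`; the two stage functions are hypothetical stand-ins for a real database query and downstream call:

```python
import time

def timed(label: str, fn, *args, **kwargs):
    """Run fn, print its wall-clock duration, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label:<10} {elapsed_ms:7.2f}ms")
    return result

# Hypothetical stages, simulated with sleeps
def query_db():
    time.sleep(0.010)  # ~10ms database query

def call_downstream():
    time.sleep(0.020)  # ~20ms service call

timed("database", query_db)
timed("service", call_downstream)
```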
### Percentile Latencies: P50, P95, P99

Average latency can be misleading because a few slow outliers drag the mean far above what most users experience. Percentiles give a better picture: P50 (the median) is the typical request, P95 is what the slowest 5% of users see, and P99 captures the worst 1%.
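
Computing percentiles from raw samples is straightforward. A minimal sketch using the nearest-rank method, with made-up latencies chosen to show how outliers distort the mean:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# Illustrative latencies in ms: mostly fast, two slow outliers
latencies = [12, 15, 11, 14, 13, 16, 12, 250, 13, 900]

print(f"mean: {sum(latencies) / len(latencies):.0f}ms")  # 126ms - misleading
print(f"P50:  {percentile(latencies, 50):.0f}ms")        # 13ms - typical user
print(f"P95:  {percentile(latencies, 95):.0f}ms")        # 900ms - the outliers
```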
### Why P99 Matters

At scale, the "worst 1%" is not rare: at a million requests a day, it is ten thousand slow responses. Worse, a single page view often fans out to dozens of backend calls, so the chance that at least one of them hits its P99 latency is much higher than 1%.

### Real-World Example

Suppose a service reports a 50ms average but a 2-second P99. Most users see snappy responses, yet one request in a hundred takes 40x longer, and those are often the requests users remember. This is why teams set latency targets on P95 and P99, not on the mean.

## Understanding Throughput
Throughput is the amount of work done per unit of time.
### Common Throughput Metrics

| Metric | Description | Example |
|---|---|---|
| RPS | Requests per second | 10,000 RPS |
| TPS | Transactions per second | 5,000 TPS |
| QPS | Queries per second | 50,000 QPS |
| Bandwidth | Data transferred per second | 1 Gbps |
### Throughput Calculation

Throughput is calculated as:

```
Throughput = Total Requests / Time Period
```

Example: if your server handles 10,000 requests in 10 seconds, your throughput is 1,000 RPS.
Key considerations:
- Measure over time - instantaneous measurements fluctuate
- Track success rate - failed requests count against effective throughput (see the sketch after this list)
- Monitor under load - throughput often drops when the system is stressed
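
Putting the formula and those considerations together, here is a minimal sketch of a throughput meter that separates raw from effective (successful) throughput; the 1% failure rate is made up for illustration:

```python
class ThroughputMeter:
    """Tracks raw and effective (successful) request counts over a window."""

    def __init__(self):
        self.total = 0
        self.succeeded = 0

    def record(self, success: bool) -> None:
        self.total += 1
        if success:
            self.succeeded += 1

    def rps(self, window_seconds: float) -> tuple[float, float]:
        return self.total / window_seconds, self.succeeded / window_seconds

# Simulate 10,000 requests over a 10-second window with a 1% failure rate
meter = ThroughputMeter()
for i in range(10_000):
    meter.record(success=(i % 100 != 0))

raw, effective = meter.rps(window_seconds=10.0)
print(f"raw: {raw:.0f} RPS, effective: {effective:.0f} RPS")  # 1000 vs 990
```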
## Latency vs Throughput: The Trade-off

They're related but distinct: cutting per-request latency doesn't automatically raise the requests-per-second ceiling, and techniques that raise throughput (batching, queueing) often add latency. Optimize for the one your users actually feel.
### Little's Law

A fundamental relationship connects the two:

```
Average Concurrent Requests = Throughput × Average Latency
```

Example:
- Throughput: 1,000 RPS
- Average Latency: 100ms = 0.1 seconds
- Concurrent Requests: 1,000 × 0.1 = 100 requests in flight
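
Little's Law also works in reverse for capacity planning: given a target throughput and an expected latency, it tells you how many requests will be in flight, and therefore roughly how many workers, threads, or connections you need. A back-of-the-envelope sketch with hypothetical targets:

```python
import math

def required_concurrency(target_rps: float, avg_latency_s: float) -> int:
    """Little's Law rearranged: in-flight requests = throughput x latency."""
    return math.ceil(target_rps * avg_latency_s)

print(required_concurrency(1_000, 0.100))  # 100 - matches the example above
print(required_concurrency(5_000, 0.050))  # 250 - size your pool accordingly
```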
## How Method Design Impacts Performance

Your code decisions directly affect latency and throughput:
### Bad: Sequential Operations

```python
class OrderService:
    """❌ Sequential calls - high latency"""

    def get_order_details(self, order_id: str) -> OrderDetails:
        # Each call waits for the previous one
        order = self.db.get_order(order_id)            # 10ms
        user = self.user_service.get(order.user_id)    # 20ms
        items = self.inventory.get_items(order.items)  # 15ms
        shipping = self.shipping.get_status(order_id)  # 25ms

        # Total: 10 + 20 + 15 + 25 = 70ms
        return OrderDetails(order, user, items, shipping)
```

```java
public class OrderService {
    // ❌ Sequential calls - high latency

    public OrderDetails getOrderDetails(String orderId) {
        // Each call waits for the previous one
        Order order = db.getOrder(orderId);                           // 10ms
        User user = userService.get(order.getUserId());               // 20ms
        List<Item> items = inventory.getItems(order.getItems());      // 15ms
        ShippingStatus shipping = shippingService.getStatus(orderId); // 25ms

        // Total: 10 + 20 + 15 + 25 = 70ms
        return new OrderDetails(order, user, items, shipping);
    }
}
```

### Good: Parallel Operations
```python
import asyncio

class OrderService:
    """✅ Parallel calls - lower latency"""

    async def get_order_details(self, order_id: str) -> OrderDetails:
        # First, get the order (we need it for user_id)
        order = await self.db.get_order(order_id)   # 10ms

        # Then fetch everything else in parallel
        user, items, shipping = await asyncio.gather(
            self.user_service.get(order.user_id),   # 20ms
            self.inventory.get_items(order.items),  # 15ms } All run
            self.shipping.get_status(order_id),     # 25ms } in parallel
        )

        # Total: 10 + max(20, 15, 25) = 10 + 25 = 35ms
        # Saved 35ms (50% reduction!)
        return OrderDetails(order, user, items, shipping)
```

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class OrderService {
    // ✅ Parallel calls - lower latency

    public CompletableFuture<OrderDetails> getOrderDetails(String orderId) {
        // First, get the order (we need it for user_id)
        return db.getOrderAsync(orderId)                      // 10ms
            .thenCompose(order -> {
                // Then fetch everything else in parallel
                CompletableFuture<User> userFuture =
                    userService.getAsync(order.getUserId());      // 20ms
                CompletableFuture<List<Item>> itemsFuture =
                    inventory.getItemsAsync(order.getItems());    // 15ms } All run
                CompletableFuture<ShippingStatus> shippingFuture =
                    shippingService.getStatusAsync(orderId);      // 25ms } in parallel

                return CompletableFuture.allOf(userFuture, itemsFuture, shippingFuture)
                    .thenApply(v -> new OrderDetails(
                        order,
                        userFuture.join(),
                        itemsFuture.join(),
                        shippingFuture.join()
                    ));
            });

        // Total: 10 + max(20, 15, 25) = 10 + 25 = 35ms
        // Saved 35ms (50% reduction!)
    }
}
```

## Caching: The Latency Killer
Caching is the most effective way to reduce latency. The idea is simple: store frequently accessed data closer to where it's needed.

### Multi-Level Cache Architecture

A typical setup layers caches by speed: check a small in-process (L1) cache first, fall back to a shared Redis (L2) cache, and hit the database only when both miss. Each database read populates the faster layers on the way back.
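
Here is a minimal read-through sketch of that layered lookup. The `redis_client` and `db` objects and their `get`/`set`/`fetch` methods are assumptions for illustration, not a specific library's API:

```python
class MultiLevelCache:
    """Read-through cache: L1 (in-process) -> L2 (Redis) -> database."""

    def __init__(self, redis_client, db):
        self.l1 = {}               # in-process dict, ~0.01ms lookups
        self.redis = redis_client  # shared cache, ~2ms
        self.db = db               # source of truth, ~30ms

    def get(self, key):
        # L1: hit for ~90% of reads
        if key in self.l1:
            return self.l1[key]

        # L2: hit for most of the rest
        value = self.redis.get(key)
        if value is not None:
            self.l1[key] = value  # promote to L1
            return value

        # Miss: read from the database and populate both cache layers
        value = self.db.fetch(key)
        self.redis.set(key, value)
        self.l1[key] = value
        return value
```

A production version would bound the L1 dict (e.g. with an LRU policy) and set TTLs on both layers to avoid serving stale data, but the lookup order is the core idea.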
### Latency Math

With this caching strategy:
| Level | Latency | Hit Rate |
|---|---|---|
| L1 (Local) | 0.01ms | 90% |
| L2 (Redis) | 2ms | 9% |
| Database | 30ms | 1% |
Average latency = 0.9 × 0.01 + 0.09 × 2 + 0.01 × 30 ≈ 0.49ms
That’s a 60x improvement over hitting the database every time!
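
The same weighted average in code, using the numbers from the table:

```python
# (level, latency_ms, hit_rate) taken from the table above
levels = [("L1 local", 0.01, 0.90), ("L2 Redis", 2.0, 0.09), ("database", 30.0, 0.01)]

avg_ms = sum(latency * rate for _, latency, rate in levels)
print(f"average latency: {avg_ms:.2f}ms")                           # 0.49ms
print(f"speedup vs. always hitting the DB: {30.0 / avg_ms:.0f}x")   # ~61x
```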
## Measuring Performance

### Key Metrics to Track

| Metric | What It Tells You | Action Threshold |
|---|---|---|
| P50 | Typical user experience | Baseline for “normal” |
| P95 | Most users’ worst case | Watch for drift |
| P99 | Outlier experience | Investigate if > 3x P50 |
| Error Rate | System health | Alert if > 1% |
### Tools for Measurement

In Production:
- APM Tools: Datadog, New Relic, Dynatrace
- Metrics: Prometheus + Grafana
- Distributed Tracing: Jaeger, Zipkin
In Development:
- Profilers: cProfile (Python), JProfiler (Java)
- Benchmarking: pytest-benchmark, JMH
## Key Takeaways

- Latency is how long one request takes; throughput is how many requests complete per unit of time. Track both.
- Averages hide pain: watch P50, P95, and P99.
- Little's Law links the two: concurrent requests = throughput × average latency.
- Parallelize independent calls; sequential fan-out sums latencies, parallel fan-out takes the max.
- Cache aggressively: a layered cache can turn a 30ms database read into a sub-millisecond average.

## What's Next?
Now that you understand performance metrics, let's learn how to identify and fix performance issues:
Next up: Understanding Bottlenecks - Learn to find and eliminate performance bottlenecks.