Latency and Throughput

Measuring what matters in system performance

When measuring system performance, two metrics matter most: latency and throughput.

Latency is the time between sending a request and receiving a response.

Every request goes through multiple stages:

Type               | Description                          | Typical Values
Network Latency    | Time for data to travel over network | 1-100ms
Processing Latency | Time for server to process request   | 1-50ms
Database Latency   | Time for database queries            | 1-10ms
Queue Latency      | Time spent waiting in queues         | 0-1000ms+
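
At its simplest, you can observe end-to-end latency by timing the call yourself. The sketch below is illustrative; do_request is a hypothetical stand-in for a real network call:

    # Measure end-to-end latency of a single request with a monotonic clock.
    import time

    def do_request() -> None:
        time.sleep(0.02)   # stand-in for network + processing + database time

    start = time.monotonic()
    do_request()
    latency_ms = (time.monotonic() - start) * 1000
    print(f"latency: {latency_ms:.1f} ms")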

Average latency can be misleading: a handful of very slow requests disappears into the mean while real users still feel them. Percentiles give a better picture: P50 (the median) shows the typical experience, P95 shows most users’ worst case, and P99 captures the outliers.
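As a rough illustration (the sample values and the nearest-rank method below are assumptions, not measurements from this page), a few slow outliers drag the average far above what most users actually experience:

    # Compare the mean against P50/P95/P99 for a latency sample with outliers.
    def percentile(samples, pct):
        ordered = sorted(samples)
        rank = max(1, round(pct / 100 * len(ordered)))   # nearest-rank, 1-based
        return ordered[rank - 1]

    latencies_ms = [12, 13, 13, 14, 14, 15, 15, 16, 250, 900]   # two slow outliers

    print("mean:", sum(latencies_ms) / len(latencies_ms), "ms")  # 126.2 ms, skewed by outliers
    print("P50 :", percentile(latencies_ms, 50), "ms")           # 14 ms
    print("P95 :", percentile(latencies_ms, 95), "ms")           # 900 ms
    print("P99 :", percentile(latencies_ms, 99), "ms")           # 900 ms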

Real-World Example: E-Commerce Checkout Latency

Company: Amazon, eBay, Shopify

Scenario: Checkout pages must load quickly to prevent cart abandonment. Even small latency increases can significantly impact conversion rates.

Implementation: Uses parallel API calls and caching.
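A rough sketch of the parallel-call part (the service names and 50ms delays below are illustrative assumptions, not any retailer’s real architecture):

    # Fire the checkout page's independent backend calls concurrently so the
    # page waits for the slowest single call rather than the sum of all of them.
    import asyncio

    async def fetch(service: str, delay_s: float = 0.05) -> dict:
        await asyncio.sleep(delay_s)              # stand-in for a real network call
        return {"service": service, "ok": True}

    async def load_checkout_page() -> list:
        # Sequential: ~150 ms total. Parallel: ~50 ms, the slowest single call.
        return await asyncio.gather(
            fetch("cart"),
            fetch("shipping-options"),
            fetch("payment-methods"),
        )

    print(asyncio.run(load_checkout_page()))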

Throughput is the amount of work done per unit of time.

Metric    | Description                 | Example
RPS       | Requests per second         | 10,000 RPS
TPS       | Transactions per second     | 5,000 TPS
QPS       | Queries per second          | 50,000 QPS
Bandwidth | Data transferred per second | 1 Gbps

Throughput is calculated as:

Throughput = Total Requests / Time Period

Example: If your server handles 10,000 requests in 10 seconds, your throughput is 1,000 RPS.

Key considerations:

  • Measure over time - instantaneous measurements fluctuate
  • Track success rate - failed requests count against effective throughput
  • Monitor under load - throughput often drops when the system is stressed
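
Putting the formula and these considerations together, a minimal measurement sketch (the request handler and its ~2% failure rate are simulated placeholders):

    # Measure effective throughput: count successful requests over a time window
    # instead of a single instant, so momentary spikes and dips average out.
    import random
    import time

    def handle_request() -> bool:
        time.sleep(0.001)                  # stand-in for ~1 ms of real work
        return random.random() > 0.02      # simulate a ~2% failure rate

    window_s = 1.0
    total = succeeded = 0
    start = time.monotonic()
    while time.monotonic() - start < window_s:
        total += 1
        if handle_request():
            succeeded += 1

    elapsed = time.monotonic() - start
    print(f"raw throughput:       {total / elapsed:.0f} RPS")
    print(f"effective throughput: {succeeded / elapsed:.0f} RPS (successful requests only)")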

Latency and throughput are related but independent: a system can respond quickly yet handle only a few requests at a time (low latency, low throughput), or process an enormous volume while each individual request is slow (high throughput, high latency).

A fundamental relationship (known as Little's Law) ties the two together:

Average Concurrent Requests = Throughput × Average Latency

Example:

  • Throughput: 1,000 RPS
  • Average Latency: 100ms = 0.1 seconds
  • Concurrent Requests: 1,000 × 0.1 = 100 requests in flight
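
A small sketch of using this relationship to size a worker pool; the target numbers below are assumptions for illustration:

    # Little's Law: concurrent requests = throughput x average latency.
    # Use it to estimate how many requests must be in flight at any moment.
    target_throughput_rps = 1_000
    avg_latency_s = 0.100

    in_flight = target_throughput_rps * avg_latency_s
    print(f"requests in flight on average: {in_flight:.0f}")
    # With one request per worker at a time, you need at least this many
    # workers (plus headroom for bursts) to sustain the target throughput.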

Your code decisions directly affect latency and throughput.

Caching is the most effective way to reduce latency. The idea is simple: store frequently accessed data closer to where it’s needed.

With a layered strategy (a small in-process L1 cache, a shared Redis L2 cache, then the database):

Level      | Latency | Hit Rate
L1 (Local) | 0.01ms  | 90%
L2 (Redis) | 2ms     | 9%
Database   | 30ms    | 1%

Average latency = (0.90 × 0.01ms) + (0.09 × 2ms) + (0.01 × 30ms) ≈ 0.49ms

That’s a 60x improvement over hitting the database every time!
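
A minimal read-through sketch of that lookup path (it assumes the redis-py client and a Redis instance on localhost; fetch_from_db is a hypothetical stand-in for the real query):

    # L1 (in-process dict) -> L2 (Redis) -> database, populating caches on the way back.
    import redis

    l1_cache: dict = {}                                   # L1: per-process memory, ~0.01ms
    l2_cache = redis.Redis(host="localhost", port=6379)   # L2: shared Redis, ~2ms

    def fetch_from_db(key: str) -> str:
        return f"value-for-{key}"                         # stand-in for a ~30ms database query

    def get(key: str) -> str:
        if key in l1_cache:                               # 1. cheapest lookup first
            return l1_cache[key]
        cached = l2_cache.get(key)
        if cached is not None:                            # 2. shared cache hit
            value = cached.decode()
        else:                                             # 3. miss: hit the database
            value = fetch_from_db(key)
            l2_cache.set(key, value, ex=300)              # populate L2 with a 5-minute TTL
        l1_cache[key] = value                             # populate L1 on the way back
        return value

    print(get("user:42"))   # first call falls through to the database
    print(get("user:42"))   # second call is served from L1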


To keep these numbers healthy in practice, track a small set of metrics:

Metric     | What It Tells You       | Action Threshold
P50        | Typical user experience | Baseline for “normal”
P95        | Most users’ worst case  | Watch for drift
P99        | Outlier experience      | Investigate if > 3x P50
Error Rate | System health           | Alert if > 1%

In Production:

  • APM Tools: Datadog, New Relic, Dynatrace
  • Metrics: Prometheus + Grafana
  • Distributed Tracing: Jaeger, Zipkin

In Development:

  • Profilers: cProfile (Python), JProfiler (Java)
  • Benchmarking: pytest-benchmark, JMH
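
For instance, a minimal sketch of recording request latency as a Prometheus histogram with the Python prometheus_client library (the metric name, port, and simulated work are arbitrary choices):

    # Record per-request latency in a Prometheus histogram and expose it over HTTP.
    # P50/P95/P99 are then computed from the histogram buckets in PromQL.
    import random
    import time

    from prometheus_client import Histogram, start_http_server

    REQUEST_LATENCY = Histogram(
        "request_latency_seconds",
        "Time spent handling a request",
    )

    @REQUEST_LATENCY.time()                        # times every call to the handler
    def handle_request() -> None:
        time.sleep(random.uniform(0.005, 0.050))   # stand-in for real work

    if __name__ == "__main__":
        start_http_server(8000)                    # metrics at http://localhost:8000/metrics
        while True:
            handle_request()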

Example 1: Google Search Latency Optimization

Company: Google

Scenario: Google Search must return results in milliseconds. Even 100ms delay can reduce user satisfaction and search volume.

Implementation: Uses parallel processing and caching.
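One way to picture the parallel part is a scatter-gather over index shards. The sketch below is an illustrative assumption, not Google’s actual system; the shard count and timings are invented:

    # Fan a query out to many index shards concurrently and merge the results.
    # End-to-end latency is set by the slowest shard, not the sum of all shards.
    import asyncio
    import random

    async def query_shard(shard_id: int, query: str) -> list:
        await asyncio.sleep(random.uniform(0.01, 0.03))   # stand-in for a shard lookup
        return [f"shard{shard_id}: result for '{query}'"]

    async def search(query: str) -> list:
        shard_results = await asyncio.gather(*(query_shard(i, query) for i in range(8)))
        return [hit for results in shard_results for hit in results]

    print(asyncio.run(search("latency")))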

Why This Matters:

  • Scale: Billions of queries per day
  • Latency Impact: 100ms delay = 0.2% search volume reduction
  • Performance: Parallel processing reduces latency by 60%
  • Result: Sub-100ms average response time

Real-World Impact:

  • Queries: 8.5+ billion searches per day
  • Latency: Average 50ms, P99 200ms
  • Revenue Impact: Every 100ms delay costs millions in ad revenue

Example 2: Amazon Product Page Latency

Company: Amazon

Scenario: Product pages must load quickly. Amazon found that every 100ms delay reduces sales by 1%.

Implementation: Uses parallel API calls and edge caching.

Why Parallel Processing?

  • User Experience: Fast page loads increase conversions
  • Revenue Impact: 100ms delay = 1% sales reduction
  • Performance: Parallel calls reduce latency by 70%
  • Result: 200ms average page load time

Real-World Impact:

  • Scale: Billions of product page views daily
  • Latency: 200ms average, P99 500ms
  • Revenue: Every 100ms optimization worth millions annually

Example 3: Netflix Video Streaming Latency

Company: Netflix

Scenario: Video playback must start quickly. Users expect playback to begin within 2 seconds of clicking play.

Implementation: Uses CDN distribution and adaptive bitrate streaming.
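The adaptive-bitrate half can be sketched as picking the highest rendition the measured network throughput can sustain. The bitrate ladder and safety factor below are illustrative assumptions, not Netflix’s actual algorithm:

    # Choose the highest video rendition that fits within the measured throughput,
    # keeping a safety margin so playback does not stall when bandwidth dips.
    BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]   # illustrative rendition ladder

    def choose_bitrate(measured_throughput_kbps: float, safety_factor: float = 0.8) -> int:
        budget = measured_throughput_kbps * safety_factor
        affordable = [b for b in BITRATE_LADDER_KBPS if b <= budget]
        return affordable[-1] if affordable else BITRATE_LADDER_KBPS[0]

    print(choose_bitrate(4_000))   # plenty of bandwidth -> 3000 kbps rendition
    print(choose_bitrate(900))     # constrained link    -> 235 kbps rendition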

Why Low Latency Matters:

  • User Experience: Fast playback start increases engagement
  • Retention: Slow start causes users to abandon
  • Performance: CDN reduces latency by 90%
  • Result: < 2 seconds time to first frame

Real-World Impact:

  • Scale: 200+ million subscribers, billions of plays daily
  • Latency: < 2 seconds time to first frame
  • Engagement: Fast playback increases watch time by 20%


Now that you understand performance metrics, let’s learn how to identify and fix performance issues:

Next up: Understanding Bottlenecks - Learn to find and eliminate performance bottlenecks.