
Service Mesh & Sidecar Pattern

Infrastructure concerns solved at the platform level

A service mesh is a dedicated infrastructure layer for managing service-to-service communication in microservices architectures. It handles traffic management, security, and observability without requiring changes to your application code. Think of it as city-wide infrastructure for your microservices—instead of every building having its own phone system, security guards, and mail delivery, you build shared infrastructure that handles all of this automatically.


Real-World Scenario: The Infrastructure Code Problem


In microservices architectures without a service mesh, every service must implement infrastructure concerns: circuit breakers, retries and timeouts, load balancing, service discovery, metrics collection, distributed tracing, mutual TLS encryption, and rate limiting. This results in 30-40% of your code being infrastructure concerns, not business logic.

The problem: Teams spend significant time implementing and maintaining the same infrastructure code across multiple services. This code is error-prone, inconsistent, and takes focus away from business logic. When you need to update retry logic or add new observability features, you must update every service individually.
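To make this concrete, here is a sketch of what that per-service boilerplate tends to look like: a single business call (`send`, an illustrative callback, not any real API) buried under retry, backoff, and metrics plumbing that every team re-implements.

```python
import time

def call_payment_service(request, send, max_attempts=3, timeout_s=2.0):
    """One business call buried under retry, timeout, and metrics concerns."""
    metrics = {"attempts": 0, "errors": 0}
    last_error = None
    for attempt in range(max_attempts):
        metrics["attempts"] += 1
        start = time.monotonic()
        try:
            response = send(request, timeout=timeout_s)  # the actual business call
            metrics["latency_s"] = time.monotonic() - start
            return response, metrics
        except Exception as exc:  # naive catch-all retry
            metrics["errors"] += 1
            last_error = exc
            time.sleep(0.01 * (2 ** attempt))  # exponential backoff
    raise last_error
```

Multiply this wrapper by every outbound call in every service, and the 30-40% figure stops looking surprising.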


The solution: Service mesh moves infrastructure concerns to a dedicated layer. Your services focus entirely on business logic. The service mesh handles all infrastructure concerns consistently across all services. This separation of concerns improves code quality, reduces bugs, and enables faster feature development.


The sidecar pattern is a deployment pattern where a companion container runs alongside your main application container, providing auxiliary functionality. The sidecar shares the same network namespace and lifecycle as the main container but handles different concerns.


Service mesh architecture consists of two main components: the data plane and the control plane. Understanding this separation is crucial for understanding how service mesh works.


Key components:

  1. Control Plane: Manages configuration, policies, and telemetry. It tells the data plane how to route traffic, what security policies to apply, and what metrics to collect. Examples: Istio Pilot, Linkerd control plane.

  2. Data Plane: Network of sidecar proxies handling actual traffic. Each service pod has a sidecar proxy (usually Envoy) that intercepts all inbound and outbound traffic. The proxies enforce policies configured by the control plane.

  3. Sidecar Proxy: Usually Envoy—intercepts all inbound/outbound traffic. Applications communicate with the proxy via localhost, and the proxy handles all network concerns transparently.


Several service mesh solutions exist, each with different characteristics. Understanding the options helps you choose the right one for your needs.


Istio

Characteristics:

  • Uses Envoy as sidecar proxy
  • Comprehensive feature set
  • Kubernetes-native
  • Complex but powerful

Components (consolidated into the single istiod binary since Istio 1.5):

  • Pilot: Traffic management
  • Citadel: Security and certificate management
  • Galley: Configuration management
  • Envoy: Sidecar proxy (data plane)

Linkerd

Characteristics:

  • Lightweight and simple
  • Written in Rust (fast and secure)
  • Easier to adopt than Istio
  • Kubernetes-only

Consul Connect

Characteristics:

  • Works beyond Kubernetes
  • Multi-platform (VMs, containers, serverless)
  • Integrated with Consul service discovery

| Feature | Istio | Linkerd | Consul Connect |
| --- | --- | --- | --- |
| Complexity | High | Low | Medium |
| Performance Overhead | 10-15ms | 5-10ms | 10-15ms |
| Memory Usage | High | Low | Medium |
| Platform | Kubernetes | Kubernetes | Multi-platform |
| Learning Curve | Steep | Gentle | Medium |
| Maturity | Very Mature | Mature | Mature |
| Community | Largest | Growing | Strong |

```yaml
# Istio VirtualService - 80% to v1, 20% to v2
# (the v1/v2 subsets are defined in a companion DestinationRule)
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payment-service
spec:
  hosts:
  - payment-service
  http:
  - route:
    - destination:
        host: payment-service
        subset: v1
      weight: 80
    - destination:
        host: payment-service
        subset: v2
      weight: 20
```

What this gives you:

  • Canary deployments (route 5% traffic to new version)
  • A/B testing
  • Blue-green deployments
  • Gradual rollouts
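Conceptually, the proxy's weighted routing is just a weighted random choice per request. A minimal sketch of the idea (not Envoy's actual algorithm, which maintains richer load-balancer state):

```python
import random

def pick_subset(weights, rng=random):
    """Choose a destination subset according to VirtualService-style weights."""
    subsets = list(weights)
    return rng.choices(subsets, weights=[weights[s] for s in subsets], k=1)[0]

# Roughly 80% of requests go to v1 and 20% to v2, matching an 80/20 split.
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_subset({"v1": 80, "v2": 20})] += 1
```

Shifting the weights in the VirtualService shifts this distribution for every caller at once, with no client redeploys.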
```yaml
# Retries and timeouts - enforced by the sidecar, not the application
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
  - order-service
  http:
  - route:
    - destination:
        host: order-service
    timeout: 5s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
```

No code changes needed! The sidecar proxy handles all retries automatically.
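One subtlety worth sanity-checking in a policy like this: with `attempts: 3` and `perTryTimeout: 2s`, the raw worst case is 6s, but the route-level `timeout: 5s` caps the total. A small helper to check such budgets (a simplification; Envoy also inserts jittered backoff between attempts):

```python
def worst_case_latency(attempts, per_try_timeout_s, overall_timeout_s):
    """Worst-case wall time for an Envoy-style retry policy, ignoring
    inter-attempt backoff: every attempt times out, but the route-level
    timeout caps the total."""
    return min(attempts * per_try_timeout_s, overall_timeout_s)
```

For the config above, `worst_case_latency(3, 2.0, 5.0)` is 5.0 seconds: the overall timeout fires before the third attempt can use its full per-try budget.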

```yaml
# Circuit breaking: connection limits plus outlier detection
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: payment-service-circuit-breaker
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

Translation:

  • If a service instance returns 5 consecutive 5xx errors
  • Eject it from the load balancer pool for 30 seconds
  • All without modifying your application code!
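The ejection logic described above can be sketched in a few lines of Python (a hypothetical simplification; Envoy's real outlier detection also caps the ejection percentage and grows ejection time for repeat offenders):

```python
class OutlierDetector:
    """Consecutive-5xx ejection, as in the DestinationRule above."""

    def __init__(self, consecutive_5xx=5, base_ejection_time_s=30.0):
        self.threshold = consecutive_5xx
        self.ejection_s = base_ejection_time_s
        self.streaks = {}        # instance -> current run of 5xx responses
        self.ejected_until = {}  # instance -> time it may rejoin the pool

    def record(self, instance, status, now):
        """Feed one response status; eject on the Nth consecutive 5xx."""
        if 500 <= status < 600:
            self.streaks[instance] = self.streaks.get(instance, 0) + 1
            if self.streaks[instance] >= self.threshold:
                self.ejected_until[instance] = now + self.ejection_s
                self.streaks[instance] = 0
        else:
            self.streaks[instance] = 0  # any success resets the streak

    def healthy(self, instances, now):
        """Instances currently eligible for load balancing."""
        return [i for i in instances if self.ejected_until.get(i, 0.0) <= now]
```

The application never sees any of this; the sidecar simply stops sending it traffic from the ejected instance.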

Without Service Mesh: You need to implement mTLS in every service, managing certificates, keys, and trust stores. This is complex, error-prone, and requires expertise in cryptography.

With Service Mesh: Your application makes normal HTTP calls. The sidecar proxy handles mTLS automatically. Traffic between sidecars is encrypted, but your application code doesn’t need to know about certificates or encryption.


Service mesh handles:

  • Certificate generation
  • Certificate distribution
  • Automatic rotation (every 24 hours)
  • No expired certificates!
```yaml
# Only order-service may POST to /process on payment-service
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-service-policy
spec:
  selector:
    matchLabels:
      app: payment-service
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/order-service"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/process"]
```

Translation: Only the Order Service can call the Payment Service’s /process endpoint.

Without any code changes, service mesh collects:

  • Request rate: Requests per second
  • Error rate: Percentage of failed requests
  • Latency: P50, P95, P99 latencies
  • Request volume: Total requests over time
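Those latency percentiles are computed from raw samples roughly like this nearest-rank sketch (real meshes aggregate with streaming histograms rather than sorting every sample):

```python
def percentile(samples, p):
    """Nearest-rank percentile, the convention behind P50/P95/P99 panels."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Note how two slow outliers dominate the tail but not the median.
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 900, 14]
p50 = percentile(latencies_ms, 50)   # 14 - the typical request
p99 = percentile(latencies_ms, 99)   # 900 - the tail request
```

This is why the mesh reporting P50 alongside P99 matters: averages hide exactly the tail behavior that causes incidents.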

Without Service Mesh:

  • Manually instrument every service
  • Add trace IDs to headers
  • Send spans to collector

With Service Mesh:

  • Automatic trace propagation
  • Automatic span creation
  • Just forward the trace headers!

Result: Full distributed traces without heavy instrumentation!
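The one piece of app-side work tracing does require is forwarding the incoming trace headers on outbound calls, because the sidecar cannot guess which inbound request triggered which outbound one. A sketch of that forwarding (the header list follows Istio's B3 and W3C Trace Context propagation; the helper itself is hypothetical):

```python
# Headers Istio/Envoy expects the application to propagate; everything
# else tracing-related is handled by the sidecar.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",
    "x-b3-sampled", "x-b3-flags",
    "traceparent", "tracestate",
)

def propagated_headers(incoming):
    """Copy only the trace headers from an incoming request onto an
    outbound one - the single piece of app-side work tracing needs."""
    lowered = {k.lower(): v for k, v in incoming.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}
```

Everything else (span creation, timing, reporting to the collector) happens in the proxy.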


No need for hard-coded service URLs!

```python
# Instead of a hard-coded endpoint:
PAYMENT_SERVICE_URL = "http://payment-service-prod-123.us-west-2.elb.amazonaws.com:8080"

# Just use the service name (client here is any async HTTP client):
response = await client.post("http://payment-service/process")
# The service mesh resolves the actual endpoint automatically
```

LLD Connection: Decorator Pattern at Infrastructure Level


Service mesh is the Decorator Pattern applied to infrastructure.

The Decorator Pattern adds functionality to objects dynamically without modifying their structure. A sidecar proxy does the same thing at the network level: it wraps every service with retries, encryption, and metrics without touching the service itself.
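In code, the analogy looks like this: a Python decorator layers retries onto a function without editing its body, exactly as a sidecar layers retries onto a service without editing its image (illustrative sketch):

```python
import functools

def with_retries(max_attempts=3):
    """Add retry behavior without touching the wrapped function -
    the same move a sidecar proxy makes at the network level."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
        return wrapper
    return decorate

@with_retries(max_attempts=3)
def charge_card(amount):
    ...  # pure business logic; resilience is layered on from outside
```

The mesh takes this one step further: the "decorator" lives in a separate process, so it works for any language.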

When to Use a Service Mesh

  1. Many Microservices (10+)

    • Managing infrastructure concerns manually becomes impossible
    • Need consistent policies across all services
  2. Zero-Trust Security Requirements

    • Need mutual TLS everywhere
    • Fine-grained authorization policies
    • Audit trail of all service-to-service communication
  3. Complex Routing Requirements

    • Canary deployments
    • A/B testing
    • Traffic mirroring
    • Gradual rollouts
  4. Observability is Critical

    • Need distributed tracing
    • Uniform metrics collection
    • Service dependency graphs
  5. Polyglot Architecture

    • Services in different languages
    • Can’t reimplement infrastructure in each language
When NOT to Use a Service Mesh

  1. Small Number of Services (< 5)

    • Overhead not justified
    • Libraries like Resilience4j (Java) or Tenacity (Python) sufficient
  2. Simple Architecture

    • Direct service-to-service calls work fine
    • No complex routing needs
  3. Limited Kubernetes Experience

    • Service mesh adds operational complexity
    • Need solid Kubernetes foundation first
  4. Performance is Critical

    • Service mesh adds 5-15ms latency per hop
    • For ultra-low-latency systems, this matters
  5. Small Team

    • Learning curve is steep
    • Operational burden high
    • Focus on business features instead

Case Study: Lyft

Problem:

  • 100+ microservices
  • Multiple languages (Python, Go, Java)
  • Reimplementing infrastructure in each service

Solution:

  • Built Envoy proxy (2016)
  • Open-sourced it (now part of CNCF)
  • Foundation for Istio, Consul Connect

Results:

  • Consistent observability
  • Simplified operations
  • Faster feature development

Case Study: Airbnb

Before Service Mesh:

  • Manual circuit breakers in each service
  • Inconsistent retry policies
  • Difficult to debug cascading failures

After Adopting Istio:

  • Uniform traffic policies
  • Better visibility into failures
  • Reduced incident response time by 40%

Key Learning:

“We spent 6 months migrating to service mesh. The operational simplicity we gained was worth every minute.” - Airbnb Engineering


Step 1: Install Istio

```sh
# Download Istio
curl -L https://istio.io/downloadIstio | sh -

# Install Istio on Kubernetes
istioctl install --set profile=demo -y

# Enable sidecar injection for the default namespace
kubectl label namespace default istio-injection=enabled
```

Step 2: Deploy Your Service (No Code Changes!)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:           # required by apps/v1
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
        version: v1
    spec:
      containers:
      - name: payment-service
        image: payment-service:v1
        ports:
        - containerPort: 8080
```

When deployed to a namespace with Istio injection enabled:

  • Istio automatically injects Envoy sidecar
  • No changes to your container!
Step 3: Configure Traffic Routing

```yaml
# Canary deployment: 90% to v1, 10% to v2
# (subsets v1/v2 must be defined in a companion DestinationRule)
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payment-service
spec:
  hosts:
  - payment-service
  http:
  - route:
    - destination:
        host: payment-service
        subset: v1
      weight: 90
    - destination:
        host: payment-service
        subset: v2
      weight: 10
```
Step 4: Enable mTLS

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT  # All traffic must be mTLS
```

Done! All service-to-service traffic is now encrypted.


Latency Overhead

| Scenario | Without Mesh | With Mesh | Overhead |
| --- | --- | --- | --- |
| Simple request | 5ms | 10ms | +5ms |
| With retries | 15ms | 20ms | +5ms |
| With circuit breaker | 5ms | 10ms | +5ms |
| mTLS handshake | - | 15ms | +15ms (once) |

Typical: 5-15ms added latency per service hop

Resource Overhead

| Resource | Per Sidecar |
| --- | --- |
| Memory | 50-100 MB |
| CPU | 0.1-0.5 cores |

For 100 services with 3 replicas each:

  • 300 sidecars × 75 MB = 22.5 GB memory
  • 300 sidecars × 0.3 cores = 90 CPU cores
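That sizing exercise generalizes; a quick calculator under assumed per-proxy footprints (75 MB and 0.3 cores here are midpoints of the ranges above, not measured values):

```python
def sidecar_overhead(services, replicas_per_service,
                     mem_mb_per_sidecar=75, cores_per_sidecar=0.3):
    """Cluster-wide sidecar cost: one proxy per replica of every service."""
    sidecars = services * replicas_per_service
    return {
        "sidecars": sidecars,
        "memory_gb": sidecars * mem_mb_per_sidecar / 1000,  # decimal GB
        "cpu_cores": sidecars * cores_per_sidecar,
    }
```

For 100 services at 3 replicas each, this reproduces the 300 sidecars, 22.5 GB, and 90 cores above; worth running before adoption, since the overhead scales with replica count, not traffic.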

Key Takeaways

Infrastructure as Code

Service mesh moves infrastructure concerns from code to configuration. Focus on business logic, not retries and timeouts.

Decorator at Scale

Service mesh is the Decorator pattern applied to network infrastructure. Add functionality without modifying services.

Not a Silver Bullet

Service mesh adds complexity and overhead. Only adopt when you have enough services (10+) to justify the cost.

Observability for Free

Automatic metrics, tracing, and logging across all services without instrumentation. This alone can justify adoption.



Further Reading

  • “Istio: Up and Running” by Lee Calcote
  • “Service Mesh Patterns” by Alex Soto Bueno
  • Envoy Proxy Documentation - Deep technical details
  • Istio Documentation - Official guides and tutorials
  • “The Service Mesh Era” by William Morgan (Linkerd creator)