
Bulkhead Pattern

Isolating resources to prevent failures from spreading

Without bulkheads, all services share the same resource pools. When one service fails or becomes slow, it exhausts shared resources, affecting all services. A slow database query in Service A blocks threads in the shared thread pool, preventing Service B and Service C from processing requests. The failure cascades through the system.

With bulkheads, each service has isolated resource pools. When Service A fails and exhausts its thread pool, Service B and Service C continue operating with their own isolated pools. The failure is contained to one compartment, and the system remains partially functional.



The bulkhead pattern isolates resources into separate compartments so that failure in one compartment doesn’t affect others. It’s named after ship bulkheads—watertight compartments that prevent flooding from spreading. If one compartment floods, others stay dry, and the ship stays afloat.

In software, bulkheads isolate thread pools, connection pools, memory, and CPU resources. Each service gets its own isolated resource pool. If one service fails and exhausts its resources, other services continue operating normally.


Real-World Scenario: Microservices Architecture


In some microservices deployments, multiple services run in the same application server or container. Without bulkheads, they share thread pools, connection pools, and memory, so a failure in one service can exhaust those shared resources and affect every other service.

The problem: A payment service makes slow database queries that block threads. Without bulkheads, these blocked threads are in a shared pool, preventing other services (user service, order service) from processing requests. The entire system slows down or crashes.

The solution: Bulkhead pattern. Each service gets its own thread pool and connection pool. When the payment service’s threads are blocked, the user service and order service continue operating with their own pools. The failure is isolated, and the system remains partially functional.

The impact: Services become resilient to failures in other services. A slow payment service doesn’t affect user authentication or order processing. The system degrades gracefully instead of failing completely.


Thread pool isolation is the most common bulkhead implementation. Each service gets its own thread pool with a fixed size. If one service’s threads are blocked, other services’ threads continue working.


How it works: Instead of a single shared thread pool, create separate thread pools for each service. Service A gets 30 threads, Service B gets 30 threads, Service C gets 30 threads. If Service A’s threads are blocked, Service B and C continue processing requests.
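The idea above can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration, not a production setup: the service names, `slow_call`, and `fast_call` are hypothetical stand-ins, and the pool size of 30 is taken from the paragraph above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One fixed-size pool per service instead of a single shared pool.
pools = {
    name: ThreadPoolExecutor(max_workers=30, thread_name_prefix=name)
    for name in ("service_a", "service_b", "service_c")
}

release = threading.Event()

def slow_call():
    # Hypothetical stand-in for a blocked operation such as a slow query.
    release.wait(timeout=5)
    return "slow done"

def fast_call():
    return "fast done"

# Saturate Service A: 30 blocking tasks occupy all 30 of its threads.
blocked = [pools["service_a"].submit(slow_call) for _ in range(30)]

# Service B has its own threads, so this request is served right away.
result_b = pools["service_b"].submit(fast_call).result(timeout=1)

release.set()  # unblock Service A so every pool can shut down cleanly
for pool in pools.values():
    pool.shutdown(wait=True)
```

With a single shared pool of the same total size, the 30 blocked tasks could have occupied threads that Service B needed; here they can only occupy Service A's threads.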

Benefits: Prevents thread exhaustion from spreading. A slow service can’t block other services. Each service has guaranteed resources. Failures are isolated to one service.

Trade-offs: More complex resource management. Need to size pools appropriately. Total thread count increases (but still bounded). Requires careful monitoring.

Real-world example: A web application has three services: user authentication, payment processing, and order management. Each service gets its own thread pool of 20 threads. When payment processing becomes slow due to database issues, user authentication and order management continue operating normally. Users can still log in and browse orders even when payments are slow.



Connection pool isolation gives each service its own database or HTTP connection pool. If one service exhausts its connections, other services’ connections remain available.

How it works: Instead of a single shared connection pool, create separate pools for each service. Service A gets 10 database connections, Service B gets 10 connections, Service C gets 10 connections. If Service A exhausts its connections, Service B and C continue using their connections.
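A rough sketch of per-service connection pools follows. The `ConnectionPool` class and `fake_connect` are hypothetical and exist only for illustration; a real application would more likely rely on the pooling provided by its database driver or framework. The pool size of 10 comes from the paragraph above.

```python
import queue

class ConnectionPool:
    """Fixed-size pool; acquire() fails fast instead of waiting forever."""

    def __init__(self, name, size, connect):
        self.name = name
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=0.1):
        try:
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            raise RuntimeError(f"{self.name}: connection pool exhausted") from None

    def release(self, conn):
        self._idle.put(conn)

def fake_connect():
    # Hypothetical stand-in for a real driver's connect() call.
    return object()

pools = {
    name: ConnectionPool(name, size=10, connect=fake_connect)
    for name in ("service_a", "service_b", "service_c")
}

# Service A acquires all 10 of its connections and never releases them.
leaked = [pools["service_a"].acquire() for _ in range(10)]

try:
    pools["service_a"].acquire()
    a_exhausted = False
except RuntimeError:
    a_exhausted = True

# Service B draws from its own pool, so it is unaffected.
conn_b = pools["service_b"].acquire()
pools["service_b"].release(conn_b)
```

The short acquire timeout is a design choice in this sketch: a caller that cannot get a connection gets an error quickly rather than blocking a thread while it waits.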

Benefits: Prevents connection exhaustion from spreading. A service that leaks connections can’t starve other services. Each service has a guaranteed number of connections, which makes its database capacity predictable.

Trade-offs: More connections total (but still bounded). Need to size pools appropriately. Requires connection pool management. The sum of all pool sizes must stay within the database’s connection limit.

Real-world example: An e-commerce application has three services that access the same database: product catalog, user management, and order processing. Each service gets its own connection pool of 15 connections. When order processing has a bug that leaks connections, product catalog and user management continue operating. Users can still browse products and manage accounts even when orders are failing.



Memory isolation allocates separate memory pools for different services or components. If one service has a memory leak, other services’ memory remains available.

How it works: Instead of shared memory, allocate separate memory pools. Service A gets 512MB, Service B gets 512MB, Service C gets 512MB. If Service A has a memory leak and exhausts its pool, Service B and C continue operating.
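In practice, memory isolation is usually enforced by the runtime or operating system, for example through per-process or per-container limits. The sketch below only illustrates the accounting idea behind it; `MemoryBudget` is a hypothetical class, it is not thread-safe, and it tracks reservations rather than real allocations. The 512MB limit comes from the paragraph above, and the 64MB chunk size is an arbitrary assumption.

```python
class MemoryBudget:
    """Tracks bytes reserved by one service against a fixed limit."""

    def __init__(self, name, limit_bytes):
        self.name = name
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, nbytes):
        # Refuse any reservation that would push this service over its limit.
        if self.used_bytes + nbytes > self.limit_bytes:
            raise MemoryError(f"{self.name}: budget of {self.limit_bytes} bytes exceeded")
        self.used_bytes += nbytes

    def free(self, nbytes):
        self.used_bytes = max(0, self.used_bytes - nbytes)

MB = 1024 * 1024
budgets = {
    name: MemoryBudget(name, 512 * MB)
    for name in ("service_a", "service_b", "service_c")
}

# Service A "leaks": it keeps reserving 64 MB chunks and never frees them.
chunks_reserved = 0
a_failed = False
try:
    while True:
        budgets["service_a"].reserve(64 * MB)
        chunks_reserved += 1
except MemoryError:
    a_failed = True

# Service B's budget is untouched by Service A's leak.
budgets["service_b"].reserve(64 * MB)
```

Service A fails once its own 512MB budget is used up, while Service B can still reserve memory from its separate budget.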

Benefits: Prevents memory exhaustion from spreading. A service with memory leaks can’t affect other services. Each service has guaranteed memory, so its memory usage stays predictable.

Trade-offs: More complex memory management. Need to size pools appropriately. May waste memory if pools are oversized. Requires careful monitoring.

Real-world example: A microservices application runs multiple services in the same container. Each service gets its own memory limit (e.g., 256MB). When a reporting service has a memory leak and exhausts its limit, other services (API, background jobs) continue operating. The system remains functional even when one service fails.


Thread pool isolation is the most common interview topic. Here’s how to implement it:
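One possible implementation is sketched below, assuming a design that rejects new work when a compartment is full instead of queueing it without bound. The `Bulkhead` class, `BulkheadFullError`, and the `payments` and `auth` names are hypothetical and chosen for illustration; libraries such as Resilience4j offer their own bulkhead APIs, which this sketch does not try to reproduce.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BulkheadFullError(Exception):
    """Raised when a bulkhead has no capacity left for new work."""

class Bulkhead:
    def __init__(self, name, max_threads, max_queued=0):
        self.name = name
        self._executor = ThreadPoolExecutor(
            max_workers=max_threads, thread_name_prefix=name
        )
        # Permits cover running plus queued tasks, so the backlog stays bounded.
        self._permits = threading.BoundedSemaphore(max_threads + max_queued)

    def submit(self, fn, *args, **kwargs):
        # Fail fast when the compartment is full rather than piling up work.
        if not self._permits.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name} bulkhead is full")
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except Exception:
            self._permits.release()
            raise
        # Return the permit when the task finishes, fails, or is cancelled.
        future.add_done_callback(lambda _f: self._permits.release())
        return future

    def shutdown(self):
        self._executor.shutdown(wait=True)

payments = Bulkhead("payments", max_threads=2)
auth = Bulkhead("auth", max_threads=2)

gate = threading.Event()
# Two stuck payment calls occupy the whole payments compartment.
stuck = [payments.submit(gate.wait, 5) for _ in range(2)]

try:
    payments.submit(lambda: "charge")
    rejected = False
except BulkheadFullError:
    rejected = True

# Authentication runs in its own compartment and is unaffected.
login = auth.submit(lambda: "logged in").result(timeout=1)

gate.set()
payments.shutdown()
auth.shutdown()
```

Rejecting excess work is one design choice among several. A bounded queue (`max_queued > 0` here) trades a little extra latency for fewer rejections during short bursts.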



Bulkhead pattern works best when combined with other resilience patterns:

With Circuit Breaker: Circuit breakers stop requests to failing services, while bulkheads isolate resources. If a service’s circuit is open, fail fast instead of submitting work to its thread pool.

With Timeouts: Set timeouts on operations to prevent threads from blocking indefinitely. Combined with bulkheads, this ensures threads are released even when operations are slow.

With Retry: Retry patterns handle transient failures, while bulkheads prevent retry storms from exhausting resources. Isolated thread pools prevent retries in one service from affecting others.

Real-world example: A microservices application uses bulkheads for thread pool isolation, circuit breakers to prevent requests to failing services, timeouts to prevent hanging operations, and retry patterns to handle transient failures. This combination provides comprehensive resilience—failures are isolated, resources are protected, and transient failures are handled automatically.
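A simplified sketch of how a bulkhead, a circuit breaker, and a timeout might fit together is shown below. Retries are left out to keep it short. `CircuitBreaker`, `CircuitOpenError`, `call_with_resilience`, and `failing_payment` are hypothetical names, and the breaker is deliberately minimal: it opens after a fixed number of consecutive failures and allows requests again after a cool-down.

```python
import time
from concurrent.futures import ThreadPoolExecutor

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without running it."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cool-down, let requests through again as a trial.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

def call_with_resilience(pool, breaker, fn, timeout):
    # Circuit breaker: fail fast without touching the service's thread pool.
    if not breaker.allow():
        raise CircuitOpenError("circuit open")
    # Bulkhead: the work runs only on this service's dedicated pool.
    future = pool.submit(fn)
    try:
        # Timeout: the caller stops waiting after `timeout` seconds.
        result = future.result(timeout=timeout)
    except Exception:
        breaker.record_failure()
        raise
    breaker.record_success()
    return result

payment_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="payments")
breaker = CircuitBreaker(failure_threshold=3)

def failing_payment():
    raise ConnectionError("payment database unavailable")

outcomes = []
for _ in range(5):
    try:
        call_with_resilience(payment_pool, breaker, failing_payment, timeout=1.0)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

payment_pool.shutdown(wait=True)
```

The first three calls fail and open the circuit, and the remaining calls are rejected without using a pool thread. Note that `future.result(timeout=...)` only stops the caller from waiting; it does not interrupt the worker thread, so the operation itself should also carry its own timeout (for example, a query or socket timeout).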


Isolate Resources

Isolate thread pools, connection pools, and memory into separate compartments. Each service gets its own resources.

Prevent Cascades

Bulkhead prevents cascade failures by containing resource exhaustion to one compartment. Other services continue operating.

Thread Pool Isolation

Most common implementation. Each service gets its own thread pool. Blocked threads in one service don’t affect others.

Connection Pool Isolation

Each service gets its own connection pool. Connection exhaustion in one service doesn’t affect others.

Size Appropriately

Size resource pools appropriately. Too small = resource exhaustion, too large = waste. Monitor and adjust.

Combine Patterns

Combine with circuit breakers, timeouts, and retry patterns for comprehensive resilience. Patterns work together.



  • “Release It!” by Michael Nygard - Bulkhead pattern and production patterns
  • “Building Microservices” by Sam Newman - Resilience patterns including bulkhead
  • Netflix Hystrix - Bulkhead implementation in Hystrix
  • Resilience4j - Modern Java resilience library with bulkhead support
  • “Designing Data-Intensive Applications” by Martin Kleppmann - Resource isolation patterns