Start with Monolith
Don’t start with microservices. Prove your product first, then evolve architecture based on real constraints.
Microservices architecture decomposes an application into a collection of small, independently deployable services. Each service runs in its own process, communicates via lightweight mechanisms (usually HTTP APIs or message queues), and can be developed, deployed, and scaled independently.
Think of it like building a castle from many small, independent pieces that work together but can be changed, moved, or replaced individually. Each piece (microservice) has one specific job. If the drawbridge breaks, the kitchen still works. This independence is the key benefit of microservices.
Netflix started as a monolith serving DVDs by mail. When they moved to streaming, their monolith couldn’t scale. They migrated to microservices in 2008-2009, and now run 1000+ microservices on 100,000+ AWS instances serving 200+ million subscribers.
Why they evolved: Database scalability issues forced them to decompose. They couldn’t scale their monolithic database, so they split into services with their own databases. This evolution took years and required significant infrastructure investment.
Key lesson: Netflix didn’t start with microservices. They started with a monolith and evolved when they hit real constraints. This is the recommended path—start simple, evolve when needed.
Each microservice should do one thing well. This is the Single Responsibility Principle applied at the service level. A service should have one reason to change—if you need to change it for multiple unrelated reasons, it’s probably too large.
Bad example: A service that handles orders, payments, shipping, and notifications. This violates single responsibility—changes to payment logic affect shipping, and vice versa. The service becomes hard to understand, test, and maintain.
Good example: Separate services for each responsibility. OrderService manages orders only. PaymentService handles payments only. Each service has a clear, focused responsibility. Changes to payment logic don’t affect order logic.
Each service can be deployed independently without affecting other services. This is one of the most important benefits of microservices—you can deploy changes to one service without coordinating with other teams or risking other services.
Real-world example: You deploy a new version of the Payment Service with bug fixes. The Order Service and User Service continue running unchanged. There’s no coordination needed, no shared deployment window, and no risk of breaking other services. This independence enables teams to move fast.
Each service has its own database. This is critical for service independence—services don’t share databases. Services communicate via APIs, not by accessing each other’s databases directly.
Each team owns their service end-to-end. They choose their own technology stack, database, architectural decisions, and deployment schedule. This autonomy enables teams to use the best tool for their specific problem.
Real-world example: Netflix uses different technologies for different services. Java Spring Boot for core business services, Node.js for API gateway, Python for machine learning, Go for high-performance streaming. Each team chooses what works best for their problem domain.
Scale services independently based on their load. If the Order Service needs 10 instances but the Admin Service only needs 1, you can scale them independently. This saves resources and money compared to scaling the entire application.
Use the right tool for the job. Different services can use different technologies based on their requirements. This polyglot approach allows teams to choose the best technology for their specific problem.
One service failure doesn’t bring down the entire system. With proper isolation and circuit breakers, failures are contained to individual services. Other services continue operating, providing partial functionality.
With graceful degradation: Orders still work (view, create, update). Payments are queued for later processing. Notifications are still sent. The system remains partially functional instead of completely failing. This provides a better user experience than a complete outage.
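The degradation path for payments can be sketched as a try/except that falls back to a queue. This is a minimal illustration: `charge` simulates an outage, and a real system would enqueue to a durable broker rather than an in-memory `queue.Queue`.

```python
import queue

# Hypothetical in-memory stand-ins for a real payment client and broker.
payment_queue = queue.Queue()

class PaymentServiceDown(Exception):
    pass

def charge(order_id: str) -> str:
    raise PaymentServiceDown()  # simulate the Payment Service being down

def checkout(order_id: str) -> str:
    """Create the order; degrade gracefully if payments are down."""
    try:
        charge(order_id)
        return "paid"
    except PaymentServiceDown:
        # Queue the payment for later processing instead of failing the order.
        payment_queue.put(order_id)
        return "payment_pending"
```

The order succeeds with a pending payment instead of failing outright, which is exactly the partial functionality described above.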
Teams can work independently without stepping on each other’s toes. Each team owns their service end-to-end, enabling faster development cycles and reducing coordination overhead.
| Aspect | Monolith | Microservices |
|---|---|---|
| Deployment | Coordinate with everyone | Deploy independently |
| Technology | Everyone uses same stack | Choose your own stack |
| Database | Shared, coordinate schema changes | Own your schema |
| Testing | Wait for full integration tests | Test your service in isolation |
| Ownership | Blurred boundaries | Clear ownership |
Instead of understanding a 1M line monolith, understand a 10K line service. This reduces cognitive load and makes onboarding faster. New engineers can become productive faster on individual services.
Real-world impact: A new engineer joining a monolith team might take months to understand the codebase. A new engineer joining a microservices team can understand one service in days and become productive quickly.
Microservices trade code complexity for operational complexity. You’re not reducing complexity—you’re moving it to a different place. Understanding this trade-off is crucial before choosing microservices.
Microservices are distributed systems, and distributed systems are hard. The network is unreliable, latency exists, and failures happen in ways that don’t occur in monolithic systems.
The network is unreliable: In a monolith, a function call either works or throws an exception. In microservices, network calls can fail in many ways: network down, service down, request timeout, response corrupted, or payment processed but response lost. This creates uncertainty that doesn’t exist in monoliths.
Real-world example: An Order Service calls a Payment Service. The request times out. Did the payment process or not? This is the “Two Generals Problem”—you can’t know for certain. You need to implement idempotency, retries, and compensation logic to handle these cases.
These fallacies, first catalogued by L. Peter Deutsch and colleagues at Sun Microsystems, describe assumptions developers make about distributed systems that are often wrong:

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
Understanding these fallacies helps you design better microservices architectures.
No more ACID transactions across services. In a monolith, you can perform multiple operations in a single database transaction with ACID guarantees. In microservices, each service has its own database, so cross-service transactions require distributed transactions or eventual consistency patterns.
Monolith approach: One database transaction with ACID guarantees. All operations succeed or all fail atomically. Simple and reliable.
Microservices approach: Three service calls with no atomic transaction. If any step fails, you need to compensate (undo) previous steps. This requires implementing the Saga Pattern or accepting eventual consistency.
Real-world example: Creating an order requires creating the order, processing payment, and reserving inventory. In a monolith, this is one transaction. In microservices, if inventory reservation fails, you need to refund the payment and cancel the order. This compensation logic is complex and error-prone.
Solution: Use the Saga Pattern for distributed transactions or accept eventual consistency. Both approaches add complexity compared to monolithic transactions.
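The compensation flow described above can be sketched as a simple saga coordinator. Everything here is illustrative: the step and compensation names are hypothetical stand-ins for real service calls, and `fail_at` just simulates a step failing.

```python
from typing import Optional

def create_order_saga(order_id: str, fail_at: Optional[str] = None) -> list:
    """Run each step in order; if one fails, run the compensations
    for the already-completed steps in reverse order."""
    steps = [
        ("create_order", "cancel_order"),
        ("charge_payment", "refund_payment"),
        ("reserve_inventory", "release_inventory"),
    ]
    log = []            # actions actually executed
    compensations = []  # undo actions for completed steps
    for action, compensation in steps:
        if action == fail_at:  # simulate this step failing
            log.extend(reversed(compensations))
            return log
        log.append(action)
        compensations.append(compensation)
    return log
```

If inventory reservation fails, the coordinator refunds the payment and then cancels the order, mirroring the compensation logic described above.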
Testing microservices is more complex than testing monoliths. You need to spin up multiple services for integration tests, end-to-end tests require the entire ecosystem, mocking service dependencies is complex, and managing test data across services is difficult. These challenges slow down development and make testing less reliable.
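One way to keep service tests fast is to unit-test each service against a mocked dependency rather than the real one. A hedged sketch using the standard library's `unittest.mock.AsyncMock`; `OrderService` and its payment client are hypothetical:

```python
import asyncio
from unittest.mock import AsyncMock

# Hypothetical OrderService that depends on a payment client.
class OrderService:
    def __init__(self, payment_client):
        self.payment_client = payment_client

    async def checkout(self, order_id: str) -> str:
        ok = await self.payment_client.charge(order_id)
        return "confirmed" if ok else "payment_failed"

def test_checkout_confirms_when_payment_succeeds():
    # Replace the real Payment Service client with a mock, so the test
    # runs without spinning up the payment service at all.
    payment_client = AsyncMock()
    payment_client.charge.return_value = True
    service = OrderService(payment_client)
    assert asyncio.run(service.checkout("o-1")) == "confirmed"
    payment_client.charge.assert_awaited_once_with("o-1")
```

This doesn't replace integration tests, but it keeps the bulk of the test suite fast and independent of other teams' services.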
Microservices require significant operational infrastructure. Each service needs its own deployment pipeline, monitoring, logging, and database management. This overhead multiplies with the number of services.
| Task | Monolith | 10 Microservices | 100 Microservices |
|---|---|---|---|
| Deployment pipeline | 1 | 10 | 100 |
| Monitoring dashboards | 1 | 10 | 100 |
| Log aggregation | 1 source | 10 sources | 100 sources |
| Databases to manage | 1 | 10 | 100 |
| Security patches | 1 app | 10 apps | 100 apps |
| Incident response | 1 service down | Which of 10? | Which of 100? |
Required infrastructure: Service discovery (Consul, Eureka), API Gateway (Kong, NGINX), message broker (Kafka, RabbitMQ), distributed tracing (Jaeger, Zipkin), centralized logging (ELK, Splunk), service mesh (Istio, Linkerd), and container orchestration (Kubernetes). This infrastructure requires expertise to set up and maintain, adding operational overhead.
Debugging distributed systems is significantly harder than debugging monoliths. When something goes wrong, you need to trace requests across multiple services, check logs from multiple sources, and understand distributed traces.
Real-world scenario: User reports “checkout is slow”. In a monolith, you check logs, find the slow database query, and fix it—takes 10 minutes. In microservices, you check API Gateway logs, then Order Service logs, then Payment Service logs, then Inventory Service logs, then User Service logs, find the slow service, check its dependencies, and finally discover from distributed traces that Payment Service is calling an external API slowly—takes 2 hours.
The solution: Invest heavily in observability—distributed tracing, centralized logging, and comprehensive monitoring. Without these tools, debugging microservices is nearly impossible.
Microservices introduce network latency that doesn’t exist in monoliths. Function calls in a monolith take microseconds. Network calls between services take milliseconds. This latency compounds across multiple service calls, significantly impacting performance.
Performance comparison (order-of-magnitude figures): an in-process function call costs nanoseconds to microseconds, while a network call between services in the same datacenter typically costs around 0.5–10 milliseconds, thousands of times more. A request that fans out across five services can spend tens of milliseconds on network hops alone, before any business logic runs.
Understanding when to choose microservices is crucial. Many teams choose microservices too early, adding complexity without benefits. Choose microservices when you have real constraints that justify the complexity.
Choose microservices when:

- You are a large, mature organization
- You have a proven product with clear boundaries
- Your services have different scaling requirements
- Team autonomy is critical
- You have the infrastructure and expertise

Avoid microservices when:

- You are starting a new product
- You have a small team (< 20 engineers)
- You have strong consistency requirements
- You have limited DevOps capabilities
Finding the right service boundaries is one of the hardest parts of microservices architecture. Boundaries that are too small create unnecessary complexity. Boundaries that are too large defeat the purpose of microservices. Use Domain-Driven Design principles to guide decomposition.
Group by business capabilities, not technical layers. Services should represent business domains, not technical concerns.
Bad (technical boundaries): Grouping by technical layers creates services that don’t represent business domains. UserService, ProductService, DatabaseService, NotificationService—these are technical concerns, not business capabilities.
Good (business boundaries): Grouping by business capabilities creates services that represent domains. OrderManagement handles everything about orders. InventoryManagement handles everything about inventory. CustomerManagement handles everything about customers. PaymentProcessing handles everything about payments. These boundaries align with how the business thinks about the system.
Each service represents a bounded context with its own domain model, ubiquitous language, and business rules. The same entity can have different representations in different contexts—this is key to understanding microservices boundaries.
Key insight: The same entity can have different representations in different contexts. The Order Context only needs customer ID, name, and shipping address. The Customer Context needs complete customer information including email, phone, addresses, payment methods, and preferences. This difference is natural and correct—each context only needs what’s relevant to its domain.
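The two representations can be made concrete with a pair of dataclasses. These models are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

# The same real-world customer, modeled differently per bounded context.

@dataclass
class OrderCustomer:
    """Order Context: just enough to place and ship an order."""
    customer_id: str
    name: str
    shipping_address: str

@dataclass
class Customer:
    """Customer Context: the full profile this domain owns."""
    customer_id: str
    name: str
    email: str
    phone: str
    addresses: list = field(default_factory=list)
    payment_methods: list = field(default_factory=list)
    preferences: dict = field(default_factory=dict)
```

Neither model is "wrong": each bounded context keeps only the fields its domain needs, which is what keeps the services decoupled.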
“Micro” doesn’t mean small in code size. It means focused responsibility (does one thing), independently deployable, and bounded context. A microservice can be 100 lines or 10,000 lines. Size is not the point—responsibility and independence are.
Services communicate through APIs. Understanding communication patterns is crucial for designing microservices. There are two main approaches: synchronous (request-response) and asynchronous (event-driven).
Synchronous communication uses request-response patterns. The caller waits for the response before continuing. This is simple to understand but creates tight coupling and can cause cascading failures.
Pros: Simple to understand, immediate response, easy to debug. The caller gets an immediate result, making the flow straightforward.
Cons: Tight coupling (caller waits for response), cascading failures (if payment service is down, order service fails), timeout management complexity. These issues make synchronous communication problematic for critical paths.
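The timeout problem is easy to demonstrate. In this hedged sketch, `payment_service_charge` simulates a slow Payment Service with `time.sleep`, and the caller gives up after a deadline without knowing whether the charge went through:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def payment_service_charge(order_id: str) -> str:
    time.sleep(2)  # simulate a slow Payment Service call
    return f"charged:{order_id}"

def place_order(order_id: str, timeout_s: float = 0.1) -> str:
    # Synchronous call: the Order Service blocks on the Payment Service.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(payment_service_charge, order_id)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            # The deadline passed -- but did the charge actually happen?
            # The caller cannot tell; this is the uncertainty synchronous
            # calls inherit from the network.
            return "unknown"
```

With the default 100 ms deadline the caller sees `"unknown"`; with a generous deadline the same call returns the charge result. Handling that "unknown" state is the timeout-management complexity mentioned above.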
Asynchronous communication uses message queues or event buses. Services publish events without waiting for responses. Other services consume events and process them independently. This decouples services and improves fault tolerance.
Pros: Loose coupling (services don’t wait for each other), fault tolerance (messages stored until consumed), natural retry mechanism, better scalability. Services can continue operating even when other services are down.
Cons: Eventual consistency (no immediate guarantees), harder to debug (asynchronous flow), message ordering challenges, complexity in error handling. These trade-offs require careful design.
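The decoupling can be illustrated with a tiny in-process event bus. This is a sketch only: a real system would use Kafka or RabbitMQ, and the `EventBus` class and event names here are hypothetical.

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe sketch (in-process, not durable)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher neither waits for nor knows about consumers.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
notifications = []
bus.subscribe("order.created", lambda event: notifications.append(event["order_id"]))

# The Order Service publishes and moves on; the notification
# consumer processes the event independently.
bus.publish("order.created", {"order_id": "o-1"})
```

With a durable broker, the consumer could even be down at publish time and still process the event later, which is where the fault-tolerance benefit comes from.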
Netflix serves 200+ million subscribers with 1000+ microservices running on 100,000+ AWS instances. They didn’t start this way—they evolved from a monolith when they hit database scalability issues in 2008-2009.
Key practices: Chaos Engineering (intentionally breaking services to test resilience), API-first design (services communicate through well-defined APIs), automated deployment pipelines (4000 deployments per day), and strong observability culture (comprehensive monitoring and tracing).
Key lesson: “We don’t have a single deployment. We have about 4,000 deployments per day. Each team deploys independently.” This independence enables rapid innovation but requires significant infrastructure investment.
Uber runs 2,000+ microservices on 50,000+ production servers using a polyglot architecture (Go, Java, Python, Node.js). This scale creates unique challenges.
Challenges they faced: Service discovery at scale (solution: built internal service mesh), cascading failures (solution: circuit breakers everywhere), and debugging distributed traces (solution: built Jaeger, now open source). These challenges required building custom infrastructure to manage complexity at scale.
In 2002, Jeff Bezos issued a mandate: “All teams will henceforth expose their data and functionality through service interfaces. Teams must communicate with each other through these interfaces. There will be no other form of interprocess communication allowed. Anyone who doesn’t do this will be fired.”
The result: This forced SOA (Service-Oriented Architecture) led to AWS (Amazon Web Services) and enabled massive scale and innovation. Amazon’s approach showed that forcing service boundaries can drive architectural evolution, but it requires strong leadership and organizational commitment.
Services communicate through APIs, so API contracts are critical. Version your APIs to allow evolution without breaking consumers. Support multiple versions simultaneously and deprecate old versions gradually.
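One lightweight way to support multiple versions side by side is to register separate handlers under version prefixes. This sketch is framework-free, and the route shapes, handler names, and response fields are all illustrative:

```python
# v1 keeps its original response shape so existing consumers keep working.
def get_user_v1(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada Lovelace"}

# v2 splits `name` into first/last -- a breaking change, so it ships
# under a new version prefix instead of mutating v1.
def get_user_v2(user_id: str) -> dict:
    return {"id": user_id, "first_name": "Ada", "last_name": "Lovelace"}

ROUTES = {
    "/v1/users/{id}": get_user_v1,
    "/v2/users/{id}": get_user_v2,
}

def handle(path_template: str, user_id: str) -> dict:
    return ROUTES[path_template](user_id)
```

Old consumers stay on `/v1` until they migrate; once traffic to `/v1` drains, the old handler can be deprecated and removed.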
Every service should expose health endpoints. Load balancers and orchestrators use these to determine if services are healthy and ready to accept traffic. This is critical for automatic failover and zero-downtime deployments.
@app.get("/health/live")async def liveness(): """Is the service running?""" return {"status": "ok"}
@app.get("/health/ready")async def readiness(): """Is the service ready to accept traffic?""" # Check database connection if not await database.is_connected(): raise HTTPException(status_code=503, detail="Database not ready")
# Check dependent services if not await payment_service.is_available(): raise HTTPException(status_code=503, detail="Payment service unavailable")
return {"status": "ready"}Observability is critical for microservices. You need structured logging, metrics, and distributed tracing to understand what’s happening across services. Without observability, debugging microservices is nearly impossible.
Structured logging:

```python
import time

import structlog

logger = structlog.get_logger()

async def create_order(order_id: str, user_id: str):
    logger.info(
        "order.create.started",
        order_id=order_id,
        user_id=user_id,
        service="order-service",
    )
    start = time.monotonic()

    # ... business logic ...

    logger.info(
        "order.create.completed",
        order_id=order_id,
        duration_ms=int((time.monotonic() - start) * 1000),
        service="order-service",
    )
```

Distributed tracing:
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def create_order(order_id: str):
    with tracer.start_as_current_span("order.create") as span:
        span.set_attribute("order.id", order_id)

        # Call payment service - trace continues!
        with tracer.start_as_current_span("payment.process"):
            await payment_client.process(order_id)
```

Circuit breakers prevent cascading failures by stopping requests to failing services. When a service fails repeatedly, the circuit opens and rejects requests immediately, protecting the rest of the system.
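A minimal circuit breaker can be sketched in plain Python. This illustrative `CircuitBreaker` opens after a run of consecutive failures and fails fast until a cooldown elapses; a production system would use a dedicated library (e.g. pybreaker) or a service mesh instead.

```python
import time

class CircuitBreaker:
    """Open after `max_failures` consecutive failures; reject calls
    until `reset_after` seconds pass, then allow a trial request."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial request through
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```

While the circuit is open, callers fail in microseconds instead of piling up waiting on a dead dependency, which is what stops one failing service from dragging down its callers.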
Start with Monolith
Don’t start with microservices. Prove your product first, then evolve architecture based on real constraints.
Trade-offs Matter
Microservices trade code complexity for operational complexity. Be prepared for distributed system challenges.
Team Size Matters
Microservices work best with large teams (50+). Small teams benefit more from well-designed monoliths.
Boundaries are Hard
Finding the right service boundaries is hard. Use Domain-Driven Design principles to guide decomposition.