
Serverless Architecture

Focus on code, not infrastructure - pay only for what you use

Imagine you want to bake a cake:

  • Traditional (Servers): Buy an oven, maintain it, keep it running 24/7 even when not baking. Pay for electricity always.
  • Serverless: Use a communal kitchen. Pay only when you bake. Someone else maintains the oven. You just bring your recipe!

That’s serverless - you write code (recipe), cloud provider handles servers (kitchen), you pay only when code runs (when baking)!

Real-World Analogy: Netflix vs Traditional Video Rental


Traditional Architecture (Blockbuster):

  • Rent a physical store (server) 24/7
  • Pay rent even when closed
  • Hire staff to manage inventory
  • Limited capacity - need bigger store for more customers
  • You handle everything: security, maintenance, scaling

Serverless Architecture (Netflix):

  • No physical store needed
  • Pay only when someone watches a video
  • Netflix handles all infrastructure
  • Automatically scales to millions of viewers
  • You just provide the content (code)

Function as a Service (FaaS):

  • Deploy individual functions (not full applications)
  • Functions execute in response to events
  • Examples: AWS Lambda, Google Cloud Functions, Azure Functions

Backend as a Service (BaaS):

  • Managed backend services (databases, authentication, storage)
  • No code needed - just configuration
  • Examples: Firebase, AWS Amplify, Supabase

Serverless = FaaS + BaaS - You write functions (FaaS) that use managed services (BaaS)!
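In code terms, a FaaS function is nothing more than a handler the platform invokes once per event. A minimal sketch in the AWS Lambda handler style (the event shape here is illustrative, not a real trigger payload):

```python
import json

def lambda_handler(event, context):
    """Called by the platform for each event - no server code anywhere."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"})
    }

# In production the platform invokes the handler; locally you can simulate it:
response = lambda_handler({"name": "serverless"}, None)
print(response["body"])  # {"message": "Hello, serverless!"}
```

Everything else - provisioning, routing the event, scaling, retiring idle capacity - is the provider's job.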


Key Benefits

No Server Management

Zero server administration. No provisioning, patching, or scaling infrastructure. Focus 100% on code.

Auto-Scaling

Automatically scales from 0 to millions of requests. Handle traffic spikes without pre-provisioning.

Pay-Per-Use

Pay only for actual compute time. No idle server costs. $0 when not running.

Fast Iteration

Deploy functions in seconds. No infrastructure changes. Quick experiments and MVPs.


Cold Starts

The first invocation after an idle period takes longer because the runtime environment must be initialized.

Impact:

  • Python/Node.js: 100-500ms
  • Java/.NET: 1-3 seconds
  • Go: 50-200ms (fastest!)

Real-World Impact:

  • User-facing APIs: First request after idle period is slow
  • Batch jobs: Initial function takes longer, subsequent ones are fast
  • Critical systems: May need to keep functions warm

Mitigation Strategies:

  • Provisioned Concurrency: Pre-warm functions (AWS Lambda)
  • Keep Functions Warm: Ping function every 5 minutes
  • Use Go/Rust: Faster cold starts than Java/.NET
  • Optimize Package Size: Smaller packages = faster initialization

Example Cost: Provisioned concurrency costs ~$0.015/hour per GB, but eliminates cold starts
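A common code-level mitigation: do expensive initialization at module scope rather than inside the handler, so every warm invocation in the same container reuses it. A minimal sketch, with a counter standing in for real setup work (SDK clients, config, ML models) to show that initialization runs once per container, not once per request:

```python
INIT_COUNT = 0

def expensive_setup():
    """Stand-in for loading SDK clients, config, ML models, etc."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"db": "connection", "config": "loaded"}

# Module scope: runs once per container (the cold start)
RESOURCES = expensive_setup()

def handler(event, context):
    # Warm invocations reuse RESOURCES instead of re-initializing
    return {"statusCode": 200, "initializations": INIT_COUNT}

# Three invocations in the same "container": setup ran only once
for _ in range(3):
    result = handler({}, None)
print(result["initializations"])  # 1
```

The same pattern is why database clients and connection pools are conventionally created outside the handler in Lambda code.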


Execution Time Limits

Platform                  Max Time      Use Case Impact
AWS Lambda                15 minutes    Most batch jobs OK, long ETL fails
Google Cloud Functions    9 minutes     Shorter batch windows
Azure Functions           10 minutes    Consumption plan: 10 min, Premium: unlimited

Real-World Impact:

  • Video Processing: Can’t process long videos in single function
  • Data Migration: Large database migrations may timeout
  • ML Inference: Long-running models may exceed limits

Solutions:

  • Step Functions: Chain multiple functions for longer workflows
  • Containers: Use AWS Fargate or Azure Container Instances
  • Split Processing: Break work into smaller chunks
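The "Split Processing" idea is simple: a coordinator slices the job into chunks that each fit comfortably inside the time limit, then fans them out (for example, one SQS message per chunk). A sketch of the chunking step, with an illustrative chunk size:

```python
def split_into_chunks(total_items, chunk_size):
    """Return (start, end) ranges, each small enough for one invocation."""
    return [
        (start, min(start + chunk_size, total_items))
        for start in range(0, total_items, chunk_size)
    ]

# A 1M-row migration split into 10k-row chunks -> 100 independent jobs,
# each dispatched as its own message/invocation, well under the time limit
chunks = split_into_chunks(1_000_000, 10_000)
print(len(chunks))            # 100
print(chunks[0], chunks[-1])  # (0, 10000) (990000, 1000000)
```

Each chunk becomes an independent, retryable unit of work, which also means a single failed chunk can be retried without redoing the whole job.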

Vendor Lock-In

Tight coupling to cloud provider APIs.

Examples of Lock-In:

  • AWS Lambda uses AWS SDK, S3, DynamoDB
  • Azure Functions use Azure Storage, Cosmos DB
  • Google Cloud Functions use GCP services

Real-World Impact:

  • Migration Cost: Expensive to switch providers
  • Skill Requirements: Team needs provider-specific knowledge
  • Portability: Hard to run locally or on-premises

Mitigation:

  • Serverless Framework: Abstract provider differences
  • Terraform: Infrastructure as code for portability
  • Multi-Cloud: Use services available on multiple providers
  • Abstraction Layer: Build adapter layer over provider APIs

Debugging Challenges

Limited visibility into the execution environment:

  • No SSH Access: Can’t log into a running function
  • Distributed Tracing: Need tools like AWS X-Ray, Datadog
  • Local Testing: Hard to replicate exact environment
  • Log Aggregation: Logs scattered across invocations

Real-World Impact:

  • Debugging Production Issues: More difficult than traditional servers
  • Performance Tuning: Harder to profile and optimize
  • Error Investigation: Need good logging and monitoring

Solutions:

  • Structured Logging: Use JSON logs, centralized logging (CloudWatch, Stackdriver)
  • Distributed Tracing: AWS X-Ray, OpenTelemetry
  • Local Testing: SAM CLI, Serverless Framework, Docker
  • Monitoring: CloudWatch, Datadog, New Relic
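Structured logging in practice means emitting one JSON object per log line so the aggregator (CloudWatch, Stackdriver, Datadog) can filter and group on fields like `request_id`. A minimal stdlib-only sketch; the field names are illustrative:

```python
import json

def log_event(level, message, **fields):
    """Emit one JSON object per line for a log aggregator to parse."""
    record = {"level": level, "message": message, **fields}
    print(json.dumps(record))  # stdout is captured by the platform's logging
    return record

# Inside a handler, tag every line with the invocation's correlation IDs
entry = log_event(
    "INFO", "order processed",
    request_id="req-123", order_id="ord-456", duration_ms=87
)
```

With logs scattered across thousands of short-lived invocations, a shared `request_id` field is often the only way to reassemble one request's story.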

Cost at Scale

The Math:

  • Low Traffic: Serverless is cheaper (pay per request)
  • High Traffic: Traditional servers become cheaper

Example Cost Comparison:

Scenario: API handling 10 million requests/month, 200ms average execution

Serverless (AWS Lambda):

  • 10M requests × $0.20 per 1M requests = $2.00
  • Compute: 10M × 0.2s × 0.5GB = 1M GB-seconds × $0.0000166667 ≈ $16.67
  • Total: ~$18.67/month

Traditional (EC2 t3.medium):

  • Instance: $0.0416/hour × 730 hours = $30.37/month
  • Total: $30.37/month

But at 100M requests/month:

  • Serverless: ~$187/month ($20 requests + ~$167 compute)
  • Traditional: Still $30.37/month (if instance can handle it)

Break-Even Point: Depends heavily on execution time and memory. For this 200ms/512MB workload it is roughly 16M requests/month; lighter, shorter functions push it far higher.
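These numbers can be checked with a few lines of arithmetic, using the published on-demand prices for x86 Lambda ($0.20 per 1M requests, $0.0000166667 per GB-second; free tier and data transfer ignored):

```python
REQUEST_PRICE = 0.20 / 1_000_000   # $ per request (AWS Lambda, x86, on-demand)
GB_SECOND_PRICE = 0.0000166667     # $ per GB-second of compute

def lambda_monthly_cost(requests, duration_s, memory_gb):
    """Monthly Lambda bill: per-request charge + GB-second compute charge."""
    request_cost = requests * REQUEST_PRICE
    compute_cost = requests * duration_s * memory_gb * GB_SECOND_PRICE
    return request_cost + compute_cost

EC2_MONTHLY = 0.0416 * 730         # t3.medium on-demand, ~$30.37/month

# The 10M requests/month, 200ms, 512MB scenario
cost_10m = lambda_monthly_cost(10_000_000, 0.2, 0.5)
print(round(cost_10m, 2))          # 18.67 -> Lambda wins at this volume

# Break-even: requests/month where the Lambda bill matches the EC2 instance
per_request = REQUEST_PRICE + 0.2 * 0.5 * GB_SECOND_PRICE
break_even = EC2_MONTHLY / per_request
print(round(break_even / 1_000_000, 1))  # 16.3 (million requests/month)
```

Halving the duration or memory roughly doubles the break-even point, which is why "50-100M requests" is often quoted for lighter workloads.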


Statelessness

Functions are stateless - each invocation is independent.

Challenges:

  • Can’t maintain connections (database pools, WebSockets)
  • No in-memory caching between invocations
  • Session management requires external storage

Solutions:

  • External State: Use Redis, DynamoDB, ElastiCache
  • Connection Pooling: Use RDS Proxy, connection pooling services
  • Stateless Design: Design functions to be truly stateless
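Because anything held in a function's memory vanishes whenever the container is recycled, state that must survive between requests has to live in an external store. A sketch of the pattern, using an in-memory class as a stand-in for Redis (the `setex`/`get` interface mirrors the redis client; in production you would point it at a real cluster):

```python
import json
import time

class FakeStore:
    """Stand-in for Redis: setex/get with TTL semantics."""
    def __init__(self):
        self._data = {}

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (time.time() + ttl_seconds, value)

    def get(self, key):
        expires, value = self._data.get(key, (0, None))
        return value if time.time() < expires else None

store = FakeStore()  # in production: redis.Redis(host=...)

def handler(event, context):
    """Stateless handler: all session data lives in the external store."""
    session_id = event["sessionId"]
    session = store.get(f"session:{session_id}")
    count = (json.loads(session)["count"] if session else 0) + 1
    store.setex(f"session:{session_id}", 3600, json.dumps({"count": count}))
    return {"statusCode": 200, "visits": count}

# Two invocations (possibly on different containers) share state via the store
handler({"sessionId": "abc"}, None)
result = handler({"sessionId": "abc"}, None)
print(result["visits"])  # 2
```

The handler itself keeps nothing between calls, so it behaves identically whether the two requests hit the same warm container or two different ones.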

Example 1: Image Processing Pipeline (Instagram/Imgur)


Problem: Users upload millions of images daily. Need to:

  • Resize images (thumbnails, different sizes)
  • Apply filters/effects
  • Generate metadata
  • Store in CDN

Serverless Solution:

# AWS Lambda function triggered by S3 upload
import boto3

s3 = boto3.client('s3')
table = boto3.resource('dynamodb').Table('images')

def lambda_handler(event, context):
    # Event: new image uploaded to S3
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Download the original image
    original = s3.get_object(Bucket=bucket, Key=key)['Body'].read()

    # Generate multiple sizes (resize_image is an application helper,
    # e.g. built on Pillow)
    sizes = [(200, 200), (400, 400), (800, 800)]
    for width, height in sizes:
        resized = resize_image(original, width, height)
        s3.put_object(
            Bucket=bucket,
            Key=f"resized/{width}x{height}/{key}",
            Body=resized
        )

    # Extract metadata and store it
    metadata = extract_metadata(original)
    table.put_item(Item={'id': key, 'metadata': metadata})
    return {'statusCode': 200}

Why Serverless?

  • Handles traffic spikes (viral image = millions of uploads)
  • Pay only for processing time
  • Auto-scales to thousands of concurrent uploads
  • No idle costs when no uploads

Real-World: Instagram processes billions of images this way!


Example 2: Real-Time Data Processing (Netflix Recommendations)


Problem: Process user viewing events to update recommendations in real-time.

Serverless Solution:

# Lambda triggered by a Kinesis stream
import base64
import json

def process_viewing_event(event, context):
    for record in event['Records']:
        # Kinesis payloads arrive base64-encoded
        payload = base64.b64decode(record['kinesis']['data'])
        viewing_data = json.loads(payload)

        # Update the user's viewing profile
        update_user_preferences(
            user_id=viewing_data['userId'],
            movie_id=viewing_data['movieId'],
            watch_time=viewing_data['duration']
        )

        # Recalculate recommendations and cache them (1-hour TTL)
        recommendations = calculate_recommendations(viewing_data['userId'])
        redis.setex(
            f"recommendations:{viewing_data['userId']}",
            3600,
            json.dumps(recommendations)
        )

Why Serverless?

  • Processes millions of events per second
  • Auto-scales with viewing spikes (new show releases)
  • Cost-effective: pay per event processed
  • No infrastructure management

Real-World: Netflix processes 500+ billion events daily using serverless!


Example 3: Search API (Airbnb)

Problem: Handle search requests with variable load (peak during weekends/holidays).

Serverless Solution:

# API Gateway → Lambda
import json
import boto3

dynamodb = boto3.client('dynamodb')

def search_properties(event, context):
    query_params = event['queryStringParameters'] or {}

    # Parse search criteria
    location = query_params.get('location')
    check_in = query_params.get('checkIn')
    guests = int(query_params.get('guests', 1))

    # Search the properties table by location
    # ('location' is a DynamoDB reserved word, hence the #loc alias)
    results = dynamodb.query(
        TableName='properties',
        IndexName='location-index',
        KeyConditionExpression='#loc = :loc',
        FilterExpression='capacity >= :guests',
        ExpressionAttributeNames={'#loc': 'location'},
        ExpressionAttributeValues={
            ':loc': {'S': location},
            ':guests': {'N': str(guests)}
        }
    )

    # Filter by availability (delegated to another service)
    available = filter_available_properties(results['Items'], check_in)

    return {
        'statusCode': 200,
        'body': json.dumps({'results': available, 'count': len(available)})
    }

Why Serverless?

  • Handles 10x traffic spikes during peak seasons
  • Zero cost during low-traffic periods
  • Auto-scales to handle millions of searches
  • Fast deployment of new features

Real-World: Airbnb uses serverless for search, booking, and payment processing!


Example 4: Scheduled Tasks (Daily Reports)


Problem: Generate daily analytics reports, send emails, cleanup old data.

Serverless Solution:

# CloudWatch Events → Lambda (runs daily at 2 AM)
import boto3
from datetime import date

ses = boto3.client('ses')

def daily_maintenance(event, context):
    # Generate the analytics report
    report = generate_analytics_report()

    # Email it to stakeholders (Source must be an SES-verified sender)
    ses.send_email(
        Source='[email protected]',
        Destination={'ToAddresses': ['[email protected]']},
        Message={
            'Subject': {'Data': f'Daily Report - {date.today()}'},
            'Body': {'Html': {'Data': format_report(report)}}
        }
    )

    # Cleanup old data and back up the database
    cleanup_old_records(days=30)
    create_backup()
    return {'statusCode': 200}

Why Serverless?

  • No need to maintain cron servers
  • Pay only for execution time (few seconds)
  • Automatic retries on failure
  • Easy to modify schedule

Example 5: Webhook Handler (Stripe Payments)


Problem: Process payment webhooks from Stripe (variable load based on sales).

Serverless Solution:

# API Gateway → Lambda (webhook endpoint)
import stripe

def stripe_webhook(event, context):
    # Verify the webhook signature before trusting the payload
    signature = event['headers']['stripe-signature']
    payload = event['body']
    try:
        webhook = stripe.Webhook.construct_event(
            payload, signature, webhook_secret
        )
    except (ValueError, stripe.error.SignatureVerificationError):
        return {'statusCode': 400}

    # Handle the event types we care about
    event_type = webhook['type']
    if event_type == 'payment_intent.succeeded':
        order_id = webhook['data']['object']['metadata']['order_id']
        fulfill_order(order_id)
    elif event_type == 'payment_intent.payment_failed':
        order_id = webhook['data']['object']['metadata']['order_id']
        notify_payment_failure(order_id)

    return {'statusCode': 200}

Why Serverless?

  • Handles payment spikes (Black Friday, sales)
  • Critical reliability (payments must be processed)
  • Auto-scaling ensures no dropped webhooks
  • Cost-effective for variable payment volume

When to Use Serverless:

  1. Variable workloads - traffic spikes, seasonal patterns

    • Example: E-commerce during holidays, event ticketing systems
  2. Event-driven tasks - file uploads, webhooks, streams

    • Example: Image processing, payment webhooks, IoT data ingestion
  3. Short-running operations - API requests, data transformation

    • Example: REST APIs, data ETL pipelines, real-time analytics
  4. Rapid prototyping - MVPs, experiments

    • Example: Startup MVPs, proof-of-concepts, hackathons
  5. Low-to-medium traffic - cost-effective at scale

    • Example: Internal tools, admin dashboards, microservices
  6. Scheduled tasks - cron jobs, periodic maintenance

    • Example: Daily reports, data cleanup, backups
When NOT to Use Serverless:

  1. Long-running processes - video rendering, ML training

    • Reason: Execution time limits (15 min max on AWS Lambda)
    • Alternative: Use containers or dedicated compute
  2. Latency-critical - sub-millisecond requirements

    • Reason: Cold starts add 100ms-3s latency
    • Alternative: Keep functions warm or use traditional servers
  3. Consistent high load - traditional servers cheaper

    • Reason: At scale, reserved instances are more cost-effective
    • Example: High-traffic APIs with steady load
  4. Heavy state - long-lived connections

    • Reason: Functions are stateless, short-lived
    • Alternative: Use WebSockets on traditional servers
  5. Special hardware - custom GPUs, kernels

    • Reason: Serverless uses standard runtime environments
    • Alternative: Use GPU instances or specialized compute

Pattern 1: API Gateway + Lambda (REST API)


Architecture:

Client → API Gateway → Lambda → DynamoDB

Use Case: RESTful APIs, mobile backends

Example: E-commerce product API

  • API Gateway handles routing, authentication, rate limiting
  • Lambda functions handle business logic
  • DynamoDB stores product data

Benefits:

  • Auto-scaling API
  • Pay per API call
  • Built-in authentication (Cognito, API keys)

Pattern 2: Event-Driven Processing (S3 + SQS)

Architecture:

S3 Upload → Lambda → SQS → Lambda → DynamoDB

Use Case: File processing, data pipelines

Example: Image upload pipeline

  1. User uploads image to S3
  2. S3 triggers Lambda (resize, validate)
  3. Lambda publishes to SQS
  4. Another Lambda processes queue (generate thumbnails)
  5. Store metadata in DynamoDB

Benefits:

  • Decoupled processing
  • Retry on failure (SQS)
  • Parallel processing

Pattern 3: Scheduled Tasks (CloudWatch Events)

Architecture:

CloudWatch Events → Lambda → External Services

Use Case: Daily reports, data cleanup, backups

Example: Daily analytics report

  • CloudWatch Events triggers Lambda daily at 2 AM
  • Lambda queries database, generates report
  • Sends email via SES

Benefits:

  • No cron server needed
  • Automatic retries
  • Easy to modify schedule

Pattern 4: Webhook Handler (API Gateway)

Architecture:

External Service → API Gateway → Lambda → Database

Use Case: Payment processing, third-party integrations

Example: Stripe webhook handler

  • Stripe sends payment webhook to API Gateway
  • Lambda verifies signature, processes payment
  • Updates order status in database

Benefits:

  • Handles traffic spikes
  • Reliable processing
  • Auto-scaling

Pattern 5: Stream Processing (Kinesis/Kafka)

Architecture:

Kinesis/Kafka → Lambda → DynamoDB/Elasticsearch

Use Case: Real-time analytics, event processing

Example: User activity tracking

  • User actions stream to Kinesis
  • Lambda processes events (aggregate, transform)
  • Store in DynamoDB for real-time queries
  • Index in Elasticsearch for search

Benefits:

  • Real-time processing
  • Handles high throughput
  • Auto-scaling

Key Takeaways

Pay for What You Use

Zero cost when idle. Perfect for variable workloads. Can be expensive for consistent high traffic (break-even ~50-100M requests/month).

Event-Driven by Nature

Built for event-driven architectures. Responds to triggers automatically. Natural fit for modern apps (file uploads, webhooks, streams).

Trade-offs Exist

Cold starts (100ms-3s), execution limits (9-15 min), vendor lock-in, debugging challenges. Not a silver bullet. Choose wisely based on use case.

Focus on Business Logic

No infrastructure management. Faster time-to-market. Perfect for startups, MVPs, and rapid prototyping. Used by Netflix, Airbnb, Instagram at scale.

Real-World Proven

Powers billions of requests daily at companies like Netflix (500B+ events), Airbnb (search/booking), Instagram (image processing). Battle-tested at massive scale.

Pattern-Based Design

Common patterns: API Gateway + Lambda, Event-driven processing, Scheduled tasks, Webhook handlers, Stream processing. Each solves specific problems.



Further Reading

  • “Serverless Architectures on AWS” by Peter Sbarski
  • “Serverless Design Patterns” by Brian Zambrano
  • AWS Lambda Documentation - Official comprehensive guide
  • “Building Serverless Applications” - Practical patterns
  • ServerlessLand.com - Patterns, examples, and best practices