LowLevelDesign Mastery

Cache Invalidation

The hardest problem in computer science

“There are only two hard things in Computer Science: cache invalidation and naming things.”
— Phil Karlton

Cache invalidation is deciding when and how to remove or update cached data when the underlying data changes.


Why is it hard?

  • When do you invalidate? (on write, on time, on events?)
  • What do you invalidate? (single key, related keys, entire cache?)
  • How do you invalidate? (delete, update, version?)
  • What if invalidation fails?
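The third option above, versioning, deserves a sketch because it avoids deletes entirely: embed a version number in the cache key and bump it to invalidate. A minimal in-memory illustration (the dict stands in for a real cache client; in a real cache the orphaned old-version entries would simply age out via TTL):

```python
class VersionedCache:
    """Invalidate by bumping a version embedded in the key.

    Old entries become unreachable instead of being deleted;
    a real cache would let them expire via TTL.
    """

    def __init__(self):
        self.store = {}     # stand-in for a real cache client
        self.versions = {}  # logical name -> current version

    def _key(self, name: str) -> str:
        return f"{name}:v{self.versions.get(name, 0)}"

    def set(self, name, value):
        self.store[self._key(name)] = value

    def get(self, name):
        return self.store.get(self._key(name))

    def invalidate(self, name):
        # O(1) invalidation: every entry under `name` becomes unreachable
        self.versions[name] = self.versions.get(name, 0) + 1
```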

Stale data is cached data that no longer matches the database.


Impact:

  • Users see wrong prices
  • Inventory counts are incorrect
  • User profiles show old data
  • Search results are outdated

Write-Through

Update cache when data is written; cache and database stay in sync.


How it works:

  1. Application writes data
  2. Write to the cache
  3. Write to the database
  4. Wait for both writes to complete
  5. Return success

When to use:

  • Strong consistency required
  • Can’t afford stale data
  • Write latency acceptable
  • Critical data (prices, inventory)

Trade-offs:

  • Pros: cache always has the latest data; no stale data; simple to understand
  • Cons: higher write latency (waits for both writes); database becomes a bottleneck
write_through.py

class WriteThroughCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def update_user(self, user_id: int, data: dict):
        # Write to cache
        cache_key = f"user:{user_id}"
        self.cache.set(cache_key, data, ttl=3600)
        # Write to database
        self.db.update_user(user_id, data)
        # Both complete - return success
        return True

Write-Invalidate

Delete the cache entry on write; the next read fetches fresh data.


How it works:

  1. Application writes data
  2. Delete from cache (invalidate)
  3. Update database
  4. Return success
  5. Next read fetches fresh data from DB and caches it

When to use:

  • Simpler than write-through
  • Lower write latency (no cache write)
  • Good for write-heavy workloads
  • When cache miss is acceptable

Trade-offs:

  • Pros: lower write latency; simple to implement; ensures fresh data on the next read
  • Cons: the next read is a cache miss (slower); may cause a cache stampede
write_invalidate.py

class WriteInvalidateCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def update_user(self, user_id: int, data: dict):
        # Update database
        self.db.update_user(user_id, data)
        # Invalidate cache
        cache_key = f"user:{user_id}"
        self.cache.delete(cache_key)
        return True

    def get_user(self, user_id: int):
        # Cache-aside pattern
        cache_key = f"user:{user_id}"
        cached = self.cache.get(cache_key)
        if cached:
            return cached
        # Cache miss - fetch from DB
        user = self.db.get_user(user_id)
        # Cache it
        if user:
            self.cache.set(cache_key, user, ttl=3600)
        return user

TTL Expiration

Time-based expiration: items expire after a fixed time.


How it works:

  1. Set TTL when caching data
  2. Background process checks expiration
  3. Expired items removed automatically
  4. Next read fetches fresh data

When to use:

  • Data changes infrequently
  • Some staleness acceptable
  • Simple to implement
  • Good for read-heavy workloads

Trade-offs:

  • Pros: simple; automatic cleanup; no manual invalidation needed
  • Cons: may serve stale data until expiration; doesn’t react to actual changes
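As a sketch of the mechanics above, here is a minimal in-memory TTL cache that expires entries lazily on read; production stores such as Redis handle this server-side (`EXPIRE`/`SETEX`), so treat this as an illustration only:

```python
import time


class TTLCache:
    """Minimal in-memory TTL cache: entries expire after a fixed lifetime."""

    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            # Lazy expiration: evict on read instead of a background sweeper
            del self._store[key]
            return None
        return value
```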

Event-Driven Invalidation

Invalidate on data-change events. This is the most sophisticated approach.


How it works:

  1. Data updated in database
  2. Publish event (e.g., “user:123 updated”)
  3. Cache listens to events
  4. On relevant event, invalidate cache
  5. Can invalidate related keys too

When to use:

  • Complex invalidation logic
  • Need to invalidate related data
  • Event-driven architecture
  • Multiple cache layers

Trade-offs:

  • Pros: precise invalidation; handles complex relationships; reactive to actual changes
  • Cons: more complex to implement; requires event infrastructure
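A minimal sketch of this flow, using a tiny in-process pub/sub bus in place of real event infrastructure (Kafka, Redis pub/sub); the topic name and payload fields are illustrative:

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process pub/sub bus standing in for real event infrastructure."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._handlers[topic]:
            handler(payload)


class EventDrivenCache:
    """On an update event, invalidate the primary key and all related keys."""

    def __init__(self, bus: EventBus):
        self.store = {}  # stand-in for a real cache client
        bus.subscribe("user.profile.updated", self.on_profile_updated)

    def on_profile_updated(self, event: dict):
        user_id = event["user_id"]
        # Invalidate the profile itself plus caches that embed it
        for key in (f"user:{user_id}:profile", f"user:{user_id}:posts"):
            self.store.pop(key, None)
        # Followers' feeds contain this user's posts, so they go too
        for follower_id in event.get("follower_ids", []):
            self.store.pop(f"feed:{follower_id}", None)
```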

Cache Stampede

A cache stampede (thundering herd) happens when a cached entry expires and many requests simultaneously try to fetch it from the database.


Impact:

  • Database overloaded with duplicate queries
  • Slow response times
  • Potential database crash
  • Wasted resources

Solution 1: Lock-Based Refresh

Only one request fetches; the others wait.


How it works:

  1. Request checks cache (miss)
  2. Try to acquire distributed lock
  3. Only one request gets lock
  4. Winner fetches from database and caches
  5. Others wait, then read from cache
cache_stampede_prevention.py

import redis
import time
import uuid

class CacheWithLock:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.lock_ttl = 10  # seconds

    def get_with_lock(self, key: str, fetch_func):
        # Try cache first
        cached = self.redis.get(key)
        if cached:
            return cached

        # Cache miss - try to acquire lock
        lock_key = f"lock:{key}"
        lock_value = str(uuid.uuid4())

        # Try to acquire lock (set if not exists)
        acquired = self.redis.set(
            lock_key, lock_value,
            nx=True, ex=self.lock_ttl
        )

        if acquired:
            # We got the lock - fetch from DB
            try:
                data = fetch_func()
                self.redis.set(key, data, ex=300)
                return data
            finally:
                # Release lock only if we still own it. redis-py returns
                # bytes by default, so compare against the encoded value;
                # a Lua script would make this check-and-delete atomic.
                if self.redis.get(lock_key) == lock_value.encode():
                    self.redis.delete(lock_key)
        else:
            # Someone else holds the lock - wait and retry
            time.sleep(0.1)
            # Retry cache (winner should have cached it by now)
            return self.redis.get(key) or fetch_func()

Solution 2: Probabilistic Early Expiration


Refresh cache randomly before expiration. Spreads load over time.


How it works:

  1. Set TTL (e.g., 300 seconds)
  2. In last 10% of TTL (270-300s), randomly refresh
  3. First request in window refreshes cache
  4. Others use refreshed cache
  5. Spreads refresh load over time
"probabilistic_refresh.py
import random
import time
class ProbabilisticRefreshCache:
def __init__(self, cache: CacheClient, db: Database):
self.cache = cache
self.db = db
def get_with_refresh(self, key: str, ttl: int = 300):
cached_data = self.cache.get(key)
if cached_data:
# Check if we should refresh (last 10% of TTL)
# This is simplified - in practice, store timestamp
if random.random() < 0.1: # 10% chance
# Refresh in background
self._refresh_async(key, ttl)
return cached_data
# Cache miss - fetch and cache
data = self.db.fetch(key)
self.cache.set(key, data, ttl=ttl)
return data
def _refresh_async(self, key: str, ttl: int):
# Background refresh (simplified)
data = self.db.fetch(key)
self.cache.set(key, data, ttl=ttl)

Cache invalidation strategies vary by company based on their consistency requirements:

Write-Through: Financial Trading Platforms


The Challenge: Trading platforms need real-time, accurate prices. Stale prices mean wrong trades and financial losses.

The Solution: Trading platforms use write-through:

  • Price update → Write to both cache and database simultaneously
  • Cache always has latest price
  • No stale data risk

Example: Stock price changes from $100 to $105:

  • Write-through: Cache = $105, Database = $105 (immediately consistent)
  • User sees correct price instantly
  • No risk of showing stale $100 price

Why Write-Through? Financial data requires strong consistency. A 1-second delay showing wrong price could mean millions in losses. Write-through ensures cache and database always match.

Impact: Zero stale data incidents. Price updates visible instantly. Critical for high-frequency trading.

Write-Invalidate: E-commerce Product Updates


The Challenge: E-commerce sites update product prices, inventory, descriptions frequently. Users need fresh data but write performance matters.

The Solution: E-commerce platforms use write-invalidate:

  • Product update → Write to database, invalidate cache
  • Next read → Cache miss, fetch fresh data, store in cache
  • Balance between consistency and performance

Example: Admin updates product price from $50 to $45:

  • Write to database: $45
  • Invalidate cache key: product:123
  • Next user request: Cache miss → Fetch $45 from database → Cache $45
  • User sees correct price

Why Write-Invalidate? Simpler than write-through (no cache write on update). Ensures fresh data on next read. Good balance for e-commerce.

Impact: Product updates visible within seconds. Write latency: 50ms (database only) vs 100ms (write-through). Reduced stale data by 95%.

TTL Expiration: News Websites

The Challenge: News articles change at different rates. Some are updated constantly (breaking news); others are static (archived articles).

The Solution: News websites use TTL expiration:

  • Breaking news: Short TTL (5 minutes) - changes frequently
  • Regular articles: Medium TTL (1 hour) - changes occasionally
  • Archived articles: Long TTL (24 hours) - rarely changes
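The tiering above maps naturally to a lookup table (TTL values taken from the list; the names are illustrative):

```python
# TTLs in seconds, keyed by how often the content changes
ARTICLE_TTLS = {
    "breaking": 5 * 60,        # 5 minutes
    "regular": 60 * 60,        # 1 hour
    "archived": 24 * 60 * 60,  # 24 hours
}


def ttl_for(article_type: str) -> int:
    # Fall back to the shortest TTL for unknown types (conservative)
    return ARTICLE_TTLS.get(article_type, ARTICLE_TTLS["breaking"])
```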

Example: Breaking news article published:

  • Cached for 5 minutes
  • If updated within 5 minutes → Stale data shown (acceptable for news)
  • After 5 minutes → Expires, fresh data fetched

Why TTL? News has natural expiration. A 5-minute-old article is less relevant than a 1-minute-old article. TTL ensures freshness without complex invalidation logic.

Impact: 80% of requests served from cache. Breaking news updates visible within 5 minutes. Reduced database load by 90%.

Event-Driven Invalidation: Social Media Platforms


The Challenge: Social media platforms have complex data relationships. Updating a user’s profile affects their posts, comments, followers’ feeds.

The Solution: Social media platforms use event-driven invalidation:

  • User updates profile → Publish event
  • Event triggers invalidation of related caches:
    • User profile cache
    • User’s posts cache
    • Followers’ feed cache (contains user’s posts)

Example: User changes profile picture:

  • Update database: New picture URL
  • Publish event: user.profile.updated
  • Event handlers invalidate:
    • user:123:profile cache
    • user:123:posts cache (posts show profile picture)
    • feed:456 cache (follower’s feed contains user’s posts)

Why Event-Driven? Complex relationships require precise invalidation. A profile update affects multiple caches. Events ensure all related caches are invalidated.

Impact: Profile updates visible across all caches within seconds. Reduced stale data by 99%. Handles complex invalidation logic efficiently.

Cache Stampede Prevention: GitHub’s Repository Cache


The Challenge: When a popular repository’s cache expires, thousands of requests hit the database simultaneously (cache stampede).

The Solution: GitHub uses cache stampede prevention:

  • Probabilistic early expiration: Refresh cache before expiration (e.g., at 80% of TTL)
  • Lock-based refresh: Only one request refreshes, others wait
  • Stale-while-revalidate: Serve stale data while refreshing

Example: Popular repository cache expires:

  • Request 1: Cache expired → Acquires lock → Refreshes from database
  • Requests 2-1000: Cache expired → Wait for lock → Get fresh data from Request 1
  • Result: Only 1 database query instead of 1000

Why Prevent Stampede? Cache stampedes can overwhelm databases. During viral events, thousands of requests hitting database simultaneously can cause outages.

Impact: Database queries reduced by 99% during cache expiration. No cache stampede incidents. Handles viral events gracefully.
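Of the three techniques listed, stale-while-revalidate is the only one without a sketch earlier on this page. A minimal threaded version might look like this (an illustration of the idea, not GitHub's implementation; names and parameters are assumptions):

```python
import threading
import time


class SWRCache:
    """Serve stale entries immediately while a background thread refreshes."""

    def __init__(self, fetch_func, fresh_for: float = 300.0):
        self.fetch = fetch_func
        self.fresh_for = fresh_for
        self._store = {}          # key -> (value, fetched_at)
        self._refreshing = set()  # keys with an in-flight refresh
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
        if entry is None:
            # First request for this key: must fetch synchronously
            value = self.fetch(key)
            with self._lock:
                self._store[key] = (value, time.time())
            return value
        value, fetched_at = entry
        if time.time() - fetched_at > self.fresh_for:
            # Entry is stale: kick off a refresh but don't wait for it
            self._refresh_in_background(key)
        return value  # stale or fresh, returned without blocking

    def _refresh_in_background(self, key):
        with self._lock:
            if key in self._refreshing:
                return  # another thread is already refreshing this key
            self._refreshing.add(key)

        def worker():
            try:
                value = self.fetch(key)
                with self._lock:
                    self._store[key] = (value, time.time())
            finally:
                with self._lock:
                    self._refreshing.discard(key)

        threading.Thread(target=worker, daemon=True).start()
```

The `_refreshing` set doubles as the lock-based guard: stale reads never block, and at most one refresh per key is in flight.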

Hybrid Invalidation: Netflix

The Challenge: Netflix has different content types with different update frequencies and consistency requirements.

The Solution: Netflix uses hybrid invalidation:

  • Write-through: User watch history (critical, must be consistent)
  • Write-invalidate: Video metadata (updates occasionally, freshness important)
  • TTL: Recommendations (changes frequently, approximate is fine)
  • Event-driven: User preferences (affects multiple caches)

Why Hybrid? Different data has different requirements:

  • Watch history: Must be accurate → Write-through
  • Metadata: Should be fresh → Write-invalidate
  • Recommendations: Approximate is fine → TTL
  • Preferences: Affects multiple caches → Event-driven

Impact: Right strategy for each data type. Optimized consistency and performance. Reduced stale data incidents by 95%.



Strategy         | Consistency | Complexity | Latency      | Use Case
-----------------|-------------|------------|--------------|-----------------
Write-Through    | Strong      | Low        | High (write) | Critical data
Write-Invalidate | Eventual    | Low        | Low (write)  | General purpose
TTL              | Eventual    | Low        | Low          | Read-heavy
Event-Driven     | Strong      | High       | Low          | Complex systems

🔥 Hardest Problem

Cache invalidation is famously difficult. Choose strategy based on consistency needs.

⚡ Write-Invalidate

Most common: delete cache on write. Next read fetches fresh. Simple and effective.

🐘 Prevent Stampede

Use distributed locks or probabilistic refresh to prevent cache stampede.

🔄 Event-Driven

For complex systems, use event-driven invalidation for precise control.