> “There are only two hard things in Computer Science: cache invalidation and naming things.”
> — Phil Karlton
Cache invalidation is deciding when and how to remove or update cached data when the underlying data changes.
Why is it hard?
Stale data is cached data that no longer matches the database.
Impact: users see outdated information, which ranges from a minor annoyance (an old article headline) to a serious failure (a wrong stock price driving a bad trade).
Write-Through: update the cache when data is written, so the cache and database stay in sync.
How it works: every write goes to both the database and the cache before returning success, as the code below shows.
When to use: critical data that must never be stale, where strong consistency is worth extra write latency.
Trade-offs: strong consistency and simple logic, but every write pays the latency of two systems.
```python
class WriteThroughCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def update_user(self, user_id: int, data: dict) -> bool:
        # Write to database first, so a failed write cannot
        # leave the cache holding data the database never saw
        self.db.update_user(user_id, data)

        # Write to cache
        cache_key = f"user:{user_id}"
        self.cache.set(cache_key, data, ttl=3600)

        # Both complete - return success
        return True
```

```java
class WriteThroughCache {
    private final CacheClient cache;
    private final Database db;

    public WriteThroughCache(CacheClient cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public boolean updateUser(int userId, UserData data) {
        // Write to database first, so a failed write cannot
        // leave the cache holding data the database never saw
        db.updateUser(userId, data);

        // Write to cache
        String cacheKey = "user:" + userId;
        cache.set(cacheKey, serialize(data), 3600);

        // Both complete - return success
        return true;
    }
}
```

Write-Invalidate: delete the cache entry on write; the next read fetches fresh data.
How it works: the write goes to the database, then the cached entry is deleted; the next read misses and repopulates the cache (the cache-aside pattern in the code below).
When to use: general-purpose workloads where a brief window of staleness is acceptable.
Trade-offs: fast, simple writes, but the first read after every update pays a cache miss.
```python
class WriteInvalidateCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def update_user(self, user_id: int, data: dict) -> bool:
        # Update database
        self.db.update_user(user_id, data)

        # Invalidate cache
        cache_key = f"user:{user_id}"
        self.cache.delete(cache_key)

        return True

    def get_user(self, user_id: int):
        # Cache-aside pattern
        cache_key = f"user:{user_id}"
        cached = self.cache.get(cache_key)

        if cached:
            return cached

        # Cache miss - fetch from DB
        user = self.db.get_user(user_id)

        # Cache it
        if user:
            self.cache.set(cache_key, user, ttl=3600)

        return user
```

```java
import java.util.Optional;

class WriteInvalidateCache {
    private final CacheClient cache;
    private final Database db;

    public WriteInvalidateCache(CacheClient cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public boolean updateUser(int userId, UserData data) {
        // Update database
        db.updateUser(userId, data);

        // Invalidate cache
        String cacheKey = "user:" + userId;
        cache.delete(cacheKey);

        return true;
    }

    public Optional<User> getUser(int userId) {
        // Cache-aside pattern
        String cacheKey = "user:" + userId;
        Optional<String> cached = cache.get(cacheKey);

        if (cached.isPresent()) {
            return Optional.of(deserialize(cached.get()));
        }

        // Cache miss - fetch from DB
        Optional<User> user = db.getUser(userId);

        // Cache it
        user.ifPresent(u -> cache.set(cacheKey, serialize(u), 3600));

        return user;
    }
}
```

TTL Expiration: time-based expiration; items simply expire after a fixed time.
How it works: every entry is cached with a TTL; once it expires, the next read misses and fetches fresh data, as sketched below.
When to use: read-heavy data that tolerates bounded staleness, such as content that naturally ages.
Trade-offs: the simplest strategy to operate, but data can stay stale for up to a full TTL, and nothing invalidates it early when the source changes.
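A minimal TTL sketch in Python, reusing the same hypothetical CacheClient and Database interfaces as the examples above:

```python
class TTLCache:
    """Writes never touch the cache; expiration alone refreshes readers."""

    def __init__(self, cache: CacheClient, db: Database, ttl: int = 300):
        self.cache = cache
        self.db = db
        self.ttl = ttl  # staleness is bounded by this many seconds

    def get_user(self, user_id: int):
        cache_key = f"user:{user_id}"
        cached = self.cache.get(cache_key)
        if cached:
            return cached  # may be up to `ttl` seconds stale

        # Expired or never cached - fetch and re-cache with a fresh TTL
        user = self.db.get_user(user_id)
        if user:
            self.cache.set(cache_key, user, ttl=self.ttl)
        return user
```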
Event-Driven Invalidation: invalidate on data-change events. The most sophisticated approach.
How it works: writers publish change events to a message bus; subscribers map each event to the cache keys it affects and delete them, as the sketch below shows.
When to use: complex systems where one write touches many caches and stale data is unacceptable.
Trade-offs: precise, near-real-time invalidation, but high complexity: you must run the event infrastructure and maintain the mapping from events to cache keys.
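A minimal subscriber sketch in Python, assuming writers publish JSON events to a Redis pub/sub channel; the channel name, event types, and key layout here are illustrative assumptions, not a fixed convention:

```python
import json

import redis


class InvalidationSubscriber:
    """Deletes cache keys in response to data-change events."""

    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    def keys_for(self, event: dict) -> list[str]:
        # Map each event type to every cache key it affects
        if event["type"] == "user.updated":
            return [f"user:{event['user_id']}"]
        if event["type"] == "product.updated":
            return [f"product:{event['product_id']}"]
        return []

    def run(self):
        pubsub = self.redis.pubsub()
        pubsub.subscribe("data-events")  # hypothetical channel name
        for message in pubsub.listen():
            if message["type"] != "message":
                continue
            event = json.loads(message["data"])
            for key in self.keys_for(event):
                self.redis.delete(key)  # precise, targeted invalidation
```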
Cache stampede (thundering herd) happens when a cache entry expires and many requests simultaneously try to fetch the same data from the database.
Impact: a sudden burst of identical database queries at the moment of expiration, which can overwhelm the database just when load is highest.
Distributed lock: only one request fetches from the database; the others wait briefly, then re-read the cache.
How it works:
```python
import time
import uuid

import redis


class CacheWithLock:
    def __init__(self, redis_client: redis.Redis):
        # Assumes the client was created with decode_responses=True,
        # so get() returns str and the ownership check below works
        self.redis = redis_client
        self.lock_ttl = 10  # seconds

    def get_with_lock(self, key: str, fetch_func):
        # Try cache first
        cached = self.redis.get(key)
        if cached:
            return cached

        # Cache miss - try to acquire lock
        lock_key = f"lock:{key}"
        lock_value = str(uuid.uuid4())

        # Try to acquire lock (set if not exists)
        acquired = self.redis.set(lock_key, lock_value, nx=True, ex=self.lock_ttl)

        if acquired:
            # We got the lock - fetch from DB
            try:
                data = fetch_func()
                self.redis.set(key, data, ex=300)
                return data
            finally:
                # Release lock (only if we still own it).
                # Note: GET-then-DEL is not atomic; production code
                # should release the lock with a Lua script instead.
                if self.redis.get(lock_key) == lock_value:
                    self.redis.delete(lock_key)
        else:
            # Someone else has the lock - wait and retry
            time.sleep(0.1)
            # Retry cache (winner should have cached it)
            return self.redis.get(key) or fetch_func()
```

```java
import java.util.UUID;
import java.util.function.Supplier;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

class CacheWithLock {
    private final Jedis redis;
    private final int lockTtl = 10; // seconds

    public CacheWithLock(Jedis redis) {
        this.redis = redis;
    }

    public String getWithLock(String key, Supplier<String> fetchFunc) {
        // Try cache first
        String cached = redis.get(key);
        if (cached != null) {
            return cached;
        }

        // Cache miss - try to acquire lock
        String lockKey = "lock:" + key;
        String lockValue = UUID.randomUUID().toString();

        // Try to acquire lock (set if not exists)
        boolean acquired = "OK".equals(
            redis.set(lockKey, lockValue, SetParams.setParams().nx().ex(lockTtl))
        );

        if (acquired) {
            // We got the lock - fetch from DB
            try {
                String data = fetchFunc.get();
                redis.setex(key, 300, data);
                return data;
            } finally {
                // Release lock (only if we still own it).
                // Note: GET-then-DEL is not atomic; production code
                // should release the lock with a Lua script instead.
                if (lockValue.equals(redis.get(lockKey))) {
                    redis.del(lockKey);
                }
            }
        } else {
            // Someone else has the lock - wait and retry
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Retry cache (winner should have cached it)
            String retry = redis.get(key);
            return retry != null ? retry : fetchFunc.get();
        }
    }
}
```

Probabilistic early refresh: refresh the cache at random before expiration, spreading the reload over time.
How it works: each cache hit has a small chance of triggering a background refresh, so hot keys are reloaded gradually before they expire rather than all at once.
```python
import random


class ProbabilisticRefreshCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def get_with_refresh(self, key: str, ttl: int = 300):
        cached_data = self.cache.get(key)

        if cached_data:
            # Decide whether to refresh early
            # (simplified - in practice, store a timestamp with the
            # value and only roll the dice in the last 10% of the TTL)
            if random.random() < 0.1:  # 10% chance
                # Refresh in background
                self._refresh_async(key, ttl)

            return cached_data

        # Cache miss - fetch and cache
        data = self.db.fetch(key)
        self.cache.set(key, data, ttl=ttl)
        return data

    def _refresh_async(self, key: str, ttl: int):
        # Background refresh (simplified - a real implementation
        # would hand this off to a worker thread or task queue)
        data = self.db.fetch(key)
        self.cache.set(key, data, ttl=ttl)
```

```java
import java.util.Random;

class ProbabilisticRefreshCache {
    private final CacheClient cache;
    private final Database db;
    private final Random random = new Random();

    public ProbabilisticRefreshCache(CacheClient cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public String getWithRefresh(String key, int ttl) {
        String cached = cache.get(key);

        if (cached != null) {
            // Decide whether to refresh early (last 10% of TTL)
            if (random.nextDouble() < 0.1) { // 10% chance
                // Refresh in background
                refreshAsync(key, ttl);
            }
            return cached;
        }

        // Cache miss - fetch and cache
        String data = db.fetch(key);
        cache.set(key, data, ttl);
        return data;
    }

    private void refreshAsync(String key, int ttl) {
        // Background refresh (simplified)
        String data = db.fetch(key);
        cache.set(key, data, ttl);
    }
}
```

Cache invalidation strategies vary by company, depending on their consistency requirements:
The Challenge: Trading platforms need real-time, accurate prices. Stale prices mean wrong trades and financial losses.
The Solution: Trading platforms use write-through.
Example: A stock price changes from $100 to $105: the new price is written to the database and the cache in the same operation, so the very next read returns $105 (see the sketch below).
Why Write-Through? Financial data requires strong consistency. Even one second of showing the wrong price could mean millions in losses. Write-through ensures cache and database always match.
Impact: Zero stale data incidents. Price updates visible instantly. Critical for high-frequency trading.
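As a sketch of that flow, illustrative only: PriceCache and its methods are hypothetical, following the WriteThroughCache pattern above.

```python
class PriceCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def update_price(self, symbol: str, price: float) -> bool:
        # Database first, then cache; both complete before returning,
        # so no reader sees the old price after this call succeeds
        self.db.update_price(symbol, price)
        self.cache.set(f"price:{symbol}", price, ttl=60)
        return True

# Stock moves from $100 to $105: cache and database update together
# feed = PriceCache(cache_client, database)
# feed.update_price("ACME", 105.00)
```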
The Challenge: E-commerce sites update product prices, inventory, and descriptions frequently. Users need fresh data, but write performance matters.
The Solution: E-commerce platforms use write-invalidate.
Example: An admin updates a product's price from $50 to $45: the database row is updated, the product:123 cache entry is deleted, and the next read repopulates the cache with the new price.
Why Write-Invalidate? Simpler than write-through (no cache write on update). Ensures fresh data on the next read. A good balance for e-commerce.
Impact: Product updates visible within seconds. Write latency: 50ms (database only) vs 100ms (write-through). Reduced stale data by 95%.
The Challenge: News articles change frequently. Some articles are updated (breaking news), others are static (archived articles).
The Solution: News websites use TTL expiration.
Example: A breaking-news article is published and cached with a short TTL: within about 5 minutes, every cached copy expires and the next read pulls the latest version.
Why TTL? News has natural expiration. A 5-minute-old article is less relevant than a 1-minute-old article. TTL ensures freshness without complex invalidation logic.
Impact: 80% of requests served from cache. Breaking news updates visible within 5 minutes. Reduced database load by 90%.
The Challenge: Social media platforms have complex data relationships. Updating a user’s profile affects their posts, comments, followers’ feeds.
The Solution: Social media platforms use event-driven invalidation.
Example: A user changes their profile picture:
- A user.profile.updated event is published
- It invalidates the user:123:profile cache
- It invalidates the user:123:posts cache (posts show the profile picture)
- It invalidates the feed:456 cache (a follower's feed contains the user's posts)

Why Event-Driven? Complex relationships require precise invalidation. A profile update affects multiple caches, as sketched below. Events ensure all related caches are invalidated.
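A sketch of the fan-out for this single event; the key names mirror the list above, and get_follower_ids is an assumed helper:

```python
def keys_to_invalidate(user_id: int) -> list[str]:
    # One profile update fans out to several caches
    keys = [
        f"user:{user_id}:profile",  # the profile itself
        f"user:{user_id}:posts",    # posts embed the profile picture
    ]
    # Each follower's feed contains this user's posts,
    # so those feeds must be invalidated too
    keys += [f"feed:{follower_id}" for follower_id in get_follower_ids(user_id)]
    return keys
```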
Impact: Profile updates visible across all caches within seconds. Reduced stale data by 99%. Handles complex invalidation logic efficiently.
The Challenge: When a popular repository’s cache expires, thousands of requests hit the database simultaneously (cache stampede).
The Solution: GitHub uses cache stampede prevention.
Example: A popular repository's cache expires: the first request acquires a lock and rebuilds the entry while the others wait, then read the fresh value from the cache (see the usage sketch below).
Why Prevent Stampede? Cache stampedes can overwhelm databases. During viral events, thousands of requests hitting the database simultaneously can cause outages.
Impact: Database queries reduced by 99% during cache expiration. No cache stampede incidents. Handles viral events gracefully.
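Using the CacheWithLock sketch from earlier, the flow might look like this; the key format and fetch function are illustrative:

```python
import redis

cache = CacheWithLock(redis.Redis(decode_responses=True))

def fetch_repo_metadata():
    # Stands in for the real database query
    return load_repo_from_db("torvalds/linux")  # hypothetical helper

# Only one request per key rebuilds the cache; the rest wait and re-read
repo = cache.get_with_lock("repo:torvalds/linux", fetch_repo_metadata)
```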
The Challenge: Netflix has different content types with different update frequencies and consistency requirements.
The Solution: Netflix uses hybrid invalidation, combining the strategies above.
Why Hybrid? Different data has different requirements: rarely changing data fits simple TTLs, while fast-changing, user-facing data needs precise, event-driven invalidation.
Impact: Right strategy for each data type. Optimized consistency and performance. Reduced stale data incidents by 95%.
| Strategy | Consistency | Complexity | Latency | Use Case |
|---|---|---|---|---|
| Write-Through | Strong | Low | High (write) | Critical data |
| Write-Invalidate | Eventual | Low | Low (write) | General purpose |
| TTL | Eventual | Low | Low | Read-heavy |
| Event-Driven | Strong | High | Low | Complex systems |
🔥 Hardest Problem
Cache invalidation is famously difficult. Choose strategy based on consistency needs.
⚡ Write-Invalidate
Most common: delete cache on write. Next read fetches fresh. Simple and effective.
🐘 Prevent Stampede
Use distributed locks or probabilistic refresh to prevent cache stampede.
🔄 Event-Driven
For complex systems, use event-driven invalidation for precise control.