> “There are only two hard things in Computer Science: cache invalidation and naming things.”
> — Phil Karlton
Cache invalidation is deciding when and how to remove or update cached data when the underlying data changes.
Why is it hard?
Stale data is cached data that no longer matches the database.
Impact: users see outdated information, which ranges from a minor annoyance (an old article headline) to a serious failure (a wrong stock price driving a bad trade).
Write-Through: update the cache when data is written, so the cache and database stay in sync.
How it works: every write goes to both the database and the cache before returning success, as the code below shows.
When to use: critical data that must never be stale, where strong consistency is worth extra write latency.
Trade-offs: strong consistency and simple logic, but every write pays the latency of two systems.
```python
class WriteThroughCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def update_user(self, user_id: int, data: dict) -> bool:
        # Write to database first, so a failed write cannot
        # leave the cache holding data the database never saw
        self.db.update_user(user_id, data)

        # Write to cache
        cache_key = f"user:{user_id}"
        self.cache.set(cache_key, data, ttl=3600)

        # Both complete - return success
        return True
```

```java
class WriteThroughCache {
    private final CacheClient cache;
    private final Database db;

    public WriteThroughCache(CacheClient cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public boolean updateUser(int userId, UserData data) {
        // Write to database first, so a failed write cannot
        // leave the cache holding data the database never saw
        db.updateUser(userId, data);

        // Write to cache
        String cacheKey = "user:" + userId;
        cache.set(cacheKey, serialize(data), 3600);

        // Both complete - return success
        return true;
    }
}
```

Write-Invalidate: delete the cache entry on write; the next read fetches fresh data.
How it works: the write goes to the database, then the cached entry is deleted; the next read misses and repopulates the cache (the cache-aside pattern in the code below).
When to use: general-purpose workloads where a brief window of staleness is acceptable.
Trade-offs: fast, simple writes, but the first read after every update pays a cache miss.
```python
class WriteInvalidateCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def update_user(self, user_id: int, data: dict) -> bool:
        # Update database
        self.db.update_user(user_id, data)

        # Invalidate cache
        cache_key = f"user:{user_id}"
        self.cache.delete(cache_key)

        return True

    def get_user(self, user_id: int):
        # Cache-aside pattern
        cache_key = f"user:{user_id}"
        cached = self.cache.get(cache_key)

        if cached:
            return cached

        # Cache miss - fetch from DB
        user = self.db.get_user(user_id)

        # Cache it
        if user:
            self.cache.set(cache_key, user, ttl=3600)

        return user
```

```java
import java.util.Optional;

class WriteInvalidateCache {
    private final CacheClient cache;
    private final Database db;

    public WriteInvalidateCache(CacheClient cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public boolean updateUser(int userId, UserData data) {
        // Update database
        db.updateUser(userId, data);

        // Invalidate cache
        String cacheKey = "user:" + userId;
        cache.delete(cacheKey);

        return true;
    }

    public Optional<User> getUser(int userId) {
        // Cache-aside pattern
        String cacheKey = "user:" + userId;
        Optional<String> cached = cache.get(cacheKey);

        if (cached.isPresent()) {
            return Optional.of(deserialize(cached.get()));
        }

        // Cache miss - fetch from DB
        Optional<User> user = db.getUser(userId);

        // Cache it
        user.ifPresent(u -> cache.set(cacheKey, serialize(u), 3600));

        return user;
    }
}
```

TTL Expiration: time-based expiration; items simply expire after a fixed time.
How it works: every entry is cached with a TTL; once it expires, the next read misses and fetches fresh data, as sketched below.
When to use: read-heavy data that tolerates bounded staleness, such as content that naturally ages.
Trade-offs: the simplest strategy to operate, but data can stay stale for up to a full TTL, and nothing invalidates it early when the source changes.
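A minimal TTL sketch in Python, reusing the same hypothetical CacheClient and Database interfaces as the examples above:

```python
class TTLCache:
    """Writes never touch the cache; expiration alone refreshes readers."""

    def __init__(self, cache: CacheClient, db: Database, ttl: int = 300):
        self.cache = cache
        self.db = db
        self.ttl = ttl  # staleness is bounded by this many seconds

    def get_user(self, user_id: int):
        cache_key = f"user:{user_id}"
        cached = self.cache.get(cache_key)
        if cached:
            return cached  # may be up to `ttl` seconds stale

        # Expired or never cached - fetch and re-cache with a fresh TTL
        user = self.db.get_user(user_id)
        if user:
            self.cache.set(cache_key, user, ttl=self.ttl)
        return user
```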
Event-Driven Invalidation: invalidate on data-change events. The most sophisticated approach.
How it works: writers publish change events to a message bus; subscribers map each event to the cache keys it affects and delete them, as the sketch below shows.
When to use: complex systems where one write touches many caches and stale data is unacceptable.
Trade-offs: precise, near-real-time invalidation, but high complexity: you must run the event infrastructure and maintain the mapping from events to cache keys.
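A minimal subscriber sketch in Python, assuming writers publish JSON events to a Redis pub/sub channel; the channel name, event types, and key layout here are illustrative assumptions, not a fixed convention:

```python
import json

import redis


class InvalidationSubscriber:
    """Deletes cache keys in response to data-change events."""

    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    def keys_for(self, event: dict) -> list[str]:
        # Map each event type to every cache key it affects
        if event["type"] == "user.updated":
            return [f"user:{event['user_id']}"]
        if event["type"] == "product.updated":
            return [f"product:{event['product_id']}"]
        return []

    def run(self):
        pubsub = self.redis.pubsub()
        pubsub.subscribe("data-events")  # hypothetical channel name
        for message in pubsub.listen():
            if message["type"] != "message":
                continue
            event = json.loads(message["data"])
            for key in self.keys_for(event):
                self.redis.delete(key)  # precise, targeted invalidation
```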
Cache stampede (thundering herd) happens when a cache entry expires and many requests simultaneously try to fetch the same data from the database.
Impact: a sudden burst of identical database queries at the moment of expiration, which can overwhelm the database just when load is highest.
Distributed lock: only one request fetches from the database; the others wait briefly, then re-read the cache.
How it works:
```python
import time
import uuid

import redis


class CacheWithLock:
    def __init__(self, redis_client: redis.Redis):
        # Assumes the client was created with decode_responses=True,
        # so get() returns str and the ownership check below works
        self.redis = redis_client
        self.lock_ttl = 10  # seconds

    def get_with_lock(self, key: str, fetch_func):
        # Try cache first
        cached = self.redis.get(key)
        if cached:
            return cached

        # Cache miss - try to acquire lock
        lock_key = f"lock:{key}"
        lock_value = str(uuid.uuid4())

        # Try to acquire lock (set if not exists)
        acquired = self.redis.set(lock_key, lock_value, nx=True, ex=self.lock_ttl)

        if acquired:
            # We got the lock - fetch from DB
            try:
                data = fetch_func()
                self.redis.set(key, data, ex=300)
                return data
            finally:
                # Release lock (only if we still own it).
                # Note: GET-then-DEL is not atomic; production code
                # should release the lock with a Lua script instead.
                if self.redis.get(lock_key) == lock_value:
                    self.redis.delete(lock_key)
        else:
            # Someone else has the lock - wait and retry
            time.sleep(0.1)
            # Retry cache (winner should have cached it)
            return self.redis.get(key) or fetch_func()
```

```java
import java.util.UUID;
import java.util.function.Supplier;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

class CacheWithLock {
    private final Jedis redis;
    private final int lockTtl = 10; // seconds

    public CacheWithLock(Jedis redis) {
        this.redis = redis;
    }

    public String getWithLock(String key, Supplier<String> fetchFunc) {
        // Try cache first
        String cached = redis.get(key);
        if (cached != null) {
            return cached;
        }

        // Cache miss - try to acquire lock
        String lockKey = "lock:" + key;
        String lockValue = UUID.randomUUID().toString();

        // Try to acquire lock (set if not exists)
        boolean acquired = "OK".equals(
            redis.set(lockKey, lockValue, SetParams.setParams().nx().ex(lockTtl))
        );

        if (acquired) {
            // We got the lock - fetch from DB
            try {
                String data = fetchFunc.get();
                redis.setex(key, 300, data);
                return data;
            } finally {
                // Release lock (only if we still own it).
                // Note: GET-then-DEL is not atomic; production code
                // should release the lock with a Lua script instead.
                if (lockValue.equals(redis.get(lockKey))) {
                    redis.del(lockKey);
                }
            }
        } else {
            // Someone else has the lock - wait and retry
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Retry cache (winner should have cached it)
            String retry = redis.get(key);
            return retry != null ? retry : fetchFunc.get();
        }
    }
}
```

Probabilistic early refresh: refresh the cache at random before expiration, spreading the reload over time.
How it works: each cache hit has a small chance of triggering a background refresh, so hot keys are reloaded gradually before they expire rather than all at once.
```python
import random


class ProbabilisticRefreshCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def get_with_refresh(self, key: str, ttl: int = 300):
        cached_data = self.cache.get(key)

        if cached_data:
            # Decide whether to refresh early
            # (simplified - in practice, store a timestamp with the
            # value and only roll the dice in the last 10% of the TTL)
            if random.random() < 0.1:  # 10% chance
                # Refresh in background
                self._refresh_async(key, ttl)

            return cached_data

        # Cache miss - fetch and cache
        data = self.db.fetch(key)
        self.cache.set(key, data, ttl=ttl)
        return data

    def _refresh_async(self, key: str, ttl: int):
        # Background refresh (simplified - a real implementation
        # would hand this off to a worker thread or task queue)
        data = self.db.fetch(key)
        self.cache.set(key, data, ttl=ttl)
```

```java
import java.util.Random;

class ProbabilisticRefreshCache {
    private final CacheClient cache;
    private final Database db;
    private final Random random = new Random();

    public ProbabilisticRefreshCache(CacheClient cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public String getWithRefresh(String key, int ttl) {
        String cached = cache.get(key);

        if (cached != null) {
            // Decide whether to refresh early (last 10% of TTL)
            if (random.nextDouble() < 0.1) { // 10% chance
                // Refresh in background
                refreshAsync(key, ttl);
            }
            return cached;
        }

        // Cache miss - fetch and cache
        String data = db.fetch(key);
        cache.set(key, data, ttl);
        return data;
    }

    private void refreshAsync(String key, int ttl) {
        // Background refresh (simplified)
        String data = db.fetch(key);
        cache.set(key, data, ttl);
    }
}
```

Cache invalidation strategies vary by company, depending on their consistency requirements:
The Challenge: Trading platforms need real-time, accurate prices. Stale prices mean wrong trades and financial losses.
The Solution: Trading platforms use write-through.
Example: A stock price changes from $100 to $105: the new price is written to the database and the cache in the same operation, so the very next read returns $105 (see the sketch below).
Why Write-Through? Financial data requires strong consistency. Even one second of showing the wrong price could mean millions in losses. Write-through ensures cache and database always match.
Impact: Zero stale data incidents. Price updates visible instantly. Critical for high-frequency trading.
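As a sketch of that flow, illustrative only: PriceCache and its methods are hypothetical, following the WriteThroughCache pattern above.

```python
class PriceCache:
    def __init__(self, cache: CacheClient, db: Database):
        self.cache = cache
        self.db = db

    def update_price(self, symbol: str, price: float) -> bool:
        # Database first, then cache; both complete before returning,
        # so no reader sees the old price after this call succeeds
        self.db.update_price(symbol, price)
        self.cache.set(f"price:{symbol}", price, ttl=60)
        return True

# Stock moves from $100 to $105: cache and database update together
# feed = PriceCache(cache_client, database)
# feed.update_price("ACME", 105.00)
```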
The Challenge: E-commerce sites update product prices, inventory, and descriptions frequently. Users need fresh data, but write performance matters.
The Solution: E-commerce platforms use write-invalidate.
Example: An admin updates a product's price from $50 to $45: the database row is updated, the product:123 cache entry is deleted, and the next read repopulates the cache with the new price.
Why Write-Invalidate? Simpler than write-through (no cache write on update). Ensures fresh data on the next read. A good balance for e-commerce.
Impact: Product updates visible within seconds. Write latency: 50ms (database only) vs 100ms (write-through). Reduced stale data by 95%.
The Challenge: News articles change frequently. Some articles are updated (breaking news), others are static (archived articles).
The Solution: News websites use TTL expiration.
Example: A breaking-news article is published and cached with a short TTL: within about 5 minutes, every cached copy expires and the next read pulls the latest version.
Why TTL? News has natural expiration. A 5-minute-old article is less relevant than a 1-minute-old article. TTL ensures freshness without complex invalidation logic.
Impact: 80% of requests served from cache. Breaking news updates visible within 5 minutes. Reduced database load by 90%.
The Challenge: Social media platforms have complex data relationships. Updating a user’s profile affects their posts, comments, followers’ feeds.
The Solution: Social media platforms use event-driven invalidation.
Example: A user changes their profile picture:
- A user.profile.updated event is published
- It invalidates the user:123:profile cache
- It invalidates the user:123:posts cache (posts show the profile picture)
- It invalidates the feed:456 cache (a follower's feed contains the user's posts)

Why Event-Driven? Complex relationships require precise invalidation. A profile update affects multiple caches, as sketched below. Events ensure all related caches are invalidated.
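A sketch of the fan-out for this single event; the key names mirror the list above, and get_follower_ids is an assumed helper:

```python
def keys_to_invalidate(user_id: int) -> list[str]:
    # One profile update fans out to several caches
    keys = [
        f"user:{user_id}:profile",  # the profile itself
        f"user:{user_id}:posts",    # posts embed the profile picture
    ]
    # Each follower's feed contains this user's posts,
    # so those feeds must be invalidated too
    keys += [f"feed:{follower_id}" for follower_id in get_follower_ids(user_id)]
    return keys
```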
Impact: Profile updates visible across all caches within seconds. Reduced stale data by 99%. Handles complex invalidation logic efficiently.
The Challenge: When a popular repository’s cache expires, thousands of requests hit the database simultaneously (cache stampede).
The Solution: GitHub uses cache stampede prevention.
Example: A popular repository's cache expires: the first request acquires a lock and rebuilds the entry while the others wait, then read the fresh value from the cache (see the usage sketch below).
Why Prevent Stampede? Cache stampedes can overwhelm databases. During viral events, thousands of requests hitting the database simultaneously can cause outages.
Impact: Database queries reduced by 99% during cache expiration. No cache stampede incidents. Handles viral events gracefully.
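Using the CacheWithLock sketch from earlier, the flow might look like this; the key format and fetch function are illustrative:

```python
import redis

cache = CacheWithLock(redis.Redis(decode_responses=True))

def fetch_repo_metadata():
    # Stands in for the real database query
    return load_repo_from_db("torvalds/linux")  # hypothetical helper

# Only one request per key rebuilds the cache; the rest wait and re-read
repo = cache.get_with_lock("repo:torvalds/linux", fetch_repo_metadata)
```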
The Challenge: Netflix has different content types with different update frequencies and consistency requirements.
The Solution: Netflix uses hybrid invalidation, combining the strategies above.
Why Hybrid? Different data has different requirements: rarely changing data fits simple TTLs, while fast-changing, user-facing data needs precise, event-driven invalidation.
Impact: Right strategy for each data type. Optimized consistency and performance. Reduced stale data incidents by 95%.
| Strategy | Consistency | Complexity | Latency | Use Case |
|---|---|---|---|---|
| Write-Through | Strong | Low | High (write) | Critical data |
| Write-Invalidate | Eventual | Low | Low (write) | General purpose |
| TTL | Eventual | Low | Low | Read-heavy |
| Event-Driven | Strong | High | Low | Complex systems |
🔥 Hardest Problem
Cache invalidation is famously difficult. Choose strategy based on consistency needs.
⚡ Write-Invalidate
Most common: delete cache on write. Next read fetches fresh. Simple and effective.
🐘 Prevent Stampede
Use distributed locks or probabilistic refresh to prevent cache stampede.
🔄 Event-Driven
For complex systems, use event-driven invalidation for precise control.