When Caching Makes Things Worse
Real scenarios where adding a cache increased complexity, introduced bugs, or degraded performance, and the decision framework I use to evaluate whether a cache is the right solution.
Context
Caching is the first tool most engineers reach for when something is slow. And most of the time, it works. But I have seen enough cases where adding a cache made the system worse (not just in complexity but in correctness, reliability, and sometimes even performance) that I now treat caching as a design decision that requires justification, not a default optimization.
Case 1: The Cache That Hid a Bug
A product catalog service was slow because it ran an unindexed query for every request. Someone added a Redis cache in front of it. Response times dropped from 800ms to 5ms. Problem solved.
Six months later, the underlying database was migrated. The unindexed query became even slower (3 seconds). But nobody noticed because the cache was absorbing 99.8% of traffic. The 0.2% of cache misses caused timeouts that cascaded to upstream services. The cache had masked the real problem so effectively that the team lost awareness of it.
Lesson: A cache that hides a performance bug delays the fix and increases the eventual blast radius when the cache fails.
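One mitigation is to instrument cache misses so the backing store's latency stays visible even when the hit rate is high. A minimal sketch of the idea (the names `InstrumentedCache`, `slow_threshold_s`, and the loader function are illustrative, not from the incident above):

```python
import time

class InstrumentedCache:
    """Cache wrapper that tracks backing-store latency on misses,
    so a high hit rate cannot silently hide a database regression."""

    def __init__(self, loader, slow_threshold_s=0.5):
        self._store = {}
        self._loader = loader            # function that hits the backing store
        self._slow_threshold_s = slow_threshold_s
        self.slow_misses = 0             # feed this counter into alerting

    def get(self, key):
        if key in self._store:
            return self._store[key]
        start = time.monotonic()
        value = self._loader(key)        # the real (possibly unindexed) query
        elapsed = time.monotonic() - start
        if elapsed > self._slow_threshold_s:
            self.slow_misses += 1        # alert on this, not just on p99 latency
        self._store[key] = value
        return value
```

Alerting on miss latency directly, rather than on overall p99, would have surfaced the 3-second query long before the cascading timeouts.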
Case 2: The Thundering Herd
A leaderboard service cached the top 100 rankings with a 60-second TTL. Every 60 seconds the entry expired, and every in-flight request hit the database at once. The database could handle steady-state load fine but could not handle 500 simultaneous queries for the same data.
The team's fix was to reduce the TTL to 30 seconds. This doubled the frequency of thundering herds. The actual fix was cache stampede protection: letting one request refresh the cache while serving stale data to all others.
| Cache Pattern | Thundering Herd Risk | Complexity |
|---|---|---|
| Simple TTL expiration | High | Low |
| Stale-while-revalidate | Low | Medium |
| Lock-based refresh (single flight) | None | Medium |
| Probabilistic early expiration | Low | Medium |
| Background refresh (async) | None | High |
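The single-flight pattern from the table can be sketched in a few lines. This is a simplified in-process illustration assuming a threaded server; the names (`SingleFlightCache`, `loader`, `ttl_s`) are hypothetical:

```python
import threading
import time

class SingleFlightCache:
    """Single-flight refresh: when an entry expires, exactly one caller
    reloads it; everyone else is served the stale value meanwhile."""

    def __init__(self, loader, ttl_s):
        self._loader = loader
        self._ttl_s = ttl_s
        self._entries = {}               # key -> (value, expires_at)
        self._refresh_locks = {}         # key -> lock held by the refresher
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            entry = self._entries.get(key)
            lock = self._refresh_locks.setdefault(key, threading.Lock())
        if entry is not None and time.monotonic() < entry[1]:
            return entry[0]              # fresh hit
        # Expired or missing: only one caller wins the refresh lock.
        if lock.acquire(blocking=False):
            try:
                value = self._loader(key)
                with self._guard:
                    self._entries[key] = (value, time.monotonic() + self._ttl_s)
                return value
            finally:
                lock.release()
        if entry is not None:
            return entry[0]              # serve stale while another thread refreshes
        with lock:                       # cold miss: block until the refresher finishes
            with self._guard:
                return self._entries[key][0]
```

However many requests arrive at expiry, the backing store sees one query per key per refresh, which is exactly what the leaderboard database needed.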
Case 3: The Consistency Nightmare
An e-commerce system cached product prices in a CDN, in an application-level cache, and in a client-side cache. When a price changed, it took up to 15 minutes for all cache layers to reflect the new price. During that window, a user could see three different prices depending on which page they were on.
The team added cache invalidation logic. This introduced a new class of bugs: invalidation messages that arrived out of order, that were dropped, or that invalidated the wrong cache region. The invalidation system became more complex than the original pricing system.
Lesson: Multi-layer caching with independent TTLs creates consistency windows that grow with the number of layers. Each layer you add multiplies the number of states the system can be in.
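The compounding is easy to quantify. Assuming each layer refreshes independently from the layer below, a value can be cached just before it expires at every level, so worst-case staleness is roughly the sum of the TTLs, not the maximum. The TTL values below are hypothetical, chosen to reproduce the 15-minute window:

```python
# Hypothetical TTLs for the three layers in the pricing example.
layer_ttls_s = {"cdn": 600, "app_cache": 240, "client_cache": 60}

# Worst case: each layer caches a value fetched just before the layer
# below it expired, so the staleness windows stack end to end.
worst_case_staleness_s = sum(layer_ttls_s.values())
print(worst_case_staleness_s / 60)   # 15.0 minutes
```

Adding a fourth layer with even a short TTL extends the window further, which is why each layer multiplies the states the system can be in.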
Case 4: The Cache That Was Slower
A service added a cache to reduce database load. The cache hit rate was 15% because the access pattern was highly random with a long tail of unique keys. For the 85% of requests that missed the cache, the system now did two lookups (cache miss, then database) instead of one. Average latency increased.
Caching only improves performance when the access pattern exhibits temporal locality, meaning the same data is requested multiple times within the cache's TTL window. Without locality, a cache adds latency and complexity with no benefit.
Rule of thumb: If your expected cache hit rate is below 50%, the cache is probably not worth it. If it is below 30%, it is almost certainly making things worse.
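The arithmetic behind this rule of thumb is simple: for a read-through cache, a miss pays for both the failed cache lookup and the backing query. The 5ms/20ms figures below are assumed for illustration, not measured from Case 4:

```python
def expected_latency_ms(hit_rate, cache_ms, db_ms):
    """Average request latency for a read-through cache: a hit costs one
    cache lookup; a miss pays for the failed lookup AND the database query."""
    return hit_rate * cache_ms + (1 - hit_rate) * (cache_ms + db_ms)

# At the 15% hit rate from Case 4, with an assumed 5ms cache round trip
# and a 20ms database query:
with_cache = expected_latency_ms(0.15, cache_ms=5, db_ms=20)   # 22.0 ms
without_cache = 20                                             # every request hits the DB
```

With these numbers the cache breaks even only at a 25% hit rate (where `hit_rate * db_ms` equals `cache_ms`); below that, every percentage point of misses is pure added latency.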
Case 5: The Memory Pressure
A Java service added an in-process cache using a ConcurrentHashMap. The cache grew without bound because nobody implemented eviction. Over a few hours, the cache consumed most of the heap. GC pauses increased from milliseconds to seconds. The service became unresponsive.
The team added an LRU eviction policy. The cache now evicted aggressively, but the eviction processing itself consumed CPU during high-load periods. The cache was competing with the actual workload for resources.
Lesson: In-process caches share resources with the application. Memory for the cache is memory not available for the workload. CPU for cache management is CPU not available for request processing.
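At minimum, an in-process cache needs a hard size bound from day one. In production Java you would more likely reach for a library such as Caffeine with a `maximumSize` bound; the Python `OrderedDict` sketch below just illustrates the eviction mechanics:

```python
from collections import OrderedDict

class BoundedLRUCache:
    """In-process cache with a hard entry limit, so it cannot grow until
    it starves the rest of the application of heap."""

    def __init__(self, max_entries):
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)         # mark as recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

A bound does not remove the resource competition described above, but it turns "the service falls over" into "the hit rate drops", which is a far better failure mode.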
My Caching Decision Framework
Before adding a cache, I ask:
- What is the actual bottleneck? Profile first. The slow thing might not be what you think it is.
- Can the bottleneck be fixed directly? Add an index, optimize a query, reduce payload size. These are permanent fixes. A cache is a workaround.
- Does the access pattern exhibit locality? If the same data is not requested repeatedly, a cache will not help.
- What is the consistency requirement? If users must see up-to-date data, caching introduces risk.
- What is the failure mode? When the cache goes down, can the system handle the full load on the backing store?
- What is the operational cost? A Redis cluster requires monitoring, memory management, failover testing, and capacity planning. This is not free.
When Caching Is the Right Answer
Caching is genuinely valuable when:
- The access pattern has high temporal locality (top-N items, configuration, session data)
- The data changes infrequently relative to how often it is read
- The consistency requirement allows staleness (content that is minutes old is acceptable)
- The backing store cannot be scaled further or the cost of scaling it exceeds the cost of a cache
- The cache failure mode is acceptable (the system degrades gracefully, not catastrophically)
Key Takeaways
- Caching is a design decision, not a default optimization. It requires justification.
- A cache that hides a performance bug delays the fix and increases eventual blast radius.
- Thundering herds are a common failure mode. Use stale-while-revalidate or single-flight refresh patterns.
- Multi-layer caching multiplies consistency states. Each layer increases the window of potential inconsistency.
- Low cache hit rates mean the cache is adding latency, not removing it.
- In-process caches compete with the application for memory and CPU.
- Always ask: can the underlying bottleneck be fixed directly?
Final Thoughts
The best cache is the one you did not need to add. A well-indexed database query that runs in 5ms does not need a cache. A well-designed API that returns only the fields the client needs does not need a cache. Caching is a powerful tool, but it is also a source of complexity, inconsistency, and operational burden. Reach for it after you have exhausted simpler solutions, not before.