When Caching Makes Things Worse
Real scenarios where adding a cache increased complexity, introduced bugs, or degraded performance, and the decision framework I use to evaluate whether a cache is the right solution.
Context
Caching is the first tool most engineers reach for when something is slow. And most of the time, it works. But I have seen enough cases where adding a cache made the system worse (not just in complexity but in correctness, reliability, and sometimes even performance) that I now treat caching as a design decision that requires justification, not a default optimization.
Case 1: The Cache That Hid a Bug
A product catalog service was slow because it ran an unindexed query for every request. Someone added a Redis cache in front of it. Response times dropped from 800ms to 5ms. Problem solved.
Six months later, the underlying database was migrated. The unindexed query became even slower (3 seconds). But nobody noticed because the cache was absorbing 99.8% of traffic. The 0.2% of cache misses caused timeouts that cascaded to upstream services. The cache had masked the real problem so effectively that the team lost awareness of it.
Lesson: A cache that hides a performance bug delays the fix and increases the eventual blast radius when the cache fails.
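One mitigation is to instrument cache misses so the backing store's latency stays visible even when the hit rate is high. A minimal sketch of the idea (the names `InstrumentedCache`, `slow_threshold_s`, and the loader function are illustrative, not from the incident above):

```python
import time

class InstrumentedCache:
    """Cache wrapper that tracks backing-store latency on misses,
    so a high hit rate cannot silently hide a database regression."""

    def __init__(self, loader, slow_threshold_s=0.5):
        self._store = {}
        self._loader = loader            # function that hits the backing store
        self._slow_threshold_s = slow_threshold_s
        self.slow_misses = 0             # feed this counter into alerting

    def get(self, key):
        if key in self._store:
            return self._store[key]
        start = time.monotonic()
        value = self._loader(key)        # the real (possibly unindexed) query
        elapsed = time.monotonic() - start
        if elapsed > self._slow_threshold_s:
            self.slow_misses += 1        # alert on this, not just on p99 latency
        self._store[key] = value
        return value
```

Alerting on miss latency directly, rather than on overall p99, would have surfaced the 3-second query long before the cascading timeouts.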
Case 2: The Thundering Herd
A leaderboard service cached the top 100 rankings with a 60-second TTL. Every 60 seconds the entry expired, and every in-flight request hit the database at once. The database could handle steady-state load fine but could not handle 500 simultaneous queries for the same data.
The team's fix was to reduce the TTL to 30 seconds. This doubled the frequency of thundering herds. The actual fix was cache stampede protection: letting one request refresh the cache while serving stale data to all others.
| Cache Pattern | Thundering Herd Risk | Complexity |
|---|---|---|
| Simple TTL expiration | High | Low |
| Stale-while-revalidate | Low | Medium |
| Lock-based refresh (single flight) | None | Medium |
| Probabilistic early expiration | Low | Medium |
| Background refresh (async) | None | High |
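The single-flight pattern from the table can be sketched in a few lines. This is a simplified in-process illustration assuming a threaded server; the names (`SingleFlightCache`, `loader`, `ttl_s`) are hypothetical:

```python
import threading
import time

class SingleFlightCache:
    """Single-flight refresh: when an entry expires, exactly one caller
    reloads it; everyone else is served the stale value meanwhile."""

    def __init__(self, loader, ttl_s):
        self._loader = loader
        self._ttl_s = ttl_s
        self._entries = {}               # key -> (value, expires_at)
        self._refresh_locks = {}         # key -> lock held by the refresher
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            entry = self._entries.get(key)
            lock = self._refresh_locks.setdefault(key, threading.Lock())
        if entry is not None and time.monotonic() < entry[1]:
            return entry[0]              # fresh hit
        # Expired or missing: only one caller wins the refresh lock.
        if lock.acquire(blocking=False):
            try:
                value = self._loader(key)
                with self._guard:
                    self._entries[key] = (value, time.monotonic() + self._ttl_s)
                return value
            finally:
                lock.release()
        if entry is not None:
            return entry[0]              # serve stale while another thread refreshes
        with lock:                       # cold miss: block until the refresher finishes
            with self._guard:
                return self._entries[key][0]
```

However many requests arrive at expiry, the backing store sees one query per key per refresh, which is exactly what the leaderboard database needed.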
Case 3: The Consistency Nightmare
An e-commerce system cached product prices in a CDN, in an application-level cache, and in a client-side cache. When a price changed, it took up to 15 minutes for all cache layers to reflect the new price. During that window, a user could see three different prices depending on which page they were on.
The team added cache invalidation logic. This introduced a new class of bugs: invalidation messages that arrived out of order, that were dropped, or that invalidated the wrong cache region. The invalidation system became more complex than the original pricing system.
Lesson: Multi-layer caching with independent TTLs creates consistency windows that grow with the number of layers. Each layer you add multiplies the number of states the system can be in.
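The compounding is easy to quantify. Assuming each layer refreshes independently from the layer below, a value can be cached just before it expires at every level, so worst-case staleness is roughly the sum of the TTLs, not the maximum. The TTL values below are hypothetical, chosen to reproduce the 15-minute window:

```python
# Hypothetical TTLs for the three layers in the pricing example.
layer_ttls_s = {"cdn": 600, "app_cache": 240, "client_cache": 60}

# Worst case: each layer caches a value fetched just before the layer
# below it expired, so the staleness windows stack end to end.
worst_case_staleness_s = sum(layer_ttls_s.values())
print(worst_case_staleness_s / 60)   # 15.0 minutes
```

Adding a fourth layer with even a short TTL extends the window further, which is why each layer multiplies the states the system can be in.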
Case 4: The Cache That Was Slower
A service added a cache to reduce database load. The cache hit rate was 15% because the access pattern was highly random with a long tail of unique keys. For the 85% of requests that missed the cache, the system now did two lookups (cache miss, then database) instead of one. Average latency increased.
Caching only improves performance when the access pattern exhibits temporal locality, meaning the same data is requested multiple times within the cache's TTL window. Without locality, a cache adds latency and complexity with no benefit.
Rule of thumb: If your expected cache hit rate is below 50%, the cache is probably not worth it. If it is below 30%, it is almost certainly making things worse.
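The arithmetic behind this rule of thumb is simple: for a read-through cache, a miss pays for both the failed cache lookup and the backing query. The 5ms/20ms figures below are assumed for illustration, not measured from Case 4:

```python
def expected_latency_ms(hit_rate, cache_ms, db_ms):
    """Average request latency for a read-through cache: a hit costs one
    cache lookup; a miss pays for the failed lookup AND the database query."""
    return hit_rate * cache_ms + (1 - hit_rate) * (cache_ms + db_ms)

# At the 15% hit rate from Case 4, with an assumed 5ms cache round trip
# and a 20ms database query:
with_cache = expected_latency_ms(0.15, cache_ms=5, db_ms=20)   # 22.0 ms
without_cache = 20                                             # every request hits the DB
```

With these numbers the cache breaks even only at a 25% hit rate (where `hit_rate * db_ms` equals `cache_ms`); below that, every percentage point of misses is pure added latency.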
Case 5: The Memory Pressure
A Java service added an in-process cache using a ConcurrentHashMap. The cache grew without bound because nobody implemented eviction. Over a few hours, the cache consumed most of the heap. GC pauses increased from milliseconds to seconds. The service became unresponsive.
The team added an LRU eviction policy. The cache now evicted aggressively, but the eviction processing itself consumed CPU during high-load periods. The cache was competing with the actual workload for resources.
Lesson: In-process caches share resources with the application. Memory for the cache is memory not available for the workload. CPU for cache management is CPU not available for request processing.
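At minimum, an in-process cache needs a hard size bound from day one. In production Java you would more likely reach for a library such as Caffeine with a `maximumSize` bound; the Python `OrderedDict` sketch below just illustrates the eviction mechanics:

```python
from collections import OrderedDict

class BoundedLRUCache:
    """In-process cache with a hard entry limit, so it cannot grow until
    it starves the rest of the application of heap."""

    def __init__(self, max_entries):
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)         # mark as recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

A bound does not remove the resource competition described above, but it turns "the service falls over" into "the hit rate drops", which is a far better failure mode.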
My Caching Decision Framework
Before adding a cache, I ask:
- What is the actual bottleneck? Profile first. The slow thing might not be what you think it is.
- Can the bottleneck be fixed directly? Add an index, optimize a query, reduce payload size. These are permanent fixes. A cache is a workaround.
- Does the access pattern exhibit locality? If the same data is not requested repeatedly, a cache will not help.
- What is the consistency requirement? If users must see up-to-date data, caching introduces risk.
- What is the failure mode? When the cache goes down, can the system handle the full load on the backing store?
- What is the operational cost? A Redis cluster requires monitoring, memory management, failover testing, and capacity planning. This is not free.
When Caching Is the Right Answer
Caching is genuinely valuable when:
- The access pattern has high temporal locality (top-N items, configuration, session data)
- The data changes infrequently relative to how often it is read
- The consistency requirement allows staleness (content that is minutes old is acceptable)
- The backing store cannot be scaled further or the cost of scaling it exceeds the cost of a cache
- The cache failure mode is acceptable (the system degrades gracefully, not catastrophically)
Key Takeaways
- Caching is a design decision, not a default optimization. It requires justification.
- A cache that hides a performance bug delays the fix and increases eventual blast radius.
- Thundering herds are a common failure mode. Use stale-while-revalidate or single-flight refresh patterns.
- Multi-layer caching multiplies consistency states. Each layer increases the window of potential inconsistency.
- Low cache hit rates mean the cache is adding latency, not removing it.
- In-process caches compete with the application for memory and CPU.
- Always ask: can the underlying bottleneck be fixed directly?
Final Thoughts
The best cache is the one you did not need to add. A well-indexed database query that runs in 5ms does not need a cache. A well-designed API that returns only the fields the client needs does not need a cache. Caching is a powerful tool, but it is also a source of complexity, inconsistency, and operational burden. Reach for it after you have exhausted simpler solutions, not before.