Testing Caching Strategies in Real Conditions
Comparing cache-aside, write-through, and read-through strategies with measured hit rates, latency, and consistency trade-offs under production traffic patterns.
Context
I tested three caching strategies on a product catalog API serving 15,000 requests/minute. The API reads from Postgres and returns product details. The question was not whether to cache, but which caching pattern would deliver the best hit rate without introducing stale data issues.
Problem
Caching strategies are well-documented in theory. In practice, cache hit rates depend on traffic patterns (Zipfian vs uniform), write frequency, TTL configuration, and cache size constraints. I needed measurements against real traffic to choose correctly.
Constraints
- Cache: Redis 7, single node, 256MB memory limit
- Origin: Postgres 15, product catalog with 50,000 SKUs
- Traffic pattern: 80% of requests hit 20% of products (Zipfian distribution confirmed via access logs)
- Write frequency: 200 product updates/hour (price changes, inventory updates)
- Acceptable staleness: 60 seconds maximum for price data
- Cache key space: product ID (50,000 possible keys)
- Read/write ratio: 99:1
Design
Strategy 1: Cache-Aside (Lazy Loading)
Application checks cache first. On miss, reads from database and populates cache.
```typescript
async function getProduct(id: string) {
  const cached = await redis.get(`product:${id}`);
  if (cached) return JSON.parse(cached);
  const product = await db.query('SELECT * FROM products WHERE id = $1', [id]);
  await redis.set(`product:${id}`, JSON.stringify(product), 'EX', 60);
  return product;
}
```
Strategy 2: Write-Through
Every write updates both the database and the cache in the same code path. (The two operations are not atomic; the failure modes section covers what happens when one succeeds and the other fails.)
```typescript
async function updateProduct(id: string, data: ProductUpdate) {
  await db.query('UPDATE products SET ... WHERE id = $1', [id, ...]);
  const updated = await db.query('SELECT * FROM products WHERE id = $1', [id]);
  await redis.set(`product:${id}`, JSON.stringify(updated), 'EX', 300);
}

async function getProduct(id: string) {
  const cached = await redis.get(`product:${id}`);
  if (cached) return JSON.parse(cached);
  const product = await db.query('SELECT * FROM products WHERE id = $1', [id]);
  await redis.set(`product:${id}`, JSON.stringify(product), 'EX', 300);
  return product;
}
```
Strategy 3: Read-Through (Cache as primary read path)
A caching proxy sits between the application and database. The application only talks to the cache.
```typescript
class ReadThroughCache {
  async get(id: string): Promise<Product> {
    const cached = await redis.get(`product:${id}`);
    if (cached) return JSON.parse(cached);
    const product = await this.loader(id);
    await redis.set(`product:${id}`, JSON.stringify(product), 'EX', 60);
    return product;
  }

  private async loader(id: string) {
    return db.query('SELECT * FROM products WHERE id = $1', [id]);
  }
}
```
On writes, the cache entry is invalidated (deleted), and the next read triggers a reload.
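The write-side invalidation can be modeled end to end with in-memory stand-ins: a minimal sketch, where a Map stands in for Redis and another for the Postgres origin (both hypothetical simplifications, not the production clients):

```typescript
// Read-through with write-side invalidation, modeled with in-memory
// stand-ins: one Map for the cache (Redis) and one for the origin (Postgres).
type Product = { id: string; price: number };

const origin = new Map<string, Product>([['p1', { id: 'p1', price: 10 }]]);
const cache = new Map<string, Product>();

async function readProduct(id: string): Promise<Product> {
  const cached = cache.get(id);
  if (cached) return cached;        // hit: serve from cache
  const product = origin.get(id)!;  // miss: loader reads from the origin
  cache.set(id, product);           // populate for subsequent reads
  return product;
}

async function writeProduct(id: string, price: number): Promise<void> {
  origin.set(id, { id, price });    // write to the origin first
  cache.delete(id);                 // invalidate; the next read repopulates,
                                    // so the cache never holds a value the
                                    // loader did not produce
}
```

Deleting rather than updating on write is the key property: the cache can only ever contain values produced by the loader, which keeps the invalidation logic independent of the write payload.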
Trade-offs
Performance Results (7-day production traffic)
| Metric | Cache-Aside | Write-Through | Read-Through |
|---|---|---|---|
| Hit rate (overall) | 82% | 91% | 84% |
| Hit rate (top 20% products) | 94% | 98% | 95% |
| p50 read latency | 2ms (hit) / 18ms (miss) | 1.5ms (hit) / 18ms (miss) | 2ms (hit) / 18ms (miss) |
| p95 read latency | 5ms / 35ms | 4ms / 35ms | 5ms / 35ms |
| Write latency overhead | 0ms | +3ms (cache write) | +1ms (cache delete) |
| Max staleness observed | 60s (TTL bound) | 0s (write-through) | 60s (TTL bound) |
| Database load reduction | 78% | 88% | 80% |
Why Write-Through Won on Hit Rate
Write-through maintains higher hit rates because:
- Updated products are immediately available in cache (no miss-then-reload cycle)
- Longer TTLs are safe because the cache is always updated on writes
- No "thundering herd" on popular products after TTL expiry
The 9% hit rate advantage over cache-aside translates to 1,350 fewer database queries per minute at 15,000 req/min.
Staleness Analysis
| Strategy | Staleness Window | Stale Reads During Test |
|---|---|---|
| Cache-Aside (60s TTL) | 0-60s after write | ~3,200 over 7 days |
| Write-Through (300s TTL) | 0s (write updates cache) | 0 for written products |
| Read-Through (60s TTL) | 0-60s after invalidation | ~2,800 over 7 days |
Write-through eliminates staleness for products that are actively updated. The remaining staleness risk is for products updated through a different code path that bypasses the cache update logic.
Memory Usage
| Strategy | Peak Memory | Evictions/Hour |
|---|---|---|
| Cache-Aside (60s TTL) | 45MB | 0 |
| Write-Through (300s TTL) | 120MB | 12 |
| Read-Through (60s TTL) | 48MB | 0 |
Write-through with longer TTLs uses more memory because entries live longer. The 256MB limit was not reached, but at 200,000 SKUs with 300s TTL, it would be.
Failure Modes
Cache-aside: thundering herd on TTL expiry. When a popular product's cache entry expires, multiple concurrent requests all miss the cache and all query the database simultaneously. At 15,000 req/min with the top product receiving 5% of traffic, that is 750 req/min. If the TTL expires, 12+ requests may hit the database in the same second. Mitigation: stale-while-revalidate pattern or distributed locks on cache population.
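The lock-based mitigation can be sketched as single-flight cache population: concurrent misses for the same key share one in-flight load instead of each querying the database. This is a minimal in-process sketch; `loadFromDb` is a hypothetical stand-in for the real query (a cross-instance version would need a distributed lock):

```typescript
// Single-flight: the first miss starts the load; concurrent misses for
// the same key await the same in-flight promise instead of querying.
const inflight = new Map<string, Promise<string>>();
let dbQueries = 0; // counter to demonstrate the effect

async function loadFromDb(id: string): Promise<string> {
  dbQueries++;
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated query latency
  return `product-${id}`;
}

async function getSingleFlight(id: string): Promise<string> {
  const pending = inflight.get(id);
  if (pending) return pending; // piggyback on the in-flight load
  const load = loadFromDb(id).finally(() => inflight.delete(id));
  inflight.set(id, load);
  return load;
}
```

With this in place, the 12+ concurrent misses after a popular key expires collapse into a single database query.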
Write-through: cache-database inconsistency on partial failure. If the database write succeeds but the cache write fails, the cache holds stale data. With a 300s TTL, staleness lasts up to 5 minutes. Mitigation: wrap both operations in a try-catch, and on cache write failure, delete the cache entry instead.
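That fallback can be sketched as follows, with a hypothetical `Cache` interface standing in for the Redis client; the database write is assumed to have already succeeded by this point:

```typescript
// Hypothetical minimal cache interface (stand-in for the Redis client).
type Cache = {
  set(key: string, value: string): Promise<void>;
  del(key: string): Promise<void>;
};

// Write-through with a fallback: if the cache write fails after the
// database write succeeded, delete the entry so the next read misses
// and reloads fresh data, instead of serving a stale value for up to
// the full TTL.
async function writeThrough(cache: Cache, key: string, value: string): Promise<void> {
  try {
    await cache.set(key, value);
  } catch {
    // Best-effort delete; if this also fails, the TTL is the backstop.
    await cache.del(key).catch(() => {});
  }
}
```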
Read-through: cache stampede after invalidation. Invalidating a popular product's cache entry causes the same thundering herd problem as cache-aside TTL expiry. The mitigation is the same: probabilistic early expiration or lock-based single-flight cache population.
All strategies: Redis failure. If Redis is unavailable, cache-aside and read-through degrade to direct database reads (acceptable). Write-through may block writes if the cache update is in the critical path. Mitigation: make the cache write fire-and-forget with a timeout.
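One way to bound the cache write, sketched with a hypothetical helper where `write` stands in for the actual Redis `SET` call:

```typescript
// Bound the cache write: wait at most `ms`, then move on. The write may
// still complete in the background; errors are swallowed so a Redis
// outage cannot fail the caller's request.
async function cacheWriteWithTimeout(
  write: () => Promise<unknown>,
  ms: number,
): Promise<void> {
  await Promise.race([
    write().catch(() => {}), // ignore cache errors entirely
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  ]);
}
```

The trade-off is that a timed-out or failed cache write silently leaves a stale entry behind, which is why pairing this with the delete-on-failure fallback above matters for write-through.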
Scaling Considerations
- At 100,000 req/min, the write-through strategy's database load reduction (88%) means 12,000 database queries/min instead of 100,000. This is the difference between needing a read replica and not.
- Redis cluster mode supports horizontal scaling, but adds complexity for cache invalidation across shards.
- For multi-region deployments, each region needs its own cache. Cross-region cache invalidation adds latency. Consider region-local caches with shorter TTLs instead.
- Cache warming on deployment: pre-populate the top 1,000 products on application startup to avoid a cold-cache thundering herd.
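The warming step in the last bullet can be sketched as follows; `topIds`, `load`, and the Map-based cache are hypothetical stand-ins for an access-log ranking, the product loader, and Redis:

```typescript
// Cache warming on startup: pre-populate the hottest products in small
// batches so the first wave of traffic after a deploy hits a warm cache
// without hammering the database all at once.
async function warmCache(
  topIds: string[],
  load: (id: string) => Promise<string>,
  cache: Map<string, string>,
  concurrency = 10,
): Promise<void> {
  for (let i = 0; i < topIds.length; i += concurrency) {
    const batch = topIds.slice(i, i + concurrency);
    const values = await Promise.all(batch.map(load)); // bounded parallelism
    batch.forEach((id, j) => cache.set(id, values[j]));
  }
}
```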
Observability
- Track cache hit rate per product tier (top 100, top 1000, long tail)
- Monitor cache memory usage and eviction rate
- Log cache misses that result in database queries exceeding 50ms (these are candidates for cache warming)
- Alert on hit rate dropping below 80% (indicates a configuration or traffic pattern change)
- Measure end-to-end latency including cache lookup, not just database query time
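The per-tier tracking from the first bullet can be sketched like this, with a hypothetical tier-assignment function derived from access-log rankings:

```typescript
// Per-tier hit-rate tracking: bucket each lookup by popularity tier and
// keep separate hit/miss counters, so a healthy overall rate cannot
// hide a cold long tail.
class TieredHitRate {
  private stats = new Map<string, { hits: number; misses: number }>();

  constructor(private tierOf: (id: string) => string) {}

  record(id: string, hit: boolean): void {
    const tier = this.tierOf(id);
    const s = this.stats.get(tier) ?? { hits: 0, misses: 0 };
    if (hit) s.hits++; else s.misses++;
    this.stats.set(tier, s);
  }

  hitRate(tier: string): number {
    const s = this.stats.get(tier);
    return s ? s.hits / (s.hits + s.misses) : 0;
  }
}
```

In production these counters would feed a metrics backend rather than live in memory, but the bucketing logic is the part that matters.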
Key Takeaways
- Write-through caching delivered the highest hit rate (91%) and eliminated staleness for actively updated products. The cost is a 3ms write latency overhead.
- Cache-aside is the simplest strategy but suffers from thundering herd problems on TTL expiry and lower hit rates.
- TTL configuration is the most impactful tuning parameter. Too short reduces hit rate; too long increases staleness.
- Cache failure must degrade gracefully. Never let a cache outage cascade into a full system outage.
- Measure hit rates by access pattern (popular vs long-tail), not just overall. A 90% overall hit rate can mask a 30% hit rate on the long tail.
Final Thoughts
I deployed write-through caching with a 300-second TTL and a fallback to direct database reads on Redis failure. The 91% hit rate reduced database load by 88%, deferring the need for a read replica by an estimated 6 months at current growth rates. The 3ms write latency overhead was invisible to users. The primary ongoing cost is maintaining cache update logic in every write path, which is a code organization challenge, not a performance one.