What Breaks First When Traffic Scales

Dhruval Dhameliya·December 2, 2025·7 min read

A catalog of components that fail first under increasing traffic, ordered by how commonly they become bottlenecks in web applications.

Context

Over the past three years, I have been involved in scaling four different web applications from hundreds to hundreds of thousands of requests per minute. In every case, the failure order was predictable. The same components break first, in roughly the same sequence, regardless of the tech stack.

Related: Failure Modes I Actively Design For.

Problem

Teams invest in scaling the wrong things. They add application server capacity when the database is the bottleneck. They optimize queries when the connection pool is exhausted. They cache aggressively when the problem is DNS resolution. Understanding the typical failure order saves weeks of misdirected effort.

Constraints

  • This analysis covers typical web applications: a load balancer, application servers, a relational database, and a CDN
  • Traffic pattern: gradual ramp from 100 to 100,000 requests per minute
  • Application type: read-heavy (80/20 read/write ratio)
  • Infrastructure: cloud-hosted (AWS or similar), not on-premise

Design

The failure order I have observed, ranked by frequency:

The Failure Cascade

| Order | Component | Typical Breaking Point | Symptom |
|---|---|---|---|
| 1 | Database connections | 50-100 concurrent connections | Connection timeout errors, 5xx responses |
| 2 | Database query performance | Varies by query complexity | Slow responses, growing queue depth |
| 3 | Application memory | Depends on instance size | OOM kills, pod restarts |
| 4 | External API rate limits | Vendor-specific | 429 responses, degraded features |
| 5 | DNS resolution | 1,000+ req/s per resolver | Intermittent connection failures |
| 6 | Load balancer configuration | Varies by provider | Uneven distribution, hot instances |
| 7 | Disk I/O (logging, temp files) | High write volume | Increased latency, disk full errors |
| 8 | TLS handshake overhead | 5,000+ new connections/s | Increased TTFB, CPU saturation |

1. Database Connection Pool Exhaustion

This is almost always the first failure. Postgres ships with max_connections = 100 by default. A Node.js application server with a pool size of 10, running 12 instances, needs 120 connections. Add the overhead from migrations, monitoring, and admin tools, and you are over the limit before traffic even spikes.

Detection: pg_stat_activity shows connections in "idle" or "idle in transaction" state near max_connections.

Fix hierarchy:

  1. Add a connection pooler (PgBouncer) in transaction mode
  2. Reduce per-instance pool size
  3. Increase max_connections (last resort, increases memory per connection)
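Before reaching for any of these fixes, it helps to do the arithmetic from the example above explicitly. This hypothetical helper (the function name and reserved-connection count are illustrative, not from any library) checks whether a fleet's total pool demand fits under max_connections with headroom for admin sessions:

```javascript
// Hypothetical helper: does the fleet's total pool demand fit under
// Postgres max_connections, leaving headroom for migrations,
// monitoring, and admin sessions?
function connectionBudget({ instances, poolSize, maxConnections, reserved = 10 }) {
  const demand = instances * poolSize;       // connections the app fleet will open
  const available = maxConnections - reserved; // what the database can actually spare
  return { demand, available, overCommitted: demand > available };
}

// The scenario from the text: 12 instances with a pool of 10 each,
// against Postgres's default max_connections of 100.
const result = connectionBudget({ instances: 12, poolSize: 10, maxConnections: 100 });
// result.overCommitted === true (demand 120 vs. 90 available)
```

Running this check in CI against your deployment config catches the over-commit before a traffic spike does.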

2. Slow Queries Under Load

Queries that run in 5ms at low traffic take 500ms under load. This happens because of lock contention, buffer cache misses, and increased I/O wait. The usual suspects:

  • Missing indexes on WHERE and JOIN columns
  • Sequential scans on tables over 100K rows
  • N+1 query patterns amplified by concurrent requests
  • SELECT * fetching columns that are never used

Detection: pg_stat_statements sorted by total_exec_time or mean_exec_time.

Fix hierarchy:

  1. Add missing indexes (covers 60% of cases)
  2. Rewrite N+1 patterns as JOINs or batch queries
  3. Add read replicas for read-heavy queries
  4. Implement query-level caching
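As a sketch of step 2, an N+1 pattern collapses into a single round trip with a parameterized `= ANY($1)` query. The helper below is illustrative (table and column names are assumed); it builds the query object shape that a Postgres client such as node-postgres accepts:

```javascript
// Instead of issuing one query per id (N+1), fetch every row in a
// single round trip. Table/column names here are illustrative.
function batchedUserQuery(ids) {
  return {
    text: 'SELECT id, name FROM users WHERE id = ANY($1)',
    values: [ids], // one array parameter, regardless of how many ids
  };
}

// One round trip for any number of ids:
const q = batchedUserQuery([1, 2, 3]);
// q.values[0] is [1, 2, 3]; the database receives a single statement
```

The same shape works for any N+1 site: collect the ids from the parent rows, issue one batched query, then join the results back in application code.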

3. Application Memory Pressure

Node.js historically capped the V8 heap at roughly 1.5GB by default (newer versions allow more); Java applications are bounded by the configured JVM heap. In both cases, memory pressure manifests first as longer GC pauses, then as OOM kills.

Common causes:

  • Unbounded in-memory caches
  • Large JSON payloads held in memory during processing
  • Memory leaks from event listener accumulation
  • Logging libraries buffering in memory


Detection: container memory metrics approaching limit, GC pause duration increasing.

Fix hierarchy:

  1. Profile memory allocation (Node.js: --inspect with Chrome DevTools)
  2. Set explicit limits on in-memory caches
  3. Stream large payloads instead of buffering
  4. Increase instance memory (temporary, not a fix)
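The first cause above, unbounded in-memory caches, has a small, dependency-free fix: bound the cache and evict the least recently used entry. A minimal sketch (class and field names are my own, not from the article) relying on Map's insertion-order guarantee:

```javascript
// A minimal bounded LRU cache: once maxEntries is reached, the least
// recently used key is evicted. Relies on Map preserving insertion order.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);      // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry (first in insertion order).
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

Even a crude bound like this turns "memory grows until OOM" into "hit rate drops under pressure", which is a far better failure mode.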

4. External API Rate Limits

Third-party APIs (payment processors, email services, geocoding) impose rate limits. At low traffic, you never hit them. At scale, a payment flow processing 100 orders/minute can exhaust a 60 req/min API limit.

Detection: 429 HTTP responses from external services, feature-specific error spikes.

Fix hierarchy:

  1. Implement client-side rate limiting with a token bucket
  2. Add request queuing with backpressure
  3. Negotiate higher limits with the vendor
  4. Cache API responses where possible
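A token bucket (step 1 above) is small enough to sketch in full. Capacity and refill rate map directly onto a vendor limit; the clock is injectable so the logic is testable. Names are illustrative:

```javascript
// Token bucket: holds at most `capacity` tokens, refilled continuously
// at `refillPerSec`. Each outbound call consumes one token; when the
// bucket is empty the caller must wait or queue.
class TokenBucket {
  constructor(capacity, refillPerSec, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.now = now;
    this.last = now();
  }
  tryRemove() {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.last = t;
    // Refill proportionally to elapsed time, never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // caller should queue or back off
  }
}

// A 60 req/min vendor limit maps to capacity 60, refill 1 token/sec:
const bucket = new TokenBucket(60, 1);
```

Gating every outbound call through tryRemove turns vendor 429s (their enforcement) into local queuing (your policy), which is where backpressure belongs.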

5. DNS Resolution Bottlenecks

Every outbound HTTP connection starts with a DNS lookup. At high request rates, the default system resolver becomes a bottleneck, which manifests as intermittent 1-3 second delays on random requests.

Detection: inconsistent latency spikes that do not correlate with application or database metrics.

Fix hierarchy:

  1. Enable DNS caching in the HTTP client (Node.js: supply a custom lookup function that caches dns.resolve results)
  2. Use a local DNS cache (dnsmasq, systemd-resolved)
  3. Increase resolver concurrency limits
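The cache in option 1 is just a TTL map in front of a resolver. In a real Node service the resolver would be async (for example dns.promises.resolve4) and the wrapper would be passed as the lookup option of the HTTP agent; a synchronous stand-in keeps the sketch short, and all names here are illustrative:

```javascript
// Minimal TTL cache around a resolver function. In production the
// resolver would be async (e.g. dns.promises.resolve4); a synchronous
// stand-in keeps the cache logic visible. Names are illustrative.
function makeCachedLookup(resolve, ttlMs, now = () => Date.now()) {
  const cache = new Map(); // hostname -> { address, expires }
  let misses = 0;
  return {
    lookup(hostname) {
      const hit = cache.get(hostname);
      if (hit && hit.expires > now()) return hit.address; // cache hit
      misses += 1;
      const address = resolve(hostname);
      cache.set(hostname, { address, expires: now() + ttlMs });
      return address;
    },
    misses: () => misses, // exposed for observability/testing
  };
}
```

Respect the record's real TTL where you can; an over-long cache trades the latency spikes for stale-endpoint failures during failovers.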

Trade-offs

| Mitigation | Cost | Complexity | Effectiveness |
|---|---|---|---|
| Connection pooler (PgBouncer) | Low ($0-10/month) | Medium | High |
| Read replicas | Medium ($50-200/month) | High | High |
| Application caching (Redis) | Medium ($20-50/month) | Medium | Medium-High |
| CDN for static assets | Low ($5-20/month) | Low | High for static |
| Vertical scaling (bigger instances) | High | Low | Temporary |

Vertical scaling buys time. Horizontal scaling buys capacity. Architectural changes (caching, read replicas, async processing) buy durability.

Failure Modes

Cascading failures: database slowness causes application request queues to grow, which exhausts memory, which causes OOM kills, which reduces capacity, which increases load on surviving instances. This cascade can take a healthy system to zero in under 60 seconds.

Hidden coupling: an application that calls three external APIs sequentially has a latency floor equal to the sum of the slowest API responses. Under load, each API slows independently, and the combined latency exceeds timeout thresholds.

Retry storms: when a service returns errors, clients retry. If retry logic is aggressive (immediate retry, no backoff), a temporary failure becomes a permanent one because retries double the load on the already-struggling service.
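The standard antidote to retry storms is exponential backoff with jitter: delays grow geometrically per attempt, and a random factor spreads clients out so they do not retry in lockstep. A minimal delay calculator (parameter names and defaults are illustrative):

```javascript
// Exponential backoff with "full jitter": the ceiling grows as
// baseMs * 2^attempt, capped at capMs, and the actual delay is drawn
// uniformly from [0, ceiling) so retries from many clients spread out.
function backoffDelay(attempt, { baseMs = 100, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return random() * ceiling;
}
```

Injecting `random` keeps the function deterministic under test; in production the default Math.random provides the jitter that breaks up synchronized retry waves.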

Scaling Considerations

  • Profile under load, not at rest. A system that looks healthy at 100 req/min may be critically bottlenecked at 1,000 req/min.
  • Scale the bottleneck, not the symptom. If the database is slow, adding more application servers makes it worse by increasing connection pressure.
  • Implement circuit breakers for external dependencies. A failing third-party API should degrade gracefully, not bring down the entire application.
  • Load test with realistic traffic patterns. Synthetic benchmarks with uniform requests miss the hot-path bottlenecks that real traffic exposes.
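The circuit breaker recommended above fits in a few dozen lines. This sketch tracks consecutive failures, opens after a threshold, and allows a half-open trial call after a cooldown; thresholds, names, and the injectable clock are illustrative choices, not a prescribed implementation:

```javascript
// Minimal circuit breaker: after maxFailures consecutive failures the
// circuit opens and calls fail fast; after resetMs one trial call is
// allowed through (half-open). Clock is injectable for testing.
class CircuitBreaker {
  constructor(maxFailures, resetMs, now = () => Date.now()) {
    this.maxFailures = maxFailures;
    this.resetMs = resetMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // null = closed
  }
  canRequest() {
    if (this.openedAt === null) return true;             // closed: allow
    return this.now() - this.openedAt >= this.resetMs;   // half-open trial
  }
  recordSuccess() {
    this.failures = 0;
    this.openedAt = null; // close the circuit again
  }
  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = this.now();
  }
}
```

Wrap each third-party call site in its own breaker instance so one failing vendor degrades only its own feature, not the whole request path.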

Observability

The minimum monitoring stack for scaling:

  • Database: connection count, query duration (p50/p95/p99), lock wait time, replication lag
  • Application: request duration, error rate, memory usage, GC pause time, event loop lag (Node.js)
  • Infrastructure: CPU utilization, network I/O, disk I/O, DNS resolution time
  • External: response time and error rate per third-party dependency

Alert on the leading indicators (connection count approaching limit, memory usage above 80%) rather than lagging indicators (error rate above 5%).

Key Takeaways

  • Database connections break first in almost every scaling scenario. Add PgBouncer before you need it.
  • Slow queries hide at low traffic and dominate at high traffic. Use pg_stat_statements proactively.
  • Memory pressure causes cascading failures. Set explicit limits on everything that allocates memory.
  • External API rate limits are a hard wall. Design for them with client-side rate limiting and queuing.
  • Scale the bottleneck, not the symptom. Adding application servers when the database is the constraint makes things worse.

Final Thoughts

The failure order is predictable: connections, queries, memory, external limits, DNS. Knowing this sequence means you can instrument and mitigate proactively instead of reacting to each failure in production. Every scaling effort I have been part of started with PgBouncer and ended with caching. The path between those two steps is where the real engineering happens.