What Breaks First When Traffic Scales
A catalog of components that fail first under increasing traffic, ordered by how commonly they become bottlenecks in web applications.
Context
Over the past three years, I have been involved in scaling four different web applications from hundreds to hundreds of thousands of requests per minute. In every case, the failure order was predictable. The same components break first, in roughly the same sequence, regardless of the tech stack.
Related: Failure Modes I Actively Design For.
Problem
Teams invest in scaling the wrong things. They add application server capacity when the database is the bottleneck. They optimize queries when the connection pool is exhausted. They cache aggressively when the problem is DNS resolution. Understanding the typical failure order saves weeks of misdirected effort.
Constraints
- This analysis covers typical web applications: a load balancer, application servers, a relational database, and a CDN
- Traffic pattern: gradual ramp from 100 to 100,000 requests per minute
- Application type: read-heavy (80/20 read/write ratio)
- Infrastructure: cloud-hosted (AWS or similar), not on-premise
Design
The failure order I have observed, ranked by frequency:
The Failure Cascade
| Order | Component | Typical Breaking Point | Symptom |
|---|---|---|---|
| 1 | Database connections | 50-100 concurrent connections | Connection timeout errors, 5xx responses |
| 2 | Database query performance | Varies by query complexity | Slow responses, growing queue depth |
| 3 | Application memory | Depends on instance size | OOM kills, pod restarts |
| 4 | External API rate limits | Vendor-specific | 429 responses, degraded features |
| 5 | DNS resolution | 1,000+ req/s per resolver | Intermittent connection failures |
| 6 | Load balancer configuration | Varies by provider | Uneven distribution, hot instances |
| 7 | Disk I/O (logging, temp files) | High write volume | Increased latency, disk full errors |
| 8 | TLS handshake overhead | 5,000+ new connections/s | Increased TTFB, CPU saturation |
1. Database Connection Pool Exhaustion
This is almost always the first failure. Postgres ships with a default max_connections of 100. A Node.js application server with a pool size of 10, running 12 instances, needs 120 connections. Add the overhead from migrations, monitoring, and admin tools, and you are over the limit before traffic even spikes.
Detection: `pg_stat_activity` shows connections in "idle" or "idle in transaction" state near `max_connections`.
Fix hierarchy:
- Add a connection pooler (PgBouncer) in transaction mode
- Reduce per-instance pool size
- Increase `max_connections` (last resort; each connection costs additional memory)
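The first step, PgBouncer in transaction mode, can be sketched with a minimal config like the one below. The database name, addresses, and pool sizes are illustrative assumptions; tune `default_pool_size` to your workload.

```ini
; Hypothetical pgbouncer.ini sketch -- names and sizes are illustrative.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; transaction mode: a server connection is held only for the duration
; of a transaction, so many clients share few Postgres connections
pool_mode = transaction
; server-side connections PgBouncer keeps open to Postgres
default_pool_size = 20
; client-side connections the application may open to PgBouncer
max_client_conn = 500
```

The application then connects to port 6432 instead of 5432; per-instance pool sizes can stay generous because they no longer map one-to-one onto Postgres backends.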
2. Slow Queries Under Load
Queries that run in 5ms at low traffic take 500ms under load. This happens because of lock contention, buffer cache misses, and increased I/O wait. The usual suspects:
- Missing indexes on WHERE and JOIN columns
- Sequential scans on tables over 100K rows
- N+1 query patterns amplified by concurrent requests
- `SELECT *` fetching columns that are never used
Detection: `pg_stat_statements` sorted by `total_exec_time` or `mean_exec_time`.
Fix hierarchy:
- Add missing indexes (covers 60% of cases)
- Rewrite N+1 patterns as JOINs or batch queries
- Add read replicas for read-heavy queries
- Implement query-level caching
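The N+1 rewrite in the second step can be sketched as follows. The `query` function is a hypothetical stub standing in for a real Postgres client (e.g. pg's `client.query`); here it only counts round trips and fabricates rows, which is enough to show the difference.

```typescript
// Sketch: why N+1 hurts and what the batched rewrite looks like.
type Row = { id: number; name: string };

let queryCount = 0;

// Hypothetical db client stub: each call is one network round trip
// in a real client.
async function query(sql: string, ids: number[]): Promise<Row[]> {
  queryCount++;
  return ids.map((id) => ({ id, name: `user-${id}` }));
}

// N+1 pattern: one query per id, so N round trips that all contend
// for pool connections under concurrent load.
async function fetchUsersNPlusOne(ids: number[]): Promise<Row[]> {
  const rows: Row[] = [];
  for (const id of ids) {
    rows.push(...(await query("SELECT id, name FROM users WHERE id = $1", [id])));
  }
  return rows;
}

// Batched rewrite: a single round trip, and Postgres can satisfy it
// with one index scan over the id list.
async function fetchUsersBatched(ids: number[]): Promise<Row[]> {
  return query("SELECT id, name FROM users WHERE id = ANY($1)", ids);
}
```

At low traffic the two versions look identical; under concurrency, the N+1 version multiplies both latency and connection pressure by the list size.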
3. Application Memory Pressure
Node.js historically capped its heap at roughly 1.5 GB by default (tunable with `--max-old-space-size`); Java applications have JVM heap configuration. In both cases, memory pressure manifests first as longer GC pauses, then as OOM kills.
Common causes:
- Unbounded in-memory caches
- Large JSON payloads held in memory during processing
- Memory leaks from event listener accumulation
- Logging libraries buffering in memory
Detection: container memory metrics approaching limit, GC pause duration increasing.
Fix hierarchy:
- Profile memory allocation (Node.js: `--inspect` with Chrome DevTools)
- Set explicit limits on in-memory caches
- Stream large payloads instead of buffering
- Increase instance memory (temporary, not a fix)
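"Set explicit limits on in-memory caches" is the step teams most often skip. A minimal bounded LRU cache can be sketched with a plain `Map`, which preserves insertion order, so deleting and re-inserting a key moves it to the most-recently-used end. This is an illustrative sketch, not a replacement for a production cache library.

```typescript
// Minimal bounded LRU cache: Map preserves insertion order, so the
// first key is always the least recently used.
class BoundedCache<K, V> {
  private map = new Map<K, V>();

  constructor(private maxEntries: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key);
      this.map.set(key, value); // refresh recency on access
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the least-recently-used entry (first in insertion order).
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }

  get size(): number {
    return this.map.size;
  }
}
```

An unbounded cache is a memory leak with a flattering name: it grows with traffic, which is exactly when you can least afford it.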
4. External API Rate Limits
Third-party APIs (payment processors, email services, geocoding) impose rate limits. At low traffic, you never hit them. At scale, a payment flow processing 100 orders/minute can exhaust a 60 req/min API limit.
Detection: 429 HTTP responses from external services, feature-specific error spikes.
Fix hierarchy:
- Implement client-side rate limiting with a token bucket
- Add request queuing with backpressure
- Negotiate higher limits with the vendor
- Cache API responses where possible
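The first step, client-side token-bucket limiting, can be sketched like this. Capacity and refill numbers are illustrative, and the injectable clock exists only to make the sketch deterministic in tests; a real implementation would use `Date.now()` directly.

```typescript
// Token-bucket rate limiter sketch: tokens refill continuously at a
// fixed rate, and a request proceeds only if a token is available.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    private now: () => number = () => Date.now(), // injectable for testing
  ) {
    this.tokens = capacity;
    this.lastRefill = this.now();
  }

  tryRemove(count = 1): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Refill proportionally to elapsed time, never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }
}
```

For a vendor limit of 60 req/min, a bucket with capacity 60 and a refill rate of 1 token per second keeps you under the limit while still absorbing short bursts.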
5. DNS Resolution Bottlenecks
Every outbound HTTP request requires DNS resolution. At high request rates, the default system resolver becomes a bottleneck. This manifests as intermittent 1-3 second delays on random requests.
Detection: inconsistent latency spikes that do not correlate with application or database metrics.
Fix hierarchy:
- Enable DNS caching in the HTTP client (Node.js: the `lookup` option backed by a `dns.resolve` cache)
- Use a local DNS cache (dnsmasq, systemd-resolved)
- Increase resolver concurrency limits
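The caching idea in the first step can be sketched as a TTL-bounded wrapper around a resolver function. The resolver is injectable so the sketch is testable without network access; in a real application it would wrap `dns.promises.resolve4`, and the wrapped function would feed the HTTP client's `lookup` option.

```typescript
// TTL-bounded DNS cache sketch in front of an arbitrary resolver.
type Resolver = (hostname: string) => Promise<string[]>;

function cachedResolver(
  resolve: Resolver,
  ttlMs: number,
  now: () => number = Date.now, // injectable for testing
): Resolver {
  const cache = new Map<string, { addresses: string[]; expires: number }>();
  return async (hostname) => {
    const hit = cache.get(hostname);
    if (hit && hit.expires > now()) return hit.addresses; // fresh: skip resolution
    const addresses = await resolve(hostname);
    cache.set(hostname, { addresses, expires: now() + ttlMs });
    return addresses;
  };
}
```

Even a short TTL (a few seconds) collapses thousands of identical lookups per second into one, which is usually enough to take the resolver off the critical path.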
Trade-offs
| Mitigation | Cost | Complexity | Effectiveness |
|---|---|---|---|
| Connection pooler (PgBouncer) | Low ($0-10/month) | Medium | High |
| Read replicas | Medium ($50-200/month) | High | High |
| Application caching (Redis) | Medium ($20-50/month) | Medium | Medium-High |
| CDN for static assets | Low ($5-20/month) | Low | High for static |
| Vertical scaling (bigger instances) | High | Low | Temporary |
Vertical scaling buys time. Horizontal scaling buys capacity. Architectural changes (caching, read replicas, async processing) buy durability.
Failure Modes
Cascading failures: database slowness causes application request queues to grow, which exhausts memory, which causes OOM kills, which reduces capacity, which increases load on surviving instances. This cascade can take a healthy system to zero in under 60 seconds.
Hidden coupling: an application that calls three external APIs sequentially has a latency floor equal to the sum of the slowest API responses. Under load, each API slows independently, and the combined latency exceeds timeout thresholds.
Retry storms: when a service returns errors, clients retry. If retry logic is aggressive (immediate retry, no backoff), a temporary failure becomes a permanent one because retries double the load on the already-struggling service.
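The standard antidote to retry storms is capped exponential backoff with jitter: the delay doubles per attempt up to a cap, and randomization spreads retries out so clients do not synchronize. A sketch, with an injectable sleep function so the retry loop is testable:

```typescript
// Full-jitter exponential backoff: uniform delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 10_000,
  random: () => number = Math.random, // injectable for testing
): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return random() * exp;
}

// Retry wrapper: rethrows the last error once attempts are exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

The cap matters as much as the doubling: without it, late retries arrive so far apart that recovery looks like an outage, and without jitter, every client retries at the same instant and re-creates the spike that caused the failure.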
Scaling Considerations
- Profile under load, not at rest. A system that looks healthy at 100 req/min may be critically bottlenecked at 1,000 req/min.
- Scale the bottleneck, not the symptom. If the database is slow, adding more application servers makes it worse by increasing connection pressure.
- Implement circuit breakers for external dependencies. A failing third-party API should degrade gracefully, not bring down the entire application.
- Load test with realistic traffic patterns. Synthetic benchmarks with uniform requests miss the hot-path bottlenecks that real traffic exposes.
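The circuit-breaker point above can be sketched minimally: after a threshold of consecutive failures the circuit opens and calls fail fast, and after a cooldown one trial call is allowed through. Thresholds and cooldowns here are illustrative; production breakers (and libraries like opossum) track more state, such as rolling error rates.

```typescript
// Minimal consecutive-failure circuit breaker sketch.
class CircuitBreaker {
  private failures = 0;
  private openedAt = -Infinity;

  constructor(
    private threshold: number,
    private cooldownMs: number,
    private now: () => number = () => Date.now(), // injectable for testing
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    // Open circuit: fail fast without touching the dependency.
    if (this.failures >= this.threshold && this.now() - this.openedAt < this.cooldownMs) {
      throw new Error("circuit open");
    }
    try {
      const result = await fn(); // closed or half-open: try the call
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

The fail-fast path is the point: a call that errors in microseconds frees the request thread, the connection, and the memory that a 30-second timeout would have held hostage.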
Observability
The minimum monitoring stack for scaling:
- Database: connection count, query duration (p50/p95/p99), lock wait time, replication lag
- Application: request duration, error rate, memory usage, GC pause time, event loop lag (Node.js)
- Infrastructure: CPU utilization, network I/O, disk I/O, DNS resolution time
- External: response time and error rate per third-party dependency
Alert on the leading indicators (connection count approaching limit, memory usage above 80%) rather than lagging indicators (error rate above 5%).
Key Takeaways
- Database connections break first in almost every scaling scenario. Add PgBouncer before you need it.
- Slow queries hide at low traffic and dominate at high traffic. Use `pg_stat_statements` proactively.
- Memory pressure causes cascading failures. Set explicit limits on everything that allocates memory.
- External API rate limits are a hard wall. Design for them with client-side rate limiting and queuing.
- Scale the bottleneck, not the symptom. Adding application servers when the database is the constraint makes things worse.
Further Reading
- Load Testing Mobile Backends With Realistic Traffic
- Scaling Isn't the Hard Part, Debugging Is
- Benchmarking Database Writes Under Load
Final Thoughts
The failure order is predictable: connections, queries, memory, external limits, DNS. Knowing this sequence means you can instrument and mitigate proactively instead of reacting to each failure in production. Every scaling effort I have been part of started with PgBouncer and ended with caching. The path between those two steps is where the real engineering happens.